Git development

Git development
 help / color / mirror / Atom feed

* Re: JGit vs. Git
From: Johannes Schindelin @ 2009-01-06 21:41 UTC (permalink / raw)
  To: Vagmi Mudumbai; +Cc: git
In-Reply-To: <a55cfe9d0901052250k2be203dfvb0b437a523f2cecc@mail.gmail.com>

Hi,

On Tue, 6 Jan 2009, Vagmi Mudumbai wrote:

> I am working on Windows with msysGit behind a HTTP Proxy. (Life cant
> get worse, I guess.) .

FWIW I think all should work well if you use a proxy, such as 
http://www.meadowy.org/~gotoh/projects/connect

Hth,
Dscho

^ permalink raw reply

* Re: [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
From: Linus Torvalds @ 2009-01-06 21:08 UTC (permalink / raw)
  To: Kjetil Barvik; +Cc: git, Junio C Hamano
In-Reply-To: <1231274192-30478-3-git-send-email-barvik@broadpark.no>

On Tue, 6 Jan 2009, Kjetil Barvik wrote:
?
> Start using the optimised, faster and more effective symlink/directory
> cache.  The previously used call:
> 
>    has_symlink_leading_path(len, name);
> 
> should be identically with the following call to lstat_cache():
> 
>    lstat_cache(len, name,
>                LSTAT_SYMLINK|LSTAT_DIR,
>                LSTAT_SYMLINK);

I think the new interface looks worse.

Why don't you just do a new inline function that says

	static inline int has_symlink_leading_path(int len, const char *name)
	{
		return lstat_cache(len, name,
			LSTAT_SYMLINK|LSTAT_DIR,
			LSTAT_SYMLINK);
	}

and now you don't need this big patch, and people who don't care about 
those magic flags don't need to have them. End result: more readable code.

Then, the new users that want _new_ semantics can use the extended 
version.

This is how git has done pretty much all "generalized" versions. See the 
whole ce_modified() vs ie_modified() thing: they're the same function, 
it's just that 'ce_modified()' is the traditional simpler interface that 
works on the default index, while ie_modified() is the "full" version that 
takes all the details that most uses don't even want to know about.

				Linus

^ permalink raw reply

* Re: [JGIT PATCH] Improve the sideband progress scaper to be more accurate
From: Nicolas Pitre @ 2009-01-06 20:55 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Robin Rosenberg, git
In-Reply-To: <20090106192747.GD24578@spearce.org>

On Tue, 6 Jan 2009, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@cam.org> wrote:
> > There may indeed be line fragments sent over the sideband channel, as 
> > well as the opposite which is multiple lines sent at once in a single 
> > packet.  If you look at sideband.c you'll find about all those cases.
> 
> Thanks, that's what I thought...
>  
> > In general, what you have to do is:
> > 
> >  - for each packet:
> >    - split into multiple chunks on line breaks ('\r' or '\n')
> >    - for each chunk:
> >      - if last chunk didn't end with a line break, or if current 
> >        chunk is empty or only contains a line break, then skip printing 
> >        the "remote:" prefix.  Otherwise print it.
> >      - print the current chunk up to any line break
> >      - if current chunk contains a line break and other characters then
> >        print a sequence to clear the remaining of the screen line
> >      - print the line break if any
> 
> Hmm.  I should note that C Git still screws this up sometimes.  I've
> seen 1.6.1 git fetch mess up the output from repo.or.cz's sideband.
> I'm sure Pasky isn't running JGit's daemon, its too damn fast. :-)
> 
> I don't have a spew of it handy, but late last week I saw it screw
> up while doing a clone off repo.or.cz.

Next time you see such things please send me a snapshot.

What is still possible and IMHO not worth the effort to fix is some 
interaction between local and remote displays which, if intermixed in 
some unlucky way, do screw up the final appearance.  For example, if the 
remote has sent a partial line, and before the remaining of it is 
received/printed you actually have some local process also displaying a 
line of its own.  If that was to become a real issue, then the fix would 
involve caching any partial line sent from the remote until the 
associated line break is received.

Nicolas

^ permalink raw reply

* [PATCH 4/3] shortlog: handle multi-line subjects like log --pretty=oneline et. al. do
From: René Scharfe @ 2009-01-06 20:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: markus.heidelberg, git
In-Reply-To: <7vocynz8y6.fsf@gitster.siamese.dyndns.org>

The commit message parser of git shortlog used to treat only the first
non-empty line of the commit message as the subject.  Other log commands
(e.g. --pretty=oneline) show the whole first paragraph instead (unwrapped
into a single line).

For consistency, this patch borrows format_subject() from pretty.c to
make shortlog do the same.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 builtin-shortlog.c |    9 ++++++---
 pretty.c           |    4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/builtin-shortlog.c b/builtin-shortlog.c
index d03f14f..e492906 100644
--- a/builtin-shortlog.c
+++ b/builtin-shortlog.c
@@ -29,6 +29,9 @@ static int compare_by_number(const void *a1, const void *a2)
 		return -1;
 }
 
+const char *format_subject(struct strbuf *sb, const char *msg,
+			   const char *line_separator);
+
 static void insert_one_record(struct shortlog *log,
 			      const char *author,
 			      const char *oneline)
@@ -41,6 +44,7 @@ static void insert_one_record(struct shortlog *log,
 	size_t len;
 	const char *eol;
 	const char *boemail, *eoemail;
+	struct strbuf subject = STRBUF_INIT;
 
 	boemail = strchr(author, '<');
 	if (!boemail)
@@ -89,9 +93,8 @@ static void insert_one_record(struct shortlog *log,
 	while (*oneline && isspace(*oneline) && *oneline != '\n')
 		oneline++;
 	len = eol - oneline;
-	while (len && isspace(oneline[len-1]))
-		len--;
-	buffer = xmemdupz(oneline, len);
+	format_subject(&subject, oneline, " ");
+	buffer = strbuf_detach(&subject, NULL);
 
 	if (dot3) {
 		int dot3len = strlen(dot3);
diff --git a/pretty.c b/pretty.c
index 343dca5..421d9c5 100644
--- a/pretty.c
+++ b/pretty.c
@@ -486,8 +486,8 @@ static void parse_commit_header(struct format_commit_context *context)
 	context->commit_header_parsed = 1;
 }
 
-static const char *format_subject(struct strbuf *sb, const char *msg,
-				  const char *line_separator)
+const char *format_subject(struct strbuf *sb, const char *msg,
+			   const char *line_separator)
 {
 	int first = 1;
 
-- 
1.6.1

^ permalink raw reply related

* [PATCH] strbuf: instate cleanup rule in case of non-memory errors
From: René Scharfe @ 2009-01-06 20:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Pierre Habouzit, Linus Torvalds, git
In-Reply-To: <20090104122108.GC29325@artemis.corp>

Make all strbuf functions that can fail free() their memory on error if
they have allocated it.  They don't shrink buffers that have been grown,
though.

This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves.

Bonus hunk: document strbuf_readlink.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 Documentation/technical/api-strbuf.txt |   11 +++++++++--
 strbuf.c                               |   17 +++++++++++++----
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index a8ee2fe..9a4e3ea 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -133,8 +133,10 @@ Functions
 
 * Adding data to the buffer
 
-NOTE: All of these functions in this section will grow the buffer as
-      necessary.
+NOTE: All of the functions in this section will grow the buffer as necessary.
+If they fail for some reason other than memory shortage and the buffer hadn't
+been allocated before (i.e. the `struct strbuf` was set to `STRBUF_INIT`),
+then they will free() it.
 
 `strbuf_addch`::
 
@@ -235,6 +237,11 @@ same behaviour as well.
 	Read the contents of a file, specified by its path. The third argument
 	can be used to give a hint about the file size, to avoid reallocs.
 
+`strbuf_readlink`::
+
+	Read the target of a symbolic link, specified by its path.  The third
+	argument can be used to give a hint about the size, to avoid reallocs.
+
 `strbuf_getline`::
 
 	Read a line from a FILE* pointer. The second argument specifies the line
diff --git a/strbuf.c b/strbuf.c
index bdf4954..6ed0684 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -256,18 +256,21 @@ size_t strbuf_expand_dict_cb(struct strbuf *sb, const char *placeholder,
 size_t strbuf_fread(struct strbuf *sb, size_t size, FILE *f)
 {
 	size_t res;
+	size_t oldalloc = sb->alloc;
 
 	strbuf_grow(sb, size);
 	res = fread(sb->buf + sb->len, 1, size, f);
-	if (res > 0) {
+	if (res > 0)
 		strbuf_setlen(sb, sb->len + res);
-	}
+	else if (res < 0 && oldalloc == 0)
+		strbuf_release(sb);
 	return res;
 }
 
 ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 {
 	size_t oldlen = sb->len;
+	size_t oldalloc = sb->alloc;
 
 	strbuf_grow(sb, hint ? hint : 8192);
 	for (;;) {
@@ -275,7 +278,10 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 
 		cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
 		if (cnt < 0) {
-			strbuf_setlen(sb, oldlen);
+			if (oldalloc == 0)
+				strbuf_release(sb);
+			else
+				strbuf_setlen(sb, oldlen);
 			return -1;
 		}
 		if (!cnt)
@@ -292,6 +298,8 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 
 int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
 {
+	size_t oldalloc = sb->alloc;
+
 	if (hint < 32)
 		hint = 32;
 
@@ -311,7 +319,8 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
 		/* .. the buffer was too small - try again */
 		hint *= 2;
 	}
-	strbuf_release(sb);
+	if (oldalloc == 0)
+		strbuf_release(sb);
 	return -1;
 }
 
-- 
1.6.1

^ permalink raw reply related

* Re: [ANNOUNCE] Git homepage change
From: Miklos Vajna @ 2009-01-06 20:40 UTC (permalink / raw)
  To: Boyd Lynn Gerber; +Cc: Scott Chacon, David Bryson, Mike Hommey, git
In-Reply-To: <alpine.LNX.2.00.0901060907350.9096@suse104.zenez.com>

[-- Attachment #1: Type: text/plain, Size: 337 bytes --]

On Tue, Jan 06, 2009 at 09:12:30AM -0700, Boyd Lynn Gerber <gerberb@zenez.com> wrote:
> archive.  The only problem is I have not figured out how to download all 
> 303 articles, quickly/easily.  You would think one of the majordomo 
> commands to get articles would work.  They do not.

I would use:

http://gmane.org/export.php

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* [PATCH/RFC v2 4/4] remove the old 'has_symlink_leading_path()' function
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kjetil Barvik
In-Reply-To: <1231274192-30478-1-git-send-email-barvik@broadpark.no>

It has been replaced by the more cache effective 'lstat_cache()'
function.  see lstat_cache.c

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 7449b10... edd4798... M	Makefile
:100644 100644 4c56f3f... 8b8da85... M	cache.h
:100644 000000 5a5e781... 0000000... D	symlinks.c
 Makefile   |    1 -
 cache.h    |    1 -
 symlinks.c |   64 ------------------------------------------------------------
 3 files changed, 0 insertions(+), 66 deletions(-)
 delete mode 100644 symlinks.c

diff --git a/Makefile b/Makefile
index 7449b105b03e862d53244d50ed035b4ddabef028..edd4798429ad3828f5b9dcff20f73c6a7269520e 100644
--- a/Makefile
+++ b/Makefile
@@ -483,7 +483,6 @@ LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += strbuf.o
-LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += trace.o
 LIB_OBJS += transport.o
diff --git a/cache.h b/cache.h
index 4c56f3faa0b87398e27346358e40af964a1828bb..8b8da85f32600050dddfcdf6b0c0e6ecb8f27072 100644
--- a/cache.h
+++ b/cache.h
@@ -720,7 +720,6 @@ struct checkout {
 
 extern void clear_created_dirs_cache(void);
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
 
 #define LSTAT_DIR       (1u << 0)
 #define LSTAT_NOENT     (1u << 1)
diff --git a/symlinks.c b/symlinks.c
deleted file mode 100644
index 5a5e781a15d7d9cb60797958433eca896b31ec85..0000000000000000000000000000000000000000
--- a/symlinks.c
+++ /dev/null
@@ -1,64 +0,0 @@
-#include "cache.h"
-
-struct pathname {
-	int len;
-	char path[PATH_MAX];
-};
-
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
-{
-	int match_len = match->len;
-	return (len > match_len &&
-		name[match_len] == '/' &&
-		!memcmp(name, match->path, match_len)) ? match_len : 0;
-}
-
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
-	if (len < PATH_MAX) {
-		match->len = len;
-		memcpy(match->path, name, len);
-		match->path[len] = 0;
-	}
-}
-
-int has_symlink_leading_path(int len, const char *name)
-{
-	static struct pathname link, nonlink;
-	char path[PATH_MAX];
-	struct stat st;
-	char *sp;
-	int known_dir;
-
-	/*
-	 * See if the last known symlink cache matches.
-	 */
-	if (match_pathname(len, name, &link))
-		return 1;
-
-	/*
-	 * Get rid of the last known directory part
-	 */
-	known_dir = match_pathname(len, name, &nonlink);
-
-	while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
-		int thislen = sp - name ;
-		memcpy(path, name, thislen);
-		path[thislen] = 0;
-
-		if (lstat(path, &st))
-			return 0;
-		if (S_ISDIR(st.st_mode)) {
-			set_pathname(thislen, path, &nonlink);
-			known_dir = thislen;
-			continue;
-		}
-		if (S_ISLNK(st.st_mode)) {
-			set_pathname(thislen, path, &link);
-			return 1;
-		}
-		break;
-	}
-	return 0;
-}
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v2 3/4] create_directories() inside entry.c: only check each directory once!
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kjetil Barvik
In-Reply-To: <1231274192-30478-1-git-send-email-barvik@broadpark.no>

When we do an 'git checkout' after some time we end up in the
'checkout_entry()' function inside entry.c, and from here we call the
'create_directories()' function to make sure the all the directories
exists for the possible new file or entry.

The 'create_directories()' function happily started to check that all
path component exists.  This resulted in tons and tons of calls to
lstat() or stat() when we checkout files nested deep inside a
directory.

We try to avoid this by remembering the last checked and possible
newly created directory.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 9adc4a2... 4c56f3f... M	cache.h
:100644 100644 aa2ee46... 666a8ce... M	entry.c
:100644 100644 0938b86... 037678d... M	unpack-trees.c
 cache.h        |    1 +
 entry.c        |   86 ++++++++++++++++++++++++++++++++++++++++++++------------
 unpack-trees.c |    1 +
 3 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index 9adc4a224648bb17a50e6792274225eeb6144451..4c56f3faa0b87398e27346358e40af964a1828bb 100644
--- a/cache.h
+++ b/cache.h
@@ -718,6 +718,7 @@ struct checkout {
 		 refresh_cache:1;
 };
 
+extern void clear_created_dirs_cache(void);
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 extern int has_symlink_leading_path(int len, const char *name);
 
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..666a8ce3a132e85a45b0828521f3c2119c77833e 100644
--- a/entry.c
+++ b/entry.c
@@ -1,33 +1,76 @@
 #include "cache.h"
 #include "blob.h"
 
-static void create_directories(const char *path, const struct checkout *state)
+static char dirs_path[PATH_MAX];
+static int  dirs_len = 0;
+
+static inline int
+greatest_common_created_dirs_prefix(int len, const char *name)
 {
-	int len = strlen(path);
-	char *buf = xmalloc(len + 1);
-	const char *slash = path;
+	int max_len, match_len = 0, i = 0;
 
-	while ((slash = strchr(slash+1, '/')) != NULL) {
-		struct stat st;
-		int stat_status;
+	max_len = len < dirs_len ? len : dirs_len;
+	while (i < max_len && name[i] == dirs_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
+	}
+	if (i == dirs_len && len > dirs_len && name[dirs_len] == '/')
+		match_len = dirs_len;
+	return match_len;
+}
+
+static inline void
+update_created_dirs_cache(int last_slash)
+{
+	if (last_slash > 0 && last_slash < PATH_MAX) {
+		dirs_len = last_slash;
+	} else {
+		dirs_len = 0;
+	}
+}
 
-		len = slash - path;
-		memcpy(buf, path, len);
-		buf[len] = 0;
+void clear_created_dirs_cache(void)
+{
+	dirs_len = 0;
+}
+
+static void
+create_directories(int len, const char *path, const struct checkout *state)
+{
+	int i, max_len, last_slash, stat_status;
+	struct stat st;
+
+	/* Check the cache for previously checked or created
+	 * directories (and components) within this function.  There
+	 * is no need to check or re-create directory components more
+	 * than once!
+	 */
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	i = last_slash = greatest_common_created_dirs_prefix(max_len, path);
 
-		if (len <= state->base_dir_len)
+	while (i < max_len) {
+		do {
+			dirs_path[i] = path[i];
+			i++;
+		} while (i < max_len && path[i] != '/');
+		if (i >= max_len)
+			break;
+		last_slash = i;
+		dirs_path[last_slash] = '\0';
+
+		if (last_slash <= state->base_dir_len)
 			/*
 			 * checkout-index --prefix=<dir>; <dir> is
 			 * allowed to be a symlink to an existing
 			 * directory.
 			 */
-			stat_status = stat(buf, &st);
+			stat_status = stat(dirs_path, &st);
 		else
 			/*
 			 * if there currently is a symlink, we would
 			 * want to replace it with a real directory.
 			 */
-			stat_status = lstat(buf, &st);
+			stat_status = lstat(dirs_path, &st);
 
 		if (!stat_status && S_ISDIR(st.st_mode))
 			continue; /* ok, it is already a directory. */
@@ -38,14 +81,14 @@ static void create_directories(const char *path, const struct checkout *state)
 		 * error codepath; we do not care, as we unlink and
 		 * mkdir again in such a case.
 		 */
-		if (mkdir(buf, 0777)) {
+		if (mkdir(dirs_path, 0777)) {
 			if (errno == EEXIST && state->force &&
-			    !unlink(buf) && !mkdir(buf, 0777))
+			    !unlink(dirs_path) && !mkdir(dirs_path, 0777))
 				continue;
-			die("cannot create directory at %s", buf);
+			die("cannot create directory at %s", dirs_path);
 		}
 	}
-	free(buf);
+	update_created_dirs_cache(last_slash);
 }
 
 static void remove_subtree(const char *path)
@@ -55,6 +98,11 @@ static void remove_subtree(const char *path)
 	char pathbuf[PATH_MAX];
 	char *name;
 
+	/* To be utterly safe we invalidate the cache of the
+	 * previously created directories.
+	 */
+	clear_created_dirs_cache();
+
 	if (!dir)
 		die("cannot opendir %s (%s)", path, strerror(errno));
 	strcpy(pathbuf, path);
@@ -195,12 +243,14 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 	static char path[PATH_MAX + 1];
 	struct stat st;
 	int len = state->base_dir_len;
+	int path_len;
 
 	if (topath)
 		return write_entry(ce, topath, state, 1);
 
 	memcpy(path, state->base_dir, len);
 	strcpy(path + len, ce->name);
+	path_len = len + ce_namelen(ce);
 
 	if (!lstat(path, &st)) {
 		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
@@ -229,6 +279,6 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 			return error("unable to unlink old '%s' (%s)", path, strerror(errno));
 	} else if (state->not_new)
 		return 0;
-	create_directories(path, state);
+	create_directories(path_len, path, state);
 	return write_entry(ce, path, state, 0);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 0938b86278f9b6aa14ef32fe15392f34b694d8cc..037678dfd75b7ef8235105e072fb276fcc6c36a8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -121,6 +121,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 
+	clear_created_dirs_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kjetil Barvik
In-Reply-To: <1231274192-30478-1-git-send-email-barvik@broadpark.no>

Start using the optimised, faster and more effective symlink/directory
cache.  The previously used call:

   has_symlink_leading_path(len, name);

should be identically with the following call to lstat_cache():

   lstat_cache(len, name,
               LSTAT_SYMLINK|LSTAT_DIR,
               LSTAT_SYMLINK);

The primary reason for the new name of the function (instead of using
the old name and add 2 extra arguments), is that it is now more
general, for instance, it now also can cache the fact that a directory
does not exists.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 152c52c... M	builtin-add.c
:100644 100644 a8f75ed... 0eb2b21... M	builtin-apply.c
:100644 100644 5604977... 2844ddd... M	builtin-update-index.c
:100644 100644 ae96c64... b67ee06... M	diff-lib.c
:100644 100644 0131983... 9f2a1b1... M	dir.c
:100644 100644 54f301d... 0938b86... M	unpack-trees.c
 builtin-add.c          |    5 ++++-
 builtin-apply.c        |    5 ++++-
 builtin-update-index.c |    5 ++++-
 diff-lib.c             |    5 ++++-
 dir.c                  |    4 +++-
 unpack-trees.c         |    9 +++++++--
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..152c52c0f22b3e71931b2a5629e1472602817785 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -121,7 +121,9 @@ static const char **validate_pathspec(int argc, const char **argv, const char *p
 	if (pathspec) {
 		const char **p;
 		for (p = pathspec; *p; p++) {
-			if (has_symlink_leading_path(strlen(*p), *p)) {
+			if (lstat_cache(strlen(*p), *p,
+					LSTAT_SYMLINK|LSTAT_DIR,
+					LSTAT_SYMLINK)) {
 				int len = prefix ? strlen(prefix) : 0;
 				die("'%s' is beyond a symbolic link", *p + len);
 			}
@@ -225,6 +227,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, builtin_add_options,
 			  builtin_add_usage, 0);
+	clear_lstat_cache();
 	if (patch_interactive)
 		add_interactive = 1;
 	if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..0eb2b21ee245919189c780a64cc494ca35f7934a 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2354,7 +2354,9 @@ static int check_to_create_blob(const char *new_name, int ok_if_exists)
 		 * In such a case, path "new_name" does not exist as
 		 * far as git is concerned.
 		 */
-		if (has_symlink_leading_path(strlen(new_name), new_name))
+		if (lstat_cache(strlen(new_name), new_name,
+				LSTAT_SYMLINK|LSTAT_DIR,
+				LSTAT_SYMLINK))
 			return 0;
 
 		return error("%s: already exists in working directory", new_name);
@@ -3154,6 +3156,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
 	if (apply_default_whitespace)
 		parse_whitespace_option(apply_default_whitespace);
 
+	clear_lstat_cache();
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..2844ddd95d208b35d47c1cc2ca40af4b68bd94ab 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -195,7 +195,9 @@ static int process_path(const char *path)
 	struct stat st;
 
 	len = strlen(path);
-	if (has_symlink_leading_path(len, path))
+	if (lstat_cache(len, path,
+			LSTAT_SYMLINK|LSTAT_DIR,
+			LSTAT_SYMLINK))
 		return error("'%s' is beyond a symbolic link", path);
 
 	/*
@@ -581,6 +583,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (entries < 0)
 		die("cache corrupted");
 
+	clear_lstat_cache();
 	for (i = 1 ; i < argc; i++) {
 		const char *path = argv[i];
 		const char *p;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..b67ee068e377e5aa1c7ed4abd88c0e307ba8fb4f 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -31,7 +31,9 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
 			return -1;
 		return 1;
 	}
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (lstat_cache(ce_namelen(ce), ce->name,
+			LSTAT_SYMLINK|LSTAT_DIR,
+			LSTAT_SYMLINK))
 		return 1;
 	if (S_ISDIR(st->st_mode)) {
 		unsigned char sub[20];
@@ -69,6 +71,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = active_nr;
 	symcache[0] = '\0';
+	clear_lstat_cache();
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		unsigned int oldmode, newmode;
diff --git a/dir.c b/dir.c
index 0131983dfbc143ce5dae77e067663bb2e7d5f126..9f2a1b1f245c3f1d4f12d54d45bb193c53fa15b5 100644
--- a/dir.c
+++ b/dir.c
@@ -719,7 +719,9 @@ int read_directory(struct dir_struct *dir, const char *path, const char *base, i
 {
 	struct path_simplify *simplify;
 
-	if (has_symlink_leading_path(strlen(path), path))
+	if (lstat_cache(strlen(path), path,
+			LSTAT_SYMLINK|LSTAT_DIR,
+			LSTAT_SYMLINK))
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..0938b86278f9b6aa14ef32fe15392f34b694d8cc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,9 @@ static void unlink_entry(struct cache_entry *ce)
 	char *cp, *prev;
 	char *name = ce->name;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (lstat_cache(ce_namelen(ce), ce->name,
+			LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR,
+			LSTAT_SYMLINK|LSTAT_NOENT))
 		return;
 	if (unlink(name))
 		return;
@@ -105,6 +107,7 @@ static int check_updates(struct unpack_trees_options *o)
 		cnt = 0;
 	}
 
+	clear_lstat_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -584,7 +587,9 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (o->index_only || o->reset || !o->update)
 		return 0;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (lstat_cache(ce_namelen(ce), ce->name,
+			LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR,
+			LSTAT_SYMLINK|LSTAT_NOENT))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kjetil Barvik
In-Reply-To: <1231274192-30478-1-git-send-email-barvik@broadpark.no>

This patch is work based on the following 2 commits:

Linus Torvalds: c40641b77b0274186fd1b327d5dc3246f814aaaf
Junio C Hamano: f859c846e90b385c7ef873df22403529208ade50

Changes includes the following:

- The cache functionality is more effective.  Previously when A/B/C/D
  was in the cache and A/B/C/E/file.c was called for, there was no
  match at all from the cache.  Now we use the fact that the paths
  "A", "A/B" and "A/B/C" is already tested, and we only need to do an
  lstat() call on "A/B/C/E".

- We only cache/store the last path regardless of it's type.  Since the
  cache functionality is always used with alphabetically sorted names
  (at least it seams so for me), there is no need to store both the
  last symlink-leading path and the last real-directory path.  Note
  that if the cache is not called with (mostly) alphabetically sorted
  names, neither the old, nor this new one, would be very effective.

- We also can cache the fact that a directory does not exist.
  Previously we could end up doing lots of lstat() calls for a removed
  directory which previously contained lots of files.  Since we
  already have simplified the cache functionality and only store the
  last path (see above), this new functionality was easy to add.

- Previously, when symlink A/B/C/S was cached/stored in the
  symlink-leading path, and A/B/C/file.c was called for, it was not
  easy to use the fact that we already known that the paths "A", "A/B"
  and "A/B/C" is real directories.  Since we now only store one single
  path (the last one), we also get similar logic for free regarding
  the new "non-exsisting-directory-cache".

- Avoid copying the first path components of the name 2 zillions times
  when we tests new path components.  Since we always cache/store the
  last path, we can copy each component as we test those directly into
  the cache.  Previously we ended up doing a memcpy() for the full
  path/name right before each lstat() call, and when updating the
  cache for each time we have tested an new path component.

- We also use less memory, that is PATH_MAX bytes less memory on the
  stack and PATH_MAX bytes less memory on the heap.

- Introduce a 3rd argument, 'unsigned int track_flags', to the
  cache-test function, check_lstat_cache().  This new argument can be
  used to tell the cache functionality which types of directories
  should be cached.

- Also introduce a 'void clear_lstat_cache(void)' function, which
  should be used to clean the cache before usage.  If for instance,
  you have changed the types of directories which should be cached,
  the cache could contain a path which was not wanted.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 aabf013... 7449b10... M	Makefile
:100644 100644 231c06d... 9adc4a2... M	cache.h
:000000 100644 0000000... 2f025c8... A	lstat_cache.c
 Makefile      |    1 +
 cache.h       |   15 +++++++++
 lstat_cache.c |   99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+), 0 deletions(-)
 create mode 100644 lstat_cache.c

diff --git a/Makefile b/Makefile
index aabf0130b99bee5204c8e668ba8f40caea77dae2..7449b105b03e862d53244d50ed035b4ddabef028 100644
--- a/Makefile
+++ b/Makefile
@@ -446,6 +446,7 @@ LIB_OBJS += list-objects.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += lstat_cache.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..9adc4a224648bb17a50e6792274225eeb6144451 100644
--- a/cache.h
+++ b/cache.h
@@ -721,6 +721,21 @@ struct checkout {
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 extern int has_symlink_leading_path(int len, const char *name);
 
+#define LSTAT_DIR       (1u << 0)
+#define LSTAT_NOENT     (1u << 1)
+#define LSTAT_SYMLINK   (1u << 2)
+#define LSTAT_LSTATERR  (1u << 3)
+#define LSTAT_ERR       (1u << 4)
+extern unsigned int check_lstat_cache(int len, const char *name,
+				      unsigned int track_flags);
+static inline unsigned int lstat_cache(int len, const char *name,
+				       unsigned int track_flags,
+				       unsigned int mask_flags)
+{
+	return check_lstat_cache(len, name, track_flags) & mask_flags;
+}
+extern void clear_lstat_cache(void);
+
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
 	char *name;
diff --git a/lstat_cache.c b/lstat_cache.c
new file mode 100644
index 0000000000000000000000000000000000000000..2f025c87b0be0e713af7b1400e71284a22c25e9f
--- /dev/null
+++ b/lstat_cache.c
@@ -0,0 +1,99 @@
+#include "cache.h"
+
+static char         cache_path[PATH_MAX];
+static int          cache_len   = 0;
+static unsigned int cache_flags = 0;
+
+static inline int
+greatest_common_path_cache_prefix(int len, const char *name)
+{
+	int max_len, match_len = 0, i = 0;
+
+	max_len = len < cache_len ? len : cache_len;
+	while (i < max_len && name[i] == cache_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
+	}
+	if (i == cache_len && len > cache_len && name[cache_len] == '/')
+		match_len = cache_len;
+	return match_len;
+}
+
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is indicated by the 'track_flags' argument.
+ */
+unsigned int
+check_lstat_cache(int len, const char *name, unsigned int track_flags)
+{
+	int match_len, last_slash, max_len;
+	unsigned int match_flags, ret_flags, save_flags;
+	struct stat st;
+
+	/* Check if match from the cache for 2 "excluding" path types.
+	 */
+	match_len = last_slash =
+		greatest_common_path_cache_prefix(len, name);
+	match_flags =
+		cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+	if (match_flags && match_len == cache_len)
+		return match_flags;
+
+	/* Okay, no match from the cache so far, so now we have to
+	 * check the rest of the path components.
+	 */
+	ret_flags = LSTAT_DIR;
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	while (match_len < max_len) {
+		do {
+			cache_path[match_len] = name[match_len];
+			match_len++;
+		} while (match_len < max_len && name[match_len] != '/');
+		if (match_len >= max_len)
+			break;
+		last_slash = match_len;
+		cache_path[last_slash] = '\0';
+
+		if (lstat(cache_path, &st)) {
+			ret_flags = LSTAT_LSTATERR;
+			if (errno == ENOENT)
+				ret_flags |= LSTAT_NOENT;
+		} else if (S_ISDIR(st.st_mode)) {
+			continue;
+		} else if (S_ISLNK(st.st_mode)) {
+			ret_flags = LSTAT_SYMLINK;
+		} else {
+			ret_flags = LSTAT_ERR;
+		}
+		break;
+	}
+
+	/* At the end update the cache.  Note that max 3 different
+	 * path types can be cached for the moment!
+	 */
+	save_flags = ret_flags & track_flags &
+		(LSTAT_NOENT|LSTAT_SYMLINK|LSTAT_DIR);
+	if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
+		cache_path[last_slash] = '\0';
+		cache_len   = last_slash;
+		cache_flags = save_flags;
+	} else {
+		clear_lstat_cache();
+	}
+	return ret_flags;
+}
+
+/*
+ * Before usage of the check_lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+	cache_path[0] = '\0';
+	cache_len     = 0;
+	cache_flags   = 0;
+}
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Kjetil Barvik

I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.

After some more investigation and thinking, I have made 4 patches and
been able to optimise away over 42% of all lstat() calls in some cases
for the 'git checkout' command.  I have not tested other git porcelain
commands for reduced lstat() calls, but I would guess that the more
effective 'lstat_cache()' compared to 'has_leading_symlink_cache()',
should also give better numbers in other cases.

All the 4 patches is against git master, and the git 'make test'
test suite still passes after each patch.

To document the improvement, below is some numbers, which compares
before and after all 4 patches. To reproduce the numbers:

- git clone the Linux git tree to be able to get the Linux tags
  'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25

Then, when the current branch is 'my-v2.6.25' do:

  strace -o strace_to27 -T git checkout -q my-v2.6.27

And then you pretty print and collect stats from the 'strace_to27'
file.  If someone wants a copy of the strace_stat.pl script, which I
made/used to do the pretty printing, then give me a hint.

Below is the stats/numbers from the current git version (before the 4
patches).  Notice that we do an lstat() call on the "arch" directory
over 6000 times!

TOTAL      185151 100.000% OK:165544 NOT: 19607  11.136001 sec   60 usec/call
lstat64    120954  65.327% OK:107013 NOT: 13941   5.388727 sec   45 usec/call
  strings  120954 tot  30163 uniq   4.010 /uniq   5.388727 sec   45 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   2.740520 sec   45 usec/call
  dirs      45522 tot   1436 uniq  31.701 /uniq   1.994448 sec   44 usec/call
  errors    13941 tot   5189 uniq   2.687 /uniq   0.653759 sec   47 usec/call
             6297   5.206% OK:  6297 NOT:     0  "arch"
             4544   3.757% OK:  4544 NOT:     0  "drivers"
             1816   1.501% OK:  1816 NOT:     0  "arch/arm"
             1499   1.239% OK:  1499 NOT:     0  "include"
              912   0.754% OK:   912 NOT:     0  "arch/powerpc"
              764   0.632% OK:   764 NOT:     0  "fs"
              746   0.617% OK:   746 NOT:     0  "drivers/net"
              662   0.547% OK:   662 NOT:     0  "net"
              652   0.539% OK:   325 NOT:   327  "arch/sparc/include"
              636   0.526% OK:   636 NOT:     0  "drivers/media"
              606   0.501% OK:   606 NOT:     0  "include/linux"
              533   0.441% OK:   533 NOT:     0  "arch/sh"
              522   0.432% OK:   260 NOT:   262  "arch/powerpc/include"
              488   0.403% OK:   243 NOT:   245  "arch/sh/include"
              413   0.341% OK:   413 NOT:     0  "arch/sparc"
              390   0.322% OK:   390 NOT:     0  "arch/x86"
              383   0.317% OK:   383 NOT:     0  "Documentation"
              370   0.306% OK:   184 NOT:   186  "arch/ia64/include"
              366   0.303% OK:   366 NOT:     0  "drivers/media/video"
              348   0.288% OK:   173 NOT:   175  "arch/arm/include"

Here is the stats/numbers after applying the 4 patches.  Notice how
nice the top 20 entries list now looks!

TOTAL      133655 100.000% OK:121615 NOT: 12040  10.429999 sec   78 usec/call
lstat64     69603  52.077% OK: 63218 NOT:  6385   3.419920 sec   49 usec/call
  strings   69603 tot  30163 uniq   2.308 /uniq   3.419920 sec   49 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   3.034869 sec   49 usec/call
  dirs       1727 tot   1164 uniq   1.484 /uniq   0.075681 sec   44 usec/call
  errors     6385 tot   5189 uniq   1.230 /uniq   0.309370 sec   48 usec/call
                4   0.006% OK:     4 NOT:     0  ".gitignore"
                4   0.006% OK:     4 NOT:     0  ".mailmap"
                4   0.006% OK:     4 NOT:     0  "CREDITS"
                4   0.006% OK:     4 NOT:     0  "Documentation/00-INDEX"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-block"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-firmware-acpi"
                4   0.006% OK:     4 NOT:     0  "Documentation/CodingStyle"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-API.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-mapping.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/Makefile"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/gadget.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-api.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-locking.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs-guide.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs_example.c"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/rapidio.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/s390-drivers.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/uio-howto.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/videobook.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/writing_usb_driver.tmpl"

Note that the overall used system time as recorded from 'strace -T',
does not drop so much that the reduced lstat() time should indicate
for _this_ particular test run.  This is because now each unlink()
call takes much more time, at least for me on an slow ide disk (using
ext3) on a laptop.

A simple test gives me an overall improvement of 2.937 seconds: real
time drops from 28.195s (best of 5 runs with 'time git ...'), to
25.381s (best of 5 runs).

I have also noticed that inside the unlink_entry() function in file
unpack-trees.c, one could often end up calling rmdir() lots and lots
of times on none-empty directories.  Maybe one should schedule each
directory for removal by an appropriate function, and then at the end
call a new function to clean all the directories at once?

Comments?

----
  Changes since v1:
  = inside patch 1/4
    - always null terminate the cache_path array
    - added a paragraph to the commit message for this patch
    - small cleanup on 2 comments, and a small line indentation change
      [[Following is updates after comments from Junio C Hamano - Thanks!]]
    - removed the 'static inline update_path_cache()' function
    - replaced the else-part of the above inline function with a call
      to the 'clear_lstat_cache()' function.
    - deleted the '|| errno == ENOTDIR' part inside the big while-loop
      inside check_lstat_cache(), and updated the named BIT-fields
      accordingly
  = inside patch 2/4
    - moved a paragraph out from the commit message for this patch and
      into this cover-letter
      [[Following is updates after comments from Junio C Hamano - Thanks!]]
    - Removed the '|LSTAT_NOTDIR' part from the call to lstat_cache()
      inside function 'check_removed()' inside file diff-lib.c


Kjetil Barvik (4):
  Optimised, faster, more effective symlink/directory detection
  Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
  create_directories() inside entry.c: only check each directory once!
  remove the old 'has_symlink_leading_path()' function

 Makefile               |    2 +-
 builtin-add.c          |    5 ++-
 builtin-apply.c        |    5 ++-
 builtin-update-index.c |    5 ++-
 cache.h                |   17 ++++++++-
 diff-lib.c             |    5 ++-
 dir.c                  |    4 +-
 entry.c                |   86 +++++++++++++++++++++++++++++++++---------
 lstat_cache.c          |   99 ++++++++++++++++++++++++++++++++++++++++++++++++
 symlinks.c             |   64 -------------------------------
 unpack-trees.c         |   10 ++++-
 11 files changed, 211 insertions(+), 91 deletions(-)
 create mode 100644 lstat_cache.c
 delete mode 100644 symlinks.c

^ permalink raw reply

* Re: [PATCH] fast-export: print usage when no options specified
From: Miklos Vajna @ 2009-01-06 20:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, jidanni, git
In-Reply-To: <alpine.DEB.1.00.0901062023100.30769@pacific.mpi-cbg.de>

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]

On Tue, Jan 06, 2009 at 08:28:59PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Maybe this should be part of the commit message?
> 
> -- snip --
> Some people find it surprising that fast-export does not output a usage 
> when called without parameters, as rev-list does.
> 
> This assumes that a user usually does not want to export HEAD by default.
> -- snap --

OK, if this is the only problem, I can resend it with this included. ;-)

> However, I have to say that I would find exporting HEAD a rather sensible 
> default.
> 
> But I am not _that_ strongly opposed to the patch.  Just would like to 
> hear some opinions first.

According to man git, fast-export is not a plumbing, though I think most
user won't type it manually multiple times, they'll write a script, that
checks if there is something to convert, then run git fast-export | foo
fast-import, so it's really like plumbing. That's why I found the
rev-list-like behaviour more logical.

However, I think defaulting to HEAD is still better than the current
situation, so if that's the consensus, I'm fine with that as well.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: JGit vs. Git
From: Stephen Bannasch @ 2009-01-06 19:55 UTC (permalink / raw)
  To: Robin Rosenberg, Vagmi Mudumbai; +Cc: git
In-Reply-To: <200901061212.07240.robin.rosenberg.lists@dewire.com>

At 12:12 PM +0100 1/6/09, Robin Rosenberg wrote:
>tisdag 06 januari 2009 07:50:11 skrev Vagmi Mudumbai:
>  > I am working on Windows with msysGit behind a HTTP Proxy. (Life cant
>>  get worse, I guess.) . I planned on using grit via JRuby but grit uses
>>  fork which is not available on funny platforms like windows. And JRuby
>>  guys do not have any plan on supporting fork even on platforms on
>>  which for is supported. If JGit is a pure Java based implementation of
>fork isn't really supported on Windows. Cygwin goes to great lengths to
>emulate it. Trying to do that within the context of an arbitrary JVM seems
>like a daunting task. Consider submitting patches to make grit not use fork...
>just kidding.., please help us improve JGit instead :)

Or think about extending the Ruby gem grit to also use JGit.  Which 
would certainly improve grit and  probably help improve JGit also.

I've thought about this a bit -- but it hasn't gotten to the top of 
my list yet ...

There are some examples of Ruby Gems which use Java libraries when 
run in JRuby and native C libraries when used from MRI.

   hpricot:   http://github.com/why/hpricot/tree
   redcloth:  http://github.com/jgarber/redcloth/tree/master

^ permalink raw reply

* Re: Quick command to count commits
From: Ted Pavlic @ 2009-01-06 19:55 UTC (permalink / raw)
  To: Henk; +Cc: git
In-Reply-To: <1231267896595-2118851.post@n2.nabble.com>

I doubt it's going to be much faster, but it's easier to type

	git shortlog -s|numsum

I *think* that should give you the same number, and it forces git to do 
more of the counting (rather than wc).

Strangely, on one repo that I test that on, I get slightly different 
numbers from that command and yours. I'm not quite sure why (I guess 
shortlog doesn't count all commits?).

--Ted

On 1/6/09 1:51 PM, Henk wrote:
>
> Hi,
>
> For GitExtensions (windows git ui) I need a command to count all commits. I
> now use this command:
> git.cmd rev-list --all --abbrev-commit | wc -l
>
> This works perfect but its very slow in big repositories. Is there a faster
> way to count the commits?
>
> Henk

-- 
Ted Pavlic <ted@tedpavlic.com>

^ permalink raw reply

* Re: BUG?? INSTALL MAKEFILE
From: Ted Pavlic @ 2009-01-06 19:44 UTC (permalink / raw)
  To: git
In-Reply-To: <49638625.3090109@tedpavlic.com>

> 	git clean -f

DOH. I meant to do

	git clean -fx

(or just a git ls-files). That config.mak.autogen is certainly not 
checked into the repo, and a quick test confirms that "make install" 
certainly does set prefix to the home directory.

That being said, I'm sure that in the recent past I had to use configure 
to install git into home directories.

--Ted

-- 
Ted Pavlic <ted@tedpavlic.com>

^ permalink raw reply

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Johannes Schindelin @ 2009-01-06 19:40 UTC (permalink / raw)
  To: Pierre Habouzit; +Cc: Linus Torvalds, davidel, Francis Galiegue, Git ML
In-Reply-To: <20090106111712.GB30766@artemis.corp>

Hi,

On Tue, 6 Jan 2009, Pierre Habouzit wrote:

> On jeu, jan 01, 2009 at 04:38:09 +0000, Johannes Schindelin wrote:
> > 
> > Nothing fancy, really, just a straight-forward implementation of the
> > heavily under-documented and under-analyzed paience diff algorithm.
> 
> Btw, what is the status of this series ? I see it neither in pu nor in 
> next. And I would gladly see it included in git.

AFAICT people wanted to be reasonably sure that it is worth the effort.

Although I would like to see it in once it is fleshed out -- even if it 
does not meet our usefulness standard -- because people said Git is 
inferior for not providing a patience diff.  If we have --patience, we can 
say "but we have it, it's just not useful, check for yourself".

> On jeu, jan 01, 2009 at 07:45:21 +0000, Linus Torvalds wrote:
> > 
> > So could we have some actual real data on it?
> 
> For example, look at the following (shamelessly stolen from the real
> life example http://glandium.org/blog/?p=120).

Due to the lines being much longer than 80 characters, this example was 
not useful to me.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH 2/3] unpack-trees: fix path search bug in verify_absent
From: Johannes Schindelin @ 2009-01-06 19:35 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: Junio C Hamano, git
In-Reply-To: <20090104200133.GA2202@localhost>

Hi,

On Sun, 4 Jan 2009, Clemens Buchacher wrote:

> On Sun, Jan 04, 2009 at 02:01:14AM -0800, Junio C Hamano wrote:
> >     The function's purpose is ....  Before entering the loop to count 
> >     the number of entries to skip, this check to detect if we do not 
> >     even have to count appears.  When this check triggers, we know we 
> >     do not want to skip anything, and returning constant 0 is much 
> >     clearer than returning a variable cnt that was initialized to 0 
> >     near the beginning of the function; we haven't even started using 
> >     it to count yet.
> > 
> > But the point is, if that is the reason the author thinks it is an 
> > improvement, that probably needs to be stated.
> 
> If you want to check the validity of the patch you have to view it in
> context anyways.

Umm.

You can make reviewing your patch attractive and easy, and you can make it 
unattractive and difficult.

If you explain in the commit message that you replaced "cnt" by "0" 
because it is initialized to 0 at that point anyway, it is a _much bigger_ 
pleasure to review your patch.

Let alone a much bigger pleasure for you, 6 months from now, when somebody 
says "why does this silly function return 0, when it should return cnt?"

BTW exactly for that reason, I'd like to leave it as "cnt".  Because code 
_will_ change, and it's quite possible that cnt will not be 0 at that 
point in the future.

> Compared to understanding the change to the code, it takes much longer 
> to parse and understand the above paragraph _plus_ verify its agreement 
> with the code. I think you will agree that there is a limit to the 
> amount of documentation that's still useful.

Just look at a concrete case: me.  I saw that part of the patch, even 
before coming to the real meat of it.  And that head-scratching already 
removed all the enthusiasm I had to look at unpack-trees.c again, so you 
lost a reviewer.

> What's sad, however, is that we are now discussing style and commenting 
> issues of a line of code, which, as by my analysis of [PATCH 3/3] never 
> actually gets executed in the first place. I would have been much more 
> curious about your comments on that.

Exactly,
Dscho

^ permalink raw reply

* Re: Quick command to count commits
From: Linus Torvalds @ 2009-01-06 19:30 UTC (permalink / raw)
  To: Henk; +Cc: git
In-Reply-To: <1231267896595-2118851.post@n2.nabble.com>



On Tue, 6 Jan 2009, Henk wrote:
>
> For GitExtensions (windows git ui) I need a command to count all commits. I
> now use this command:
> git.cmd rev-list --all --abbrev-commit | wc -l
> 
> This works perfect but its very slow in big repositories. Is there a faster
> way to count the commits?

Nope. Possibly drop the --abbrev-commit part, but it's not going to hurt 
_that_ much, and maybe avoiding piping the data can be a win in some 
cases.

Basically, to get commit counts, you need to traverse the whole history, 
or at least cache it. So the only way to speed things up is

 - make sure your repository is well-packed. That will speed things up by 
   an absolutely huge amount, if they weren't well-packed before. Just a 
   single large pack-file, not lots of small packs, and not lots of loose 
   objects.

 - you can certainly cache it. Just index by the sha1sum of all the heads, 
   and you have a great cache. Just keep a single entry. So _if_ the 
   repository seldom changes, and you do this a lot, you'll at least only 
   pay the price once.

IOW, do something like this:

	#!/bin/sh
	revs=$(git rev-parse --all)
	index=$(echo "$revs" | sha1sum | cut -d' ' -f1)
	cached=$(cat .git/commit_nr_cache)
	cached_index=$(echo "$cached" | cut -d' ' -f1)
	if [ "$index" == "$cached_index" ]; then
		echo "$cached" | cut -d' ' -f2
		exit
	fi
	nr=$(git rev-list $revs | wc -l)
	echo "$index $nr" > .git/commit_nr_cache
	echo $nr

and you now have a stupid single-entry cache.

Totally untested. You'll need to do _some_ work yourself ;)

		Linus

^ permalink raw reply

* Re: [PATCH] fast-export: print usage when no options specified
From: Johannes Schindelin @ 2009-01-06 19:28 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, jidanni, git
In-Reply-To: <1230955152-23646-1-git-send-email-vmiklos@frugalware.org>

Hi,

On Sat, 3 Jan 2009, Miklos Vajna wrote:

> Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
> ---
> 
> I found that behaviour surprising as well when I saw it the first time.
> git rev-list outputs some usage info in that case, git log just logs
> HEAD. Given that unconditionally exporting HEAD with no arguments is
> probably not something most users want, here is a patch to make the
> behaviour like what git rev-list already has.

Maybe this should be part of the commit message?

-- snip --
Some people find it surprising that fast-export does not output a usage 
when called without parameters, as rev-list does.

This assumes that a user usually does not want to export HEAD by default.
-- snap --

However, I have to say that I would find exporting HEAD a rather sensible 
default.

But I am not _that_ strongly opposed to the patch.  Just would like to 
hear some opinions first.

Ciao,
Dscho

^ permalink raw reply

* Re: [JGIT PATCH] Improve the sideband progress scaper to be more accurate
From: Shawn O. Pearce @ 2009-01-06 19:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Robin Rosenberg, git
In-Reply-To: <alpine.LFD.2.00.0901061343000.26118@xanadu.home>

Nicolas Pitre <nico@cam.org> wrote:
> There may indeed be line fragments sent over the sideband channel, as 
> well as the opposite which is multiple lines sent at once in a single 
> packet.  If you look at sideband.c you'll find about all those cases.

Thanks, that's what I thought...

> In general, what you have to do is:
> 
>  - for each packet:
>    - split into multiple chunks on line breaks ('\r' or '\n')
>    - for each chunk:
>      - if last chunk didn't end with a line break, or if current 
>        chunk is empty or only contains a line break, then skip printing 
>        the "remote:" prefix.  Otherwise print it.
>      - print the current chunk up to any line break
>      - if current chunk contains a line break and other characters then
>        print a sequence to clear the remaining of the screen line
>      - print the line break if any

Hmm.  I should note that C Git still screws this up sometimes.  I've
seen 1.6.1 git fetch mess up the output from repo.or.cz's sideband.
I'm sure Pasky isn't running JGit's daemon, its too damn fast. :-)

I don't have a spew of it handy, but late last week I saw it screw
up while doing a clone off repo.or.cz.

Robin, maybe we should apply this patch?  Its along the lines
Nico is describing above, but because our progress code doesn't do
direct display its a little less complex, we don't have to worry
about clearing the current line.  But I think I'm doing the chunk
splitting and filtering he's talking about.

-- 
Shawn.

^ permalink raw reply

* Re: [JGIT PATCH] Improve the sideband progress scaper to be more accurate
From: Nicolas Pitre @ 2009-01-06 19:23 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Shawn O. Pearce, git
In-Reply-To: <200901032312.53148.robin.rosenberg@dewire.com>

On Sat, 3 Jan 2009, Robin Rosenberg wrote:

> onsdag 31 december 2008 20:04:01 skrev Shawn O. Pearce:
> > By matching only whole lines we should be able to improve the
> > progress scaper so we avoid ugly output like we had been seeing:
> > 
> >   EGIT.contrib/jgit clone git://repo.or.cz/libgit2.git LIBGIT2
> >   Initialized empty Git repository in /home/me/SW/LIBGIT2/.git
> >   Counting objects:       547
> >   Compressing objects:    100% (192/192)
> >   ts:                     100% (192/192)
> >   Compressing objects:    100% (192/192)
> >   ng objects:             100% (192/192)
> > 
> > Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
> > ---
> >  Robin Rosenberg <robin.rosenberg@dewire.com> wrote:
> >  > Would it be hard to get the progress look better?
> > 
> >  Maybe this does the trick.  Its hard to reproduce so its hard to
> >  come up with the condition that was giving us the problem before.
> >  I suspect its because we were getting line fragments on the sideband
> >  channel, but I'm not sure that was really the case.
> 
> Nasty. I couldn't reproduce it myself. I'll hold onto this one for a while and
> see if I'll get the opportunity to test it live with this problem.

There may indeed be line fragments sent over the sideband channel, as 
well as the opposite which is multiple lines sent at once in a single 
packet.  If you look at sideband.c you'll find about all those cases.

In general, what you have to do is:

 - for each packet:
   - split into multiple chunks on line breaks ('\r' or '\n')
   - for each chunk:
     - if last chunk didn't end with a line break, or if current 
       chunk is empty or only contains a line break, then skip printing 
       the "remote:" prefix.  Otherwise print it.
     - print the current chunk up to any line break
     - if current chunk contains a line break and other characters then
       print a sequence to clear the remaining of the screen line
     - print the line break if any

That should cover all cases.  Note that the case where a previous chunk 
didn't end with a line break is always seen when moving from one packet 
to another, so that state cannot be local to the inner loop.

Nicolas

^ permalink raw reply

* Re: Quick command to count commits
From: Shawn O. Pearce @ 2009-01-06 18:55 UTC (permalink / raw)
  To: Henk; +Cc: git
In-Reply-To: <1231267896595-2118851.post@n2.nabble.com>

Henk <henk_westhuis@hotmail.com> wrote:
> For GitExtensions (windows git ui) I need a command to count all commits. I
> now use this command:
> git.cmd rev-list --all --abbrev-commit | wc -l
> 
> This works perfect but its very slow in big repositories. Is there a faster
> way to count the commits?

No.  But drop the --abbrev-commit flag, abbreviating the object ids
would take longer than just dumping the full 40 bytes to stdout.
It requires a scan of parts of the object database for every commit
to determine what is a unique abbreviation.

This isn't a very common operation.  Its expensive to do, as you
can see.  What's the reason for supplying it, and needing it so
quickly?  Nobody really cares how many commits are in their history,
unless they are trying to produce stats to describe how big their
repository is.  In which case the size of the .git/objects/pack
*.pack and *.idx files is much more relevant, especially the .idx
files as it gives a really quick estimate on the number of total
objects in the repository.

-- 
Shawn.

^ permalink raw reply

* [PATCH] diff: accept -- when using --no-index
From: Thomas Rast @ 2009-01-06 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Accept -- as an "end of options" marker even when using --no-index.
Previously, the -- triggered a "normal" index/tree diff and
subsequently failed because of the unrecognized (in that mode)
--no-index.

Note that the second loop can treat '--' as a normal option, because
the preceding checks ensure it is the third-to-last argument.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---
 diff-no-index.c                                    |    8 ++++++--
 t/t4013-diff-various.sh                            |    1 +
 .../diff.diff_--no-index_--name-status_--_dir2_dir |    3 +++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index b60d345..d1e8e4b 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -173,8 +173,10 @@ void diff_no_index(struct rev_info *revs,
 
 	/* Were we asked to do --no-index explicitly? */
 	for (i = 1; i < argc; i++) {
-		if (!strcmp(argv[i], "--"))
-			return;
+		if (!strcmp(argv[i], "--")) {
+			i++;
+			break;
+		}
 		if (!strcmp(argv[i], "--no-index"))
 			no_index = 1;
 		if (argv[i][0] != '-')
@@ -214,6 +216,8 @@ void diff_no_index(struct rev_info *revs,
 			i++;
 		else if (!strcmp(argv[1], "-q"))
 			options |= DIFF_SILENT_ON_REMOVED;
+		else if (!strcmp(argv[i], "--"))
+			i++;
 		else {
 			j = diff_opt_parse(&revs->diffopt, argv + i, argc - i);
 			if (!j)
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index aeb5405..aba5320 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -261,6 +261,7 @@ diff --patch-with-stat -r initial..side
 diff --patch-with-raw -r initial..side
 diff --name-status dir2 dir
 diff --no-index --name-status dir2 dir
+diff --no-index --name-status -- dir2 dir
 diff master master^ side
 EOF
 
diff --git a/t/t4013/diff.diff_--no-index_--name-status_--_dir2_dir b/t/t4013/diff.diff_--no-index_--name-status_--_dir2_dir
new file mode 100644
index 0000000..6756f8d
--- /dev/null
+++ b/t/t4013/diff.diff_--no-index_--name-status_--_dir2_dir
@@ -0,0 +1,3 @@
+$ git diff --no-index --name-status -- dir2 dir
+A	dir/sub
+$
-- 
tg: (e9b8523..) t/diff-no-index-invocation (depends on: origin/master)

^ permalink raw reply related

* Quick command to count commits
From: Henk @ 2009-01-06 18:51 UTC (permalink / raw)
  To: git

Hi,

For GitExtensions (windows git ui) I need a command to count all commits. I
now use this command:
git.cmd rev-list --all --abbrev-commit | wc -l

This works perfect but its very slow in big repositories. Is there a faster
way to count the commits?

Henk
-- 
View this message in context: http://n2.nabble.com/Quick-command-to-count-commits-tp2118851p2118851.html
Sent from the git mailing list archive at Nabble.com.

^ permalink raw reply

* Re: JGit vs. Git
From: Shawn O. Pearce @ 2009-01-06 18:45 UTC (permalink / raw)
  To: Vagmi Mudumbai; +Cc: git
In-Reply-To: <a55cfe9d0901052250k2be203dfvb0b437a523f2cecc@mail.gmail.com>

Vagmi Mudumbai <vagmi.mudumbai@gmail.com> wrote:
> 1) Is JGit a drop in replacement of Git? In sense, if I were to pack
> in an SCM with my app, Can I pack jgit instead of C Git?

As Robin said, its not a full drop-in replacement.  That said, its
getting there.  We're now only really missing patch application,
diff generation, merge support, and submodule support.  Most of
the JGit core can handle submodules by skipping over them during
object traversal, but the part that talks to the filesystem to do
a checkout doesn't recognize them.

JGit is mostly a library, not a command line replacement.  But it
does have commands like "jgit init", "jgit clone", "jgit fetch",
"jgit push", even some server side tools like "jgit daemon",
"jgit receive-pack" and "jgit upload-pack".

To be honest I don't think anyone actually uses the command line
pgm stuff to work with Git.  Anyone using JGit is actually running
it embedded in some type of application like an IDE plugin or a
server like Gerrit.

> 2) I noticed that there are no 'add' and 'commit' commands (at least
> from the source) in the org.spearce.git.pgm project. I am looking at
> the repo.or.cz/egit.git repo. I had a brief look at the
> lib/GitIndex.java and lib/Repository.java. GitIndex has the add
> methods to add files/entries to the index. I am still stumped on how
> commits can be done with JGit. Any help is hugely appreciated.

As Robin said, look at the unit tests.  Basically you want to use
the Commit class to populate out the data fields, then pass it off to
an ObjectWriter instance to store it into the tree.  Finally you'll
need to use a RefUpdate (obtained from Repository's updateRef method)
to store the new ObjectId of that Commit into a ref like HEAD.

Unfortunately we have two commit representations in JGit.  If you
are trying to read data from a Repository the RevCommit (obtained
from a RevWalk) is orders of magnitude faster than the Commit class.

> I am working on Windows with msysGit behind a HTTP Proxy. (Life cant
> get worse, I guess.) . I planned on using grit via JRuby but grit uses
> fork which is not available on funny platforms like windows. And JRuby
> guys do not have any plan on supporting fork even on platforms on
> which for is supported. If JGit is a pure Java based implementation of
> Git with more or less the same functionality, then my work becomes a
> lot easier.

I plan on writing patch apply sometime this quarter I think.  I have
most of what I need to rip a patch apart and inspect it prior to
application, now I just need to line it up onto the base object
and issue the output version.

Diff might also come in the next few months.  Dscho has a nice
prototype working, but there's still some work to be done on it.

-- 
Shawn.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox