* [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls
@ 2009-02-01 20:23 Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 1/7] symlinks.c: small cleanup and optimisation Kjetil Barvik
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Changes since v1
--- patch 1/7 ---
Kept the "match_len >= cache.len" trick only inside the
longest_match_lstat_cache() function. Also removed the arguments to
reset_lstat_cache()
--- patch 2/7 ---
Applied all comments from Junio
--- patch 3/7 ---
Kept long format strings in printf() calls, even if source lines get
longer than 80 chars. The patch touch less source lines because of
this.
--- patch 4/7 ---
Added sanity check to the if-test under which fstat() is called.
--- patch 5/7 ---
Updated commit log message
--- patch 7/7 ---
New patch which reduces the number of rmdir() calls significantly.
When tested with 'git chekcout -q my-v2.6.27' (checkout from Linux
v2.6.25 to v2.6.27), the numbers look like this before this patch:
rmdir 4532 3.314% OK: 174 NOT: 4358 0.263268 sec 58 usec/call
strings 4532 tot 354 uniq 12.802 /uniq 0.263268 sec 58 usec/call
errors 4358 tot 347 uniq 12.559 /uniq 0.249116 sec 57 usec/call
233 5.141% OK: 1 NOT: 232 "include/asm-powerpc"
187 4.126% OK: 1 NOT: 186 "include/asm-sh"
172 3.795% OK: 0 NOT: 172 "include/asm-arm"
169 3.729% OK: 1 NOT: 168 "include/asm-sparc64"
161 3.553% OK: 1 NOT: 160 "include/asm-sparc"
144 3.177% OK: 1 NOT: 143 "include/asm-ia64"
143 3.155% OK: 1 NOT: 142 "include/asm-m68knommu"
133 2.935% OK: 1 NOT: 132 "include/asm-alpha"
126 2.780% OK: 1 NOT: 125 "include/asm-s390"
122 2.692% OK: 1 NOT: 121 "include/asm-v850"
Notice the nice change in the top 10 list after this patch:
rmdir 331 0.255% OK: 174 NOT: 157 0.033435 sec 101 usec/call
strings 331 tot 331 uniq 1.000 /uniq 0.033435 sec 101 usec/call
errors 157 tot 157 uniq 1.000 /uniq 0.016300 sec 104 usec/call
1 0.302% OK: 0 NOT: 1 "Documentation"
1 0.302% OK: 0 NOT: 1 "Documentation/filesystems/configfs"
1 0.302% OK: 0 NOT: 1 "Documentation/firmware_class"
1 0.302% OK: 1 NOT: 0 "Documentation/hrtimer"
1 0.302% OK: 1 NOT: 0 "Documentation/hrtimers"
1 0.302% OK: 0 NOT: 1 "Documentation/i2c/busses"
1 0.302% OK: 1 NOT: 0 "Documentation/i386"
1 0.302% OK: 0 NOT: 1 "Documentation/networking"
1 0.302% OK: 0 NOT: 1 "Documentation/power"
1 0.302% OK: 0 NOT: 1 "Documentation/powerpc"
Kjetil Barvik (7):
symlinks.c: small cleanup and optimisation
remove some memcpy() and strchr() calls inside create_directories()
cleanup of write_entry() in entry.c
write_entry(): use fstat() instead of lstat() when file is open
combine-diff.c: remove a call to fstat() inside show_patch_diff()
lstat_cache(): print a warning if doing ping-pong between cache types
unpack-trees.c: introduce schedule_dir_for_removal()
combine-diff.c | 4 +--
entry.c | 106 +++++++++++++++++++++++++++-----------------------------
symlinks.c | 67 +++++++++++++++++++++++++----------
unpack-trees.c | 97 ++++++++++++++++++++++++++++++++++++++-------------
4 files changed, 172 insertions(+), 102 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 1/7] symlinks.c: small cleanup and optimisation
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 2/7] remove some memcpy() and strchr() calls inside create_directories() Kjetil Barvik
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Simplify the if-else test in longest_match_lstat_cache() such that we
only have one simple if test. Instead of testing for 'i == cache.len'
or 'i == len', we transform this to a common test for 'i == max_len'.
And to further optimise we use 'i >= max_len' instead of 'i ==
max_len', the reason is that it is now the exact opposite of one part
inside the while-loop termination expression 'i < max_len && name[i]
== cache.path[i]', and then the compiler can probably reuse a test
instruction from it.
We also throw away the arguments to reset_lstat_cache(), such that all
the safeguard logic inside lstat_cache() is handled at one place.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
symlinks.c | 44 ++++++++++++++++++++++++--------------------
1 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/symlinks.c b/symlinks.c
index f262b7c..ae57e56 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -25,27 +25,30 @@ static inline int longest_match_lstat_cache(int len, const char *name,
}
i++;
}
- /* Is the cached path string a substring of 'name'? */
- if (i == cache.len && cache.len < len && name[cache.len] == '/') {
- match_len_prev = match_len;
- match_len = cache.len;
- /* Is 'name' a substring of the cached path string? */
- } else if ((i == len && len < cache.len && cache.path[len] == '/') ||
- (i == len && len == cache.len)) {
+ /*
+ * Is the cached path string a substring of 'name', is 'name'
+ * a substring of the cached path string, or is 'name' and the
+ * cached path string the exact same string?
+ */
+ if (i >= max_len && ((len > cache.len && name[cache.len] == '/') ||
+ (len < cache.len && cache.path[len] == '/') ||
+ (len == cache.len))) {
match_len_prev = match_len;
- match_len = len;
+ match_len = i;
}
*previous_slash = match_len_prev;
return match_len;
}
-static inline void reset_lstat_cache(int track_flags, int prefix_len_stat_func)
+static inline void reset_lstat_cache(void)
{
cache.path[0] = '\0';
cache.len = 0;
cache.flags = 0;
- cache.track_flags = track_flags;
- cache.prefix_len_stat_func = prefix_len_stat_func;
+ /*
+ * The track_flags and prefix_len_stat_func members is only
+ * set by the safeguard rule inside lstat_cache()
+ */
}
#define FL_DIR (1 << 0)
@@ -77,11 +80,13 @@ static int lstat_cache(int len, const char *name,
if (cache.track_flags != track_flags ||
cache.prefix_len_stat_func != prefix_len_stat_func) {
/*
- * As a safeguard we clear the cache if the values of
- * track_flags and/or prefix_len_stat_func does not
- * match with the last supplied values.
+ * As a safeguard rule we clear the cache if the
+ * values of track_flags and/or prefix_len_stat_func
+ * does not match with the last supplied values.
*/
- reset_lstat_cache(track_flags, prefix_len_stat_func);
+ reset_lstat_cache();
+ cache.track_flags = track_flags;
+ cache.prefix_len_stat_func = prefix_len_stat_func;
match_len = last_slash = 0;
} else {
/*
@@ -153,7 +158,7 @@ static int lstat_cache(int len, const char *name,
cache.path[last_slash] = '\0';
cache.len = last_slash;
cache.flags = save_flags;
- } else if (track_flags & FL_DIR &&
+ } else if ((track_flags & FL_DIR) &&
last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
/*
* We have a separate test for the directory case,
@@ -170,7 +175,7 @@ static int lstat_cache(int len, const char *name,
cache.len = last_slash_dir;
cache.flags = FL_DIR;
} else {
- reset_lstat_cache(track_flags, prefix_len_stat_func);
+ reset_lstat_cache();
}
return ret_flags;
}
@@ -190,8 +195,7 @@ void invalidate_lstat_cache(int len, const char *name)
cache.len = previous_slash;
cache.flags = FL_DIR;
} else
- reset_lstat_cache(cache.track_flags,
- cache.prefix_len_stat_func);
+ reset_lstat_cache();
}
}
@@ -200,7 +204,7 @@ void invalidate_lstat_cache(int len, const char *name)
*/
void clear_lstat_cache(void)
{
- reset_lstat_cache(0, 0);
+ reset_lstat_cache();
}
#define USE_ONLY_LSTAT 0
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 2/7] remove some memcpy() and strchr() calls inside create_directories()
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 1/7] symlinks.c: small cleanup and optimisation Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 3/7] cleanup of write_entry() in entry.c Kjetil Barvik
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Remove the call to memcpy() and strchr() for each path component
tested, and instead add each path component as we go forward inside
the while-loop.
Impact: small optimisation
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
* Kjetil Barvik <barvik@broadpark.no> writes:
| OK, maybe I instead should have tried to merge the function
| create_directories() with the safe_create_leading_directories() and
| *_const() functions? What do pepople think?
* Junio C Hamano <gitster@pobox.com> writes
| Strictly speaking, the safe_create_leading_* functions are meant to
| work on paths inside $GIT_DIR and are not meant for paths inside the
| work tree, which is this function is about. Their semantics are
| different with respect to adjust_shared_perm().
Ok, I think I keep the create_directories() function as is. Also
because I think a shared "create_directories()" function could maybe
make the calls to the lstat_cache() function less likely to be
sorted alphabetically, and therefore be less cache effective.
| I see existing (mis)uses of the safe_create_leading_* in
| builtin-apply.c, builtin-clone.c (the one that creates the
| work_tree, the other one is Ok), merge-recursive.c (both call sites)
| that are used to touch the work tree, but all places that create
| regular files in the work tree do not run adjust_shared_perm() but
| simply honor the user's umask.
Inside update_file_flags() in merge-recursive.c I see a call to
make_room_for_path() (if update_wd is non-zero) which calls
safe_create_leading_directories_const().
If we get lots of calls to make_room_for_path() in some cases, maybe
we will also save some stat() calls if we fix the call to
safe_create_*() at this place?
Remember that safe_create_*() calls stat() on each path component to
make sure that the component exists, same as create_directories()
used to do before.
entry.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/entry.c b/entry.c
index 05aa58d..1879b83 100644
--- a/entry.c
+++ b/entry.c
@@ -2,15 +2,19 @@
#include "blob.h"
#include "dir.h"
-static void create_directories(const char *path, const struct checkout *state)
+static void create_directories(const char *path, int path_len,
+ const struct checkout *state)
{
- int len = strlen(path);
- char *buf = xmalloc(len + 1);
- const char *slash = path;
-
- while ((slash = strchr(slash+1, '/')) != NULL) {
- len = slash - path;
- memcpy(buf, path, len);
+ char *buf = xmalloc(path_len + 1);
+ int len = 0;
+
+ while (len < path_len) {
+ do {
+ buf[len] = path[len];
+ len++;
+ } while (len < path_len && path[len] != '/');
+ if (len >= path_len)
+ break;
buf[len] = 0;
/*
@@ -190,6 +194,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
memcpy(path, state->base_dir, len);
strcpy(path + len, ce->name);
+ len += ce_namelen(ce);
if (!lstat(path, &st)) {
unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
@@ -218,6 +223,6 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
return error("unable to unlink old '%s' (%s)", path, strerror(errno));
} else if (state->not_new)
return 0;
- create_directories(path, state);
+ create_directories(path, len, state);
return write_entry(ce, path, state, 0);
}
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 3/7] cleanup of write_entry() in entry.c
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 1/7] symlinks.c: small cleanup and optimisation Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 2/7] remove some memcpy() and strchr() calls inside create_directories() Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 4/7] write_entry(): use fstat() instead of lstat() when file is open Kjetil Barvik
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
The switch-cases for S_IFREG and S_IFLNK was so similar that it will
be better to do some cleanup and use the common parts of it.
And the entry.c file should now be clean for 'gcc -Wextra' warnings.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
entry.c | 75 +++++++++++++++++++++++++-------------------------------------
1 files changed, 30 insertions(+), 45 deletions(-)
diff --git a/entry.c b/entry.c
index 1879b83..0f1f3f0 100644
--- a/entry.c
+++ b/entry.c
@@ -78,7 +78,7 @@ static int create_file(const char *path, unsigned int mode)
return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
-static void *read_blob_entry(struct cache_entry *ce, const char *path, unsigned long *size)
+static void *read_blob_entry(struct cache_entry *ce, unsigned long *size)
{
enum object_type type;
void *new = read_sha1_file(ce->sha1, &type, size);
@@ -93,36 +93,51 @@ static void *read_blob_entry(struct cache_entry *ce, const char *path, unsigned
static int write_entry(struct cache_entry *ce, char *path, const struct checkout *state, int to_tempfile)
{
- int fd;
- long wrote;
-
- switch (ce->ce_mode & S_IFMT) {
- char *new;
- struct strbuf buf;
- unsigned long size;
-
+ unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
+ int fd, ret;
+ char *new;
+ struct strbuf buf = STRBUF_INIT;
+ unsigned long size;
+ size_t wrote, newsize = 0;
+
+ switch (ce_mode_s_ifmt) {
case S_IFREG:
- new = read_blob_entry(ce, path, &size);
+ case S_IFLNK:
+ new = read_blob_entry(ce, &size);
if (!new)
return error("git checkout-index: unable to read sha1 file of %s (%s)",
path, sha1_to_hex(ce->sha1));
+ if (ce_mode_s_ifmt == S_IFLNK && has_symlinks && !to_tempfile) {
+ ret = symlink(new, path);
+ free(new);
+ if (ret)
+ return error("git checkout-index: unable to create symlink %s (%s)",
+ path, strerror(errno));
+ break;
+ }
+
/*
* Convert from git internal format to working tree format
*/
- strbuf_init(&buf, 0);
- if (convert_to_working_tree(ce->name, new, size, &buf)) {
- size_t newsize = 0;
+ if (ce_mode_s_ifmt == S_IFREG &&
+ convert_to_working_tree(ce->name, new, size, &buf)) {
free(new);
new = strbuf_detach(&buf, &newsize);
size = newsize;
}
if (to_tempfile) {
- strcpy(path, ".merge_file_XXXXXX");
+ if (ce_mode_s_ifmt == S_IFREG)
+ strcpy(path, ".merge_file_XXXXXX");
+ else
+ strcpy(path, ".merge_link_XXXXXX");
fd = mkstemp(path);
- } else
+ } else if (ce_mode_s_ifmt == S_IFREG) {
fd = create_file(path, ce->ce_mode);
+ } else {
+ fd = create_file(path, 0666);
+ }
if (fd < 0) {
free(new);
return error("git checkout-index: unable to create file %s (%s)",
@@ -135,36 +150,6 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
if (wrote != size)
return error("git checkout-index: unable to write file %s", path);
break;
- case S_IFLNK:
- new = read_blob_entry(ce, path, &size);
- if (!new)
- return error("git checkout-index: unable to read sha1 file of %s (%s)",
- path, sha1_to_hex(ce->sha1));
- if (to_tempfile || !has_symlinks) {
- if (to_tempfile) {
- strcpy(path, ".merge_link_XXXXXX");
- fd = mkstemp(path);
- } else
- fd = create_file(path, 0666);
- if (fd < 0) {
- free(new);
- return error("git checkout-index: unable to create "
- "file %s (%s)", path, strerror(errno));
- }
- wrote = write_in_full(fd, new, size);
- close(fd);
- free(new);
- if (wrote != size)
- return error("git checkout-index: unable to write file %s",
- path);
- } else {
- wrote = symlink(new, path);
- free(new);
- if (wrote)
- return error("git checkout-index: unable to create "
- "symlink %s (%s)", path, strerror(errno));
- }
- break;
case S_IFGITLINK:
if (to_tempfile)
return error("git checkout-index: cannot create temporary subproject %s", path);
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 4/7] write_entry(): use fstat() instead of lstat() when file is open
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
` (2 preceding siblings ...)
2009-02-01 20:23 ` [PATCH/RFC v2 3/7] cleanup of write_entry() in entry.c Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 5/7] combine-diff.c: remove a call to fstat() inside show_patch_diff() Kjetil Barvik
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Currently inside write_entry() we do an lstat(path, &st) call on a
file which have just been opened inside the exact same function. It
should be better to call fstat(fd, &st) on the file while it is open,
and it should be at least as fast as the lstat() method.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
entry.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/entry.c b/entry.c
index 0f1f3f0..35054a5 100644
--- a/entry.c
+++ b/entry.c
@@ -94,11 +94,12 @@ static void *read_blob_entry(struct cache_entry *ce, unsigned long *size)
static int write_entry(struct cache_entry *ce, char *path, const struct checkout *state, int to_tempfile)
{
unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT;
- int fd, ret;
+ int fd, ret, fstat_done = 0;
char *new;
struct strbuf buf = STRBUF_INIT;
unsigned long size;
size_t wrote, newsize = 0;
+ struct stat st;
switch (ce_mode_s_ifmt) {
case S_IFREG:
@@ -145,6 +146,11 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
}
wrote = write_in_full(fd, new, size);
+ /* use fstat() only when path == ce->name */
+ if (state->refresh_cache && !to_tempfile && !state->base_dir_len) {
+ fstat(fd, &st);
+ fstat_done = 1;
+ }
close(fd);
free(new);
if (wrote != size)
@@ -161,8 +167,8 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
}
if (state->refresh_cache) {
- struct stat st;
- lstat(ce->name, &st);
+ if (!fstat_done)
+ lstat(ce->name, &st);
fill_stat_cache_info(ce, &st);
}
return 0;
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 5/7] combine-diff.c: remove a call to fstat() inside show_patch_diff()
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
` (3 preceding siblings ...)
2009-02-01 20:23 ` [PATCH/RFC v2 4/7] write_entry(): use fstat() instead of lstat() when file is open Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 6/7] lstat_cache(): print a warning if doing ping-pong between cache types Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 7/7] unpack-trees.c: introduce schedule_dir_for_removal() Kjetil Barvik
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Currently inside show_patch_diff() we have an fstat() call after an
ok lstat() call. Since before the call to fstat() we have already
tested for the link case with S_ISLNK(), the fstat() can be removed.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
combine-diff.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/combine-diff.c b/combine-diff.c
index bccc018..4300319 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -713,9 +713,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
result_size = buf.len;
result = strbuf_detach(&buf, NULL);
elem->mode = canon_mode(st.st_mode);
- }
- else if (0 <= (fd = open(elem->path, O_RDONLY)) &&
- !fstat(fd, &st)) {
+ } else if (0 <= (fd = open(elem->path, O_RDONLY))) {
size_t len = xsize_t(st.st_size);
ssize_t done;
int is_file, i;
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 6/7] lstat_cache(): print a warning if doing ping-pong between cache types
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
` (4 preceding siblings ...)
2009-02-01 20:23 ` [PATCH/RFC v2 5/7] combine-diff.c: remove a call to fstat() inside show_patch_diff() Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 7/7] unpack-trees.c: introduce schedule_dir_for_removal() Kjetil Barvik
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
This is a debug patch which is only to be used while the lstat_cache()
is in the test stage, and should be removed/reverted before the final
relase.
I think it should be useful to catch these warnings, as I it could be
an indication of that the cache would not be very effective if it is
doing ping-pong by switching between different cache types too many
times.
Also, if someone is experimenting with the lstat_cache(), this patch
will maybe be useful while debugging.
If someone is able to trigger the warning, then send a mail to the GIT
mailing list, containing the first 15 lines of the warning, and a
short description of the GIT commands to trigger the warnings.
I hope someone is willing to use this patch for a while, to be able to
catch possible ping-pong's.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
symlinks.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/symlinks.c b/symlinks.c
index ae57e56..fe4b188 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -51,6 +51,11 @@ static inline void reset_lstat_cache(void)
*/
}
+#define SWITCHES_BEFORE_WARNING 10
+static unsigned int cache_switches, number_of_warnings;
+static unsigned int current_cache_func, last_cache_func;
+static unsigned int total_calls;
+
#define FL_DIR (1 << 0)
#define FL_NOENT (1 << 1)
#define FL_SYMLINK (1 << 2)
@@ -77,6 +82,7 @@ static int lstat_cache(int len, const char *name,
int match_flags, ret_flags, save_flags, max_len, ret;
struct stat st;
+ total_calls++;
if (cache.track_flags != track_flags ||
cache.prefix_len_stat_func != prefix_len_stat_func) {
/*
@@ -88,6 +94,17 @@ static int lstat_cache(int len, const char *name,
cache.track_flags = track_flags;
cache.prefix_len_stat_func = prefix_len_stat_func;
match_len = last_slash = 0;
+ cache_switches++;
+ if (cache_switches > SWITCHES_BEFORE_WARNING) {
+ if (number_of_warnings < 10 || number_of_warnings % 1000 == 0)
+ printf("warning from %s:%d cache_switches:%u > %u "\
+ "(current:%u last:%u total:%u)\n",
+ __FILE__, __LINE__,
+ cache_switches, SWITCHES_BEFORE_WARNING,
+ current_cache_func, last_cache_func,
+ total_calls);
+ number_of_warnings++;
+ }
} else {
/*
* Check to see if we have a match from the cache for
@@ -214,6 +231,8 @@ void clear_lstat_cache(void)
*/
int has_symlink_leading_path(int len, const char *name)
{
+ last_cache_func = current_cache_func;
+ current_cache_func = 1;
return lstat_cache(len, name,
FL_SYMLINK|FL_DIR, USE_ONLY_LSTAT) &
FL_SYMLINK;
@@ -225,6 +244,8 @@ int has_symlink_leading_path(int len, const char *name)
*/
int has_symlink_or_noent_leading_path(int len, const char *name)
{
+ last_cache_func = current_cache_func;
+ current_cache_func = 2;
return lstat_cache(len, name,
FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT) &
(FL_SYMLINK|FL_NOENT);
@@ -239,6 +260,8 @@ int has_symlink_or_noent_leading_path(int len, const char *name)
*/
int has_dirs_only_path(int len, const char *name, int prefix_len)
{
+ last_cache_func = current_cache_func;
+ current_cache_func = 3;
return lstat_cache(len, name,
FL_DIR|FL_FULLPATH, prefix_len) &
FL_DIR;
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH/RFC v2 7/7] unpack-trees.c: introduce schedule_dir_for_removal()
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
` (5 preceding siblings ...)
2009-02-01 20:23 ` [PATCH/RFC v2 6/7] lstat_cache(): print a warning if doing ping-pong between cache types Kjetil Barvik
@ 2009-02-01 20:23 ` Kjetil Barvik
6 siblings, 0 replies; 8+ messages in thread
From: Kjetil Barvik @ 2009-02-01 20:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Currently inside unlink_entry() if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time. So if one directory containing 200 files is moved to an
other location we get 199 failed calls to rmdir() and 1 successful
call.
To fix this and to avoid some unnecessary calls to rmdir(), we only
schedule each directory for removal and we wait much longer before we
do the actually call to rmdir().
Since the unlink_entry() function is called with alphabetically sorted
names, this new function end up being very effective to avoid
unnecessary calls to rmdir(). In some cases over 95% of all calls to
rmdir() is removed with this patch.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
I noticed that I had to copy-and-paste almost all lines from the
longest_match_lstat_cache() in symlinks.c into this new function, so
maybe I should do the following:
Move the 3 new functions below inside the same file (dir.c?) as the
lstat_cache() functions, rename longest_match_lstat_cache() to
longest_dir_match(), do some cleanup/fix'es, and let it be called
from the 3 functions which currently need it.
Good idea?
unpack-trees.c | 97 ++++++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 73 insertions(+), 24 deletions(-)
diff --git a/unpack-trees.c b/unpack-trees.c
index e547282..5c12eca 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -52,36 +52,84 @@ static void add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
add_index_entry(&o->result, new, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|ADD_CACHE_SKIP_DFCHECK);
}
-/* Unlink the last component and attempt to remove leading
- * directories, in case this unlink is the removal of the
- * last entry in the directory -- empty directories are removed.
+static char path_buf[PATH_MAX];
+static unsigned int path_len;
+
+static void do_remove_scheduled_dirs(unsigned int new_len)
+{
+ while (path_len > new_len) {
+ path_buf[path_len] = '\0';
+ if (rmdir(path_buf))
+ break;
+ do {
+ path_len--;
+ } while (path_len > new_len && path_buf[path_len] != '/');
+ }
+ path_len = new_len;
+ return;
+}
+
+static void schedule_dir_for_removal(const char *name, unsigned int len)
+{
+ unsigned int i = 0, match_len = 0, max_len, last_slash;
+
+ /* Find longest dir-match between 'name' and 'path' */
+ max_len = len < path_len ? len : path_len;
+ while (i < max_len && name[i] == path_buf[i]) {
+ if (name[i] == '/')
+ match_len = i;
+ i++;
+ }
+ if (i >= max_len && ((len > path_len && name[path_len] == '/') ||
+ (len < path_len && path_buf[len] == '/') ||
+ (len == path_len)))
+ match_len = i;
+
+ /* Find the last slash inside 'name' */
+ last_slash = match_len;
+ while (i < len) {
+ if (name[i] == '/')
+ last_slash = i;
+ i++;
+ }
+
+ /*
+ * If we are about to go down the directory tree, we check if
+ * we must first go upwards the tree, such that we then can
+ * remove possible empty directories as we go upwards.
+ */
+ if (match_len < last_slash && match_len < path_len)
+ do_remove_scheduled_dirs(match_len);
+ /*
+ * If we go deeper down the directory tree, we only need to
+ * save the new path components as we go down.
+ */
+ if (match_len < last_slash) {
+ memmove(&path_buf[match_len], &name[match_len],
+ last_slash - match_len);
+ path_len = last_slash;
+ }
+ return;
+}
+
+#define ALL_DIRS 0
+static void remove_scheduled_dirs(void)
+{
+ do_remove_scheduled_dirs(ALL_DIRS);
+ return;
+}
+
+/*
+ * Unlink the last component and schedule the leading directories for
+ * removal, such that empty directories get removed.
*/
static void unlink_entry(struct cache_entry *ce)
{
- char *cp, *prev;
- char *name = ce->name;
-
if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
return;
- if (unlink(name))
+ if (unlink(ce->name))
return;
- prev = NULL;
- while (1) {
- int status;
- cp = strrchr(name, '/');
- if (prev)
- *prev = '/';
- if (!cp)
- break;
-
- *cp = 0;
- status = rmdir(name);
- if (status) {
- *cp = '/';
- break;
- }
- prev = cp;
- }
+ schedule_dir_for_removal(ce->name, ce_namelen(ce));
}
static struct checkout state;
@@ -117,6 +165,7 @@ static int check_updates(struct unpack_trees_options *o)
continue;
}
}
+ remove_scheduled_dirs();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
--
1.6.1.349.g99fa5
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-02-01 20:25 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-01 20:23 [PATCH/RFC v2 0/7] git checkout: more cleanups, optimisation, less lstat() calls Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 1/7] symlinks.c: small cleanup and optimisation Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 2/7] remove some memcpy() and strchr() calls inside create_directories() Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 3/7] cleanup of write_entry() in entry.c Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 4/7] write_entry(): use fstat() instead of lstat() when file is open Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 5/7] combine-diff.c: remove a call to fstat() inside show_patch_diff() Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 6/7] lstat_cache(): print a warning if doing ping-pong between cache types Kjetil Barvik
2009-02-01 20:23 ` [PATCH/RFC v2 7/7] unpack-trees.c: introduce schedule_dir_for_removal() Kjetil Barvik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).