* [PATCH/RFC v5 0/1] git checkout: optimise away lots of lstat() calls
@ 2009-01-09 19:05 Kjetil Barvik
2009-01-09 19:05 ` [PATCH/RFC v5 1/1] more cache effective symlink/directory detection Kjetil Barvik
From: Kjetil Barvik @ 2009-01-09 19:05 UTC
To: git; +Cc: Pete Harlan, Linus Torvalds, Junio C Hamano, Kjetil Barvik
Changes since version 4:
- After looking at the changes in create_directories() in entry.c
for some more time, I got the idea to simply use lstat_cache()
instead of what was almost a separate cache implementation of its own.
- To implement that, I had to be able to tell the cache to test the
full length of the input 'name', and to use the stat() function
instead of the lstat() function for path components within a given
prefix length.
- Fixed a missed cache optimisation: when a symlink or a non-existing
directory cannot be cached (because that type is not tracked), we
still want to cache the real directory part in front of it. Fixed a
second missed cache optimisation where the 'name' argument is a
(complete) leading part of the cached path on a path component basis.
- The cache can now also return LSTAT_REG for a regular file, but
please note that it can only cache the directory part of that file.
- Since the changes to the create_directories() function are now
small and simple (I just had to change an if-test, remove 2 unused
variables and update the comments), patch 2/2 is gone and has been
folded into patch 1/1.
- More updates to the comments in the source code.
- Thanks to Pete Harlan for spelling fixes to the commit message!
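To make the path-component matching concrete, including the second missed optimisation above (where 'name' is a complete leading part of the cached path), here is a minimal standalone sketch. It is a parameterised copy of the matching rules in greatest_match_lstat_cache() from the patch, taking the cached path as arguments instead of reading static variables; the name component_match_len() is hypothetical.

```c
/*
 * Sketch: how many leading bytes of 'name' (length 'len') are covered by
 * complete path components shared with the cached path 'cache'
 * (length 'cache_len'). Same rules as greatest_match_lstat_cache() in
 * the patch, but with the cache passed in as arguments.
 */
static int component_match_len(const char *name, int len,
			       const char *cache, int cache_len)
{
	int max_len = len < cache_len ? len : cache_len;
	int match_len = 0, i = 0;

	/* Walk the common prefix, remembering the last '/' inside it. */
	while (i < max_len && name[i] == cache[i]) {
		if (name[i] == '/')
			match_len = i;
		i++;
	}
	if (i == cache_len && len > cache_len && name[cache_len] == '/')
		match_len = cache_len;  /* the whole cached path leads 'name' */
	else if (i == len && len < cache_len && cache[len] == '/')
		match_len = len;        /* 'name' is a leading part of the cache */
	else if (i == len && i == cache_len)
		match_len = len;        /* identical paths */
	return match_len;
}
```

For example, with "A/B/C/D" cached, "A/B/C/E/file.c" gives 5 (only "A/B/C" is reused), while "A/B" gives 3 because it ends exactly on a component boundary of the cached path.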
I think the patch is starting to be in good shape now, but please
comment on it, even if it is just a variable/function rename to
increase readability, or a spelling fix!
For the fun of it, I did a test with 'git checkout-index -q --all
--prefix=<a path containing 29 components>' on the my-v2.6.27 Linux
branch, and the savings were really huge: the number of lstat() and
stat() calls dropped from 778 145 to 26 674, and the real time
dropped from 52 to 32 seconds on my laptop! :-)
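The --prefix case depends on choosing the right system call per component: components inside the prefix are checked with stat(), so the prefix may be a symlink to a directory, while components after it are checked with lstat(), so a symlink there is not mistaken for a directory. A minimal sketch of that per-component choice, assuming POSIX stat()/lstat(); component_is_dir() and its fixed 4096-byte buffer are hypothetical, not code from the patch.

```c
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: does the first 'len' bytes of 'path' name a
 * directory? Prefixes with len <= prefix_len_stat_func are tested with
 * stat() (symlinks followed), longer ones with lstat() (symlinks not
 * followed). Returns 1 for a directory, 0 otherwise, -1 if too long.
 */
static int component_is_dir(const char *path, int len, int prefix_len_stat_func)
{
	char buf[4096];   /* assumed bound, standing in for PATH_MAX */
	struct stat st;
	int ret;

	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1;
	memcpy(buf, path, len);
	buf[len] = '\0';

	if (len <= prefix_len_stat_func)
		ret = stat(buf, &st);    /* inside the prefix: follow symlinks */
	else
		ret = lstat(buf, &st);   /* elsewhere: a symlink is not a dir */
	return !ret && S_ISDIR(st.st_mode);
}
```

Passing -1 as the prefix length, as the has_symlink_leading_path() wrapper in the patch does, means every component is tested with lstat().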
-----
I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.
After some more investigation and thinking, I have made 1 patch which
optimises away over 40% of all lstat() calls for the 'git checkout'
command in some cases (from 120 954 to 69 876 lstat64() calls in the
measurement below). Also, if you give a long path to the '--prefix'
argument of the 'git checkout-index' command, and you have lots of
files, the savings can be really huge!
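Where do the savings come from? A rough toy model, assuming sorted input and counting only the leading-directory tests (it ignores files, symlinks and missing directories): without a cache every leading directory of every path is tested, while a one-entry cache holding the previous path's directory part only requires testing the components that differ. count_dir_tests() and its helpers are hypothetical illustration code, not part of the patch.

```c
#include <string.h>

/* Number of '/' in the first 'len' bytes of 's'. */
static int count_slashes(const char *s, int len)
{
	int n = 0;
	for (int i = 0; i < len; i++)
		if (s[i] == '/')
			n++;
	return n;
}

/* Length of the directory part of 'path' (up to its last '/'), or 0. */
static int dir_len(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? (int)(slash - path) : 0;
}

/*
 * Toy model: leading-directory tests needed for 'paths' (assumed sorted).
 * use_cache == 0: every leading directory of every path is tested.
 * use_cache != 0: directories shared with the previous path's directory
 * part are assumed to be answered by a one-entry "last path" cache.
 */
static int count_dir_tests(const char *const *paths, int n, int use_cache)
{
	const char *prev = NULL;
	int prev_len = 0, total = 0;

	for (int k = 0; k < n; k++) {
		const char *cur = paths[k];
		int cur_len = dir_len(cur);
		int components = cur_len ? count_slashes(cur, cur_len) + 1 : 0;
		int shared = 0;

		if (use_cache && prev && prev_len && cur_len) {
			int i = 0;
			while (i < prev_len && i < cur_len && prev[i] == cur[i]) {
				if (prev[i] == '/')
					shared++;
				i++;
			}
			/* Count the last component only if it matched completely. */
			if ((i == prev_len || prev[i] == '/') &&
			    (i == cur_len || cur[i] == '/'))
				shared++;
		}
		total += components - shared;
		prev = cur;
		prev_len = cur_len;
	}
	return total;
}
```

For the paths "arch/arm/a.c", "arch/arm/b.c", "arch/x86/c.c" and "fs/d.c" the model gives 7 tests without the cache and 4 with it; the gap grows with deeper trees and longer --prefix paths.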
The patch is against git master, and the git 'make test' test suite
still passes after the patch. To document the improvement, below are
some numbers comparing the situation before and after the patch. To
reproduce the numbers:
- git clone the Linux git tree to be able to get the Linux tags
'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25
Then, when the current branch is 'my-v2.6.25', do:
strace -o strace_to27 -T git checkout -q my-v2.6.27
Then pretty-print the 'strace_to27' file. Below are the numbers
from the current git version (before the patch). Notice that we do an
lstat() call on the "arch" directory over 6000 times!
TOTAL 185151 100.000% OK:165544 NOT: 19607 11.136001 sec 60 usec/call
lstat64 120954 65.327% OK:107013 NOT: 13941 5.388727 sec 45 usec/call
strings 120954 tot 30163 uniq 4.010 /uniq 5.388727 sec 45 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 2.740520 sec 45 usec/call
dirs 45522 tot 1436 uniq 31.701 /uniq 1.994448 sec 44 usec/call
errors 13941 tot 5189 uniq 2.687 /uniq 0.653759 sec 47 usec/call
6297 5.206% OK: 6297 NOT: 0 "arch"
4544 3.757% OK: 4544 NOT: 0 "drivers"
1816 1.501% OK: 1816 NOT: 0 "arch/arm"
1499 1.239% OK: 1499 NOT: 0 "include"
912 0.754% OK: 912 NOT: 0 "arch/powerpc"
764 0.632% OK: 764 NOT: 0 "fs"
746 0.617% OK: 746 NOT: 0 "drivers/net"
662 0.547% OK: 662 NOT: 0 "net"
652 0.539% OK: 325 NOT: 327 "arch/sparc/include"
636 0.526% OK: 636 NOT: 0 "drivers/media"
606 0.501% OK: 606 NOT: 0 "include/linux"
533 0.441% OK: 533 NOT: 0 "arch/sh"
522 0.432% OK: 260 NOT: 262 "arch/powerpc/include"
488 0.403% OK: 243 NOT: 245 "arch/sh/include"
413 0.341% OK: 413 NOT: 0 "arch/sparc"
390 0.322% OK: 390 NOT: 0 "arch/x86"
383 0.317% OK: 383 NOT: 0 "Documentation"
370 0.306% OK: 184 NOT: 186 "arch/ia64/include"
366 0.303% OK: 366 NOT: 0 "drivers/media/video"
348 0.288% OK: 173 NOT: 175 "arch/arm/include"
Here are the numbers after applying the patch. Notice how nice the
top-20 list looks now: no path in it is lstat()'ed more than 4 times!
TOTAL 134155 100.000% OK:122102 NOT: 12053 11.069389 sec 83 usec/call
lstat64 69876 52.086% OK: 63491 NOT: 6385 3.410007 sec 49 usec/call
strings 69876 tot 30163 uniq 2.317 /uniq 3.410007 sec 49 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 3.023238 sec 49 usec/call
dirs 2000 tot 1436 uniq 1.393 /uniq 0.085953 sec 43 usec/call
errors 6385 tot 5189 uniq 1.230 /uniq 0.300816 sec 47 usec/call
4 0.006% OK: 4 NOT: 0 ".gitignore"
4 0.006% OK: 4 NOT: 0 ".mailmap"
4 0.006% OK: 4 NOT: 0 "CREDITS"
4 0.006% OK: 4 NOT: 0 "Documentation/00-INDEX"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-block"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-firmware-acpi"
4 0.006% OK: 4 NOT: 0 "Documentation/CodingStyle"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-API.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-mapping.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/Makefile"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/gadget.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-api.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-locking.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs-guide.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs_example.c"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/rapidio.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/s390-drivers.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/uio-howto.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/videobook.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/writing_usb_driver.tmpl"
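The pretty-printing tool used for the tables above is not part of this series. As a hypothetical sketch of its first step, the helper below extracts the path and the success/failure status from one line of `strace -T` output; it assumes the `lstat64("path", ...) = ret <time>` layout seen on 32-bit Linux and does not handle escaped quotes in paths.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of `strace -T` output. Returns 1
 * if the line records an lstat64() call, copying its path into 'path'
 * (truncated to 'size' bytes) and setting *ok to 1 if the call returned
 * 0, or 0 if it failed. Returns 0 for any other system call.
 * Assumes no escaped quotes (\") inside the path.
 */
static int parse_lstat_line(const char *line, char *path, size_t size, int *ok)
{
	static const char prefix[] = "lstat64(\"";
	const char *start, *end, *eq;
	size_t n;

	if (size == 0 || strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	start = line + sizeof(prefix) - 1;
	end = strchr(start, '"');          /* closing quote of the path */
	if (!end)
		return 0;
	n = (size_t)(end - start);
	if (n >= size)
		n = size - 1;
	memcpy(path, start, n);
	path[n] = '\0';

	/* The return value follows the last '=' on the line; the -T
	 * "<0.000045>" suffix contains no '='. */
	eq = strrchr(end, '=');
	*ok = eq != NULL && strtol(eq + 1, NULL, 10) == 0;
	return 1;
}
```

Collecting the (path, ok) pairs, sorting them by path with qsort() and counting runs would give per-path OK/NOT columns like the ones shown.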
Comments?
Kjetil Barvik (1):
more cache effective symlink/directory detection
builtin-add.c | 1 +
builtin-apply.c | 1 +
builtin-update-index.c | 1 +
cache.h | 24 +++++++-
diff-lib.c | 1 +
entry.c | 39 +++++-------
symlinks.c | 158 +++++++++++++++++++++++++++++++++++-------------
unpack-trees.c | 6 +-
8 files changed, 162 insertions(+), 69 deletions(-)
* [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
2009-01-09 19:05 [PATCH/RFC v5 0/1] git checkout: optimise away lots of lstat() calls Kjetil Barvik
@ 2009-01-09 19:05 ` Kjetil Barvik
2009-01-10 10:11 ` René Scharfe
From: Kjetil Barvik @ 2009-01-09 19:05 UTC
To: git; +Cc: Pete Harlan, Linus Torvalds, Junio C Hamano, Kjetil Barvik
Changes include the following:
- The cache functionality is more effective. Previously when A/B/C/D
was in the cache and A/B/C/E/file.c was called for, there was no
match at all from the cache. Now we use the fact that the paths
"A", "A/B" and "A/B/C" are already tested, and we only need to do an
lstat() call on "A/B/C/E".
- We only cache/store the last path regardless of its type. Since the
cache functionality is always used with alphabetically sorted names
(at least it seems so to me), there is no need to store both the
last symlink-leading path and the last real-directory path. Note
that if the cache is not called with (mostly) alphabetically sorted
names, neither the old, nor this new one, would be very effective.
- We can also cache the fact that a directory does not exist.
Previously we could end up doing lots of lstat() calls for a removed
directory which previously contained lots of files. Since we
already have simplified the cache functionality and only store the
last path (see above), this new functionality was easy to add.
- Previously, when symlink A/B/C/S was cached/stored in the
symlink-leading path, and A/B/C/file.c was called for, it was not
easy to use the fact that we already knew that the paths "A", "A/B"
and "A/B/C" are real directories. Since we now only store one
single path (the last one), we also get similar logic for free
regarding the new "non-existing-directory-cache".
- Avoid copying the first path components of the name 2 zillion times
when we test new path components. Since we always cache/store the
last path, we can copy each component directly into the cache as we
test it. Previously we ended up doing a memcpy() of the full
path/name right before each lstat() call, and again when updating the
cache each time we had tested a new path component.
- We also use less memory, that is, PATH_MAX bytes less memory on the
stack and PATH_MAX bytes less memory on the heap.
- Introduce a 3rd argument, 'unsigned int track_flags', to the
cache-test function, lstat_cache(). This new argument can be
used to tell the cache functionality which types of directories
should be cached. This argument can also be used to tell the cache
to always test the complete length of 'name' (add 'LSTAT_FULLPATH').
- Introduce a 4th argument, 'prefix_len_stat_func', which gives the
length of a prefix within which the cache should use the stat()
function instead of the lstat() function to test each path component.
This is useful, for instance, to handle the --prefix argument to the
'git checkout-index' command.
- Also introduce a 'void clear_lstat_cache(void)' function, which
should be used to clear the cache before use. For instance, if you
have changed the types of directories which should be cached, the
cache could otherwise contain an unwanted path.
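The cache update at the end of lstat_cache() decides what the single cache entry should hold. Below is a sketch of that decision pulled out as a pure function, using the flag values from the patch's cache.h: if a symlink or missing directory was found and that type is tracked, the path up to it is cached; otherwise, if real directories are tracked, the leading real-directory part is cached; otherwise the cache is cleared. cache_entry_to_store() and its explicit path_max parameter are hypothetical; the patch writes the result into its static cache_path/cache_len/cache_flags instead of returning it.

```c
/* Flag values as defined in the patch's cache.h. */
#define LSTAT_DIR      (1u << 1)
#define LSTAT_NOENT    (1u << 2)
#define LSTAT_SYMLINK  (1u << 3)
#define LSTAT_LSTATERR (1u << 4)

/*
 * Sketch: decide what the one-entry cache should hold after a lookup.
 * last_slash is the length of the last tested prefix, last_slash_dir the
 * length of the longest prefix known to be a real directory. Returns
 * the flags to store (0 means "clear the cache") and the length in *len.
 */
static unsigned int cache_entry_to_store(unsigned int ret_flags,
					 unsigned int track_flags,
					 int last_slash, int last_slash_dir,
					 int path_max, int *len)
{
	unsigned int save = ret_flags & track_flags & (LSTAT_NOENT | LSTAT_SYMLINK);

	if (save && last_slash > 0 && last_slash <= path_max) {
		*len = last_slash;      /* remember the symlink / missing dir itself */
		return save;
	}
	if ((track_flags & LSTAT_DIR) &&
	    last_slash_dir > 0 && last_slash_dir <= path_max) {
		*len = last_slash_dir;  /* fall back to the real-directory part */
		return LSTAT_DIR;
	}
	*len = 0;                       /* nothing worth caching */
	return 0;
}
```

For example, a symlink found at length 7 behind real directories of length 5 is cached as LSTAT_SYMLINK when symlinks are tracked, but only the 5-byte LSTAT_DIR part is kept when the caller tracks directories alone.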
We also start to use the lstat_cache() function inside the
'create_directories()' function in entry.c, which in some cases saves
a very large number of calls to stat()/lstat().
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 870961e... M builtin-add.c
:100644 100644 a8f75ed... d3d001a... M builtin-apply.c
:100644 100644 5604977... 8907219... M builtin-update-index.c
:100644 100644 231c06d... 0bf31d3... M cache.h
:100644 100644 ae96c64... c9caa0e... M diff-lib.c
:100644 100644 aa2ee46... 293400c... M entry.c
:100644 100644 5a5e781... 314ba4f... M symlinks.c
:100644 100644 54f301d... c3d1429... M unpack-trees.c
builtin-add.c | 1 +
builtin-apply.c | 1 +
builtin-update-index.c | 1 +
cache.h | 24 +++++++-
diff-lib.c | 1 +
entry.c | 39 +++++-------
symlinks.c | 158 +++++++++++++++++++++++++++++++++++-------------
unpack-trees.c | 6 +-
8 files changed, 162 insertions(+), 69 deletions(-)
diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..870961e8ca4e3d6f9333020083d0a232bccd542c 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -225,6 +225,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, builtin_add_options,
builtin_add_usage, 0);
+ clear_symlink_cache();
if (patch_interactive)
add_interactive = 1;
if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..d3d001a96be6e502d6338af4467f7c313370d78e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -3154,6 +3154,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
if (apply_default_whitespace)
parse_whitespace_option(apply_default_whitespace);
+ clear_symlink_cache();
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..8907219fb9cb438113e29ee17854edb5dd4baa4d 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -581,6 +581,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
if (entries < 0)
die("cache corrupted");
+ clear_symlink_cache();
for (i = 1 ; i < argc; i++) {
const char *path = argv[i];
const char *p;
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..0bf31d3e8903a658927938f6da45c77dd289c94a 100644
--- a/cache.h
+++ b/cache.h
@@ -719,7 +719,29 @@ struct checkout {
};
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
+
+#define LSTAT_REG (1u << 0)
+#define LSTAT_DIR (1u << 1)
+#define LSTAT_NOENT (1u << 2)
+#define LSTAT_SYMLINK (1u << 3)
+#define LSTAT_LSTATERR (1u << 4)
+#define LSTAT_ERR (1u << 5)
+#define LSTAT_FULLPATH (1u << 6)
+extern unsigned int lstat_cache(int len, const char *name,
+ unsigned int track_flags, int prefix_len_stat_func);
+extern void clear_lstat_cache(void);
+static inline unsigned int has_symlink_leading_path(int len, const char *name)
+{
+ return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR, -1) &
+ LSTAT_SYMLINK;
+}
+#define clear_symlink_cache() clear_lstat_cache()
+static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+ return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR, -1) &
+ (LSTAT_SYMLINK|LSTAT_NOENT);
+}
+#define clear_symlink_or_noent_cache() clear_lstat_cache()
extern struct alternate_object_database {
struct alternate_object_database *next;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..c9caa0e6ef0f4a8ee8b850869ef6d0f52b712385 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -69,6 +69,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
diff_unmerged_stage = 2;
entries = active_nr;
symcache[0] = '\0';
+ clear_symlink_cache();
for (i = 0; i < entries; i++) {
struct stat st;
unsigned int oldmode, newmode;
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..293400cf5be63fd66b797a68e17bf953c600fe99 100644
--- a/entry.c
+++ b/entry.c
@@ -8,35 +8,28 @@ static void create_directories(const char *path, const struct checkout *state)
const char *slash = path;
while ((slash = strchr(slash+1, '/')) != NULL) {
- struct stat st;
- int stat_status;
-
len = slash - path;
memcpy(buf, path, len);
buf[len] = 0;
- if (len <= state->base_dir_len)
- /*
- * checkout-index --prefix=<dir>; <dir> is
- * allowed to be a symlink to an existing
- * directory.
- */
- stat_status = stat(buf, &st);
- else
- /*
- * if there currently is a symlink, we would
- * want to replace it with a real directory.
- */
- stat_status = lstat(buf, &st);
-
- if (!stat_status && S_ISDIR(st.st_mode))
+ /* For 'checkout-index --prefix=<dir>', <dir> is
+ * allowed to be a symlink to an existing directory,
+ * therefore we must give 'state->base_dir_len' to the
+ * cache, such that we test path components of the
+ * prefix with stat() instead of lstat()
+ *
+ * We must also tell the cache to test the complete
+ * length of the buffer (the '|LSTAT_FULLPATH' part).
+ */
+ if (lstat_cache(len, buf, LSTAT_DIR|LSTAT_FULLPATH,
+ state->base_dir_len) &
+ LSTAT_DIR)
continue; /* ok, it is already a directory. */
- /*
- * We know stat_status == 0 means something exists
- * there and this mkdir would fail, but that is an
- * error codepath; we do not care, as we unlink and
- * mkdir again in such a case.
+ /* If this mkdir() would fail, it could be that there
+ * is already a symlink or something else exists
+ * there, therefore we then try to unlink it and try
+ * one more time to create the directory.
*/
if (mkdir(buf, 0777)) {
if (errno == EEXIST && state->force &&
diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..314ba4f273f9f776a130b5e5b48c5bda0ca9beed 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,136 @@
#include "cache.h"
-struct pathname {
- int len;
- char path[PATH_MAX];
-};
+static char cache_path[PATH_MAX + 2];
+static int cache_len = 0;
+static unsigned int cache_flags = 0;
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int greatest_match_lstat_cache(int len, const char *name)
{
- int match_len = match->len;
- return (len > match_len &&
- name[match_len] == '/' &&
- !memcmp(name, match->path, match_len)) ? match_len : 0;
-}
+ int max_len, match_len = 0, i = 0;
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
- if (len < PATH_MAX) {
- match->len = len;
- memcpy(match->path, name, len);
- match->path[len] = 0;
+ max_len = len < cache_len ? len : cache_len;
+ while (i < max_len && name[i] == cache_path[i]) {
+ if (name[i] == '/') match_len = i;
+ i++;
}
+ if (i == cache_len && len > cache_len && name[cache_len] == '/')
+ match_len = cache_len;
+ else if (i == len && len < cache_len && cache_path[len] == '/')
+ match_len = len;
+ else if (i == len && i == cache_len)
+ match_len = len;
+ return match_len;
}
-int has_symlink_leading_path(int len, const char *name)
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This can be indicated by the 'track_flags' argument, which also
+ * can be used to indicate that we should always check the full path.
+ *
+ * The 'prefix_len_stat_func' parameter can be used to set the length
+ * of the prefix, where the cache should use the stat() function
+ * instead of the lstat() function to test each path component.
+ */
+unsigned int lstat_cache(int len, const char *name,
+ unsigned int track_flags, int prefix_len_stat_func)
{
- static struct pathname link, nonlink;
- char path[PATH_MAX];
+ int match_len, last_slash, last_slash_dir, max_len, ret;
+ unsigned int match_flags, ret_flags, save_flags;
struct stat st;
- char *sp;
- int known_dir;
- /*
- * See if the last known symlink cache matches.
+ /* Check if we have a match from the cache for the 2 "excluding" path types.
+ */
+ match_len = last_slash = greatest_match_lstat_cache(len, name);
+ match_flags = cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+ if (match_flags && match_len == cache_len)
+ return match_flags;
+
+ /* If 'name' is a substring of the cache on a path component
+ * basis, and a directory is cached, we return immediately.
*/
- if (match_pathname(len, name, &link))
- return 1;
+ match_flags = cache_flags & track_flags & LSTAT_DIR;
+ if (match_flags && match_len == len)
+ return match_flags;
- /*
- * Get rid of the last known directory part
+ /* Okay, no match from the cache so far, so now we have to
+ * check the rest of the path components.
*/
- known_dir = match_pathname(len, name, &nonlink);
+ ret_flags = LSTAT_DIR;
+ last_slash_dir = last_slash;
+ max_len = len < PATH_MAX ? len : PATH_MAX;
+ while (match_len < max_len) {
+ do {
+ cache_path[match_len] = name[match_len];
+ match_len++;
+ } while (match_len < max_len && name[match_len] != '/');
+ if (match_len >= max_len && !(track_flags & LSTAT_FULLPATH))
+ break;
+ last_slash = match_len;
+ cache_path[last_slash] = '\0';
- while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
- int thislen = sp - name ;
- memcpy(path, name, thislen);
- path[thislen] = 0;
+ if (last_slash <= prefix_len_stat_func)
+ ret = stat(cache_path, &st);
+ else
+ ret = lstat(cache_path, &st);
- if (lstat(path, &st))
- return 0;
- if (S_ISDIR(st.st_mode)) {
- set_pathname(thislen, path, &nonlink);
- known_dir = thislen;
+ if (ret) {
+ ret_flags = LSTAT_LSTATERR;
+ if (errno == ENOENT)
+ ret_flags |= LSTAT_NOENT;
+ } else if (S_ISDIR(st.st_mode)) {
+ last_slash_dir = last_slash;
continue;
- }
- if (S_ISLNK(st.st_mode)) {
- set_pathname(thislen, path, &link);
- return 1;
+ } else if (S_ISLNK(st.st_mode)) {
+ ret_flags = LSTAT_SYMLINK;
+ } else if (S_ISREG(st.st_mode)) {
+ ret_flags = LSTAT_REG;
+ } else {
+ ret_flags = LSTAT_ERR;
}
break;
}
- return 0;
+
+ /* At the end update the cache. Note that max 3 different
+ * path types, LSTAT_NOENT, LSTAT_SYMLINK and LSTAT_DIR, can
+ * be cached for the moment!
+ */
+ save_flags = ret_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+ if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
+ cache_path[last_slash] = '\0';
+ cache_len = last_slash;
+ cache_flags = save_flags;
+ } else if (track_flags & LSTAT_DIR &&
+ last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
+ /* We have a separate test for the directory case, since
+ * it could be that we have found a symlink or a
+ * non-existing directory, and the track_flags says that
+ * we cannot cache this fact, and the cache would
+ * then have been empty in this case.
+ *
+ * But if we are allowed to track real directories, we
+ * can still cache the path components before the last
+ * one (the found symlink or non-existing component).
+ */
+ cache_path[last_slash_dir] = '\0';
+ cache_len = last_slash_dir;
+ cache_flags = LSTAT_DIR;
+ } else {
+ clear_lstat_cache();
+ }
+ return ret_flags;
+}
+
+/*
+ * Before usage of the lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+ cache_path[0] = '\0';
+ cache_len = 0;
+ cache_flags = 0;
}
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..c3d14294aaaefb9e40c99f174b4f5f41e5fad635 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
char *cp, *prev;
char *name = ce->name;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
return;
if (unlink(name))
return;
@@ -105,6 +105,7 @@ static int check_updates(struct unpack_trees_options *o)
cnt = 0;
}
+ clear_symlink_or_noent_cache();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -118,6 +119,7 @@ static int check_updates(struct unpack_trees_options *o)
}
}
+ clear_lstat_cache();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -584,7 +586,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
if (o->index_only || o->reset || !o->update)
return 0;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
return 0;
if (!lstat(ce->name, &st)) {
--
1.6.1.rc1.49.g7f705
* Re: [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
2009-01-09 19:05 ` [PATCH/RFC v5 1/1] more cache effective symlink/directory detection Kjetil Barvik
@ 2009-01-10 10:11 ` René Scharfe
2009-01-11 0:48 ` Junio C Hamano
From: René Scharfe @ 2009-01-10 10:11 UTC
To: Kjetil Barvik; +Cc: git, Pete Harlan, Linus Torvalds, Junio C Hamano
Kjetil Barvik wrote:
> - Also introduce a 'void clear_lstat_cache(void)' function, which
> should be used to clean the cache before usage. If for instance,
> you have changed the types of directories which should be cached,
> the cache could contain a path which was not wanted.
Is it possible to make the cache detect these situations automatically
by saving track_flags along with the cache contents? Not having to
clear the cache manually would be a major feature.
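A minimal sketch of what this suggestion could look like, assuming the cache gains a field recording the track_flags it was filled under; struct path_cache and reset_if_flags_changed() are hypothetical names, not code from this thread.

```c
/*
 * Hypothetical sketch: the cache remembers the track_flags it was
 * filled under and empties itself when a caller asks with different
 * flags.
 */
struct path_cache {
	char path[4096];           /* assumed bound, standing in for PATH_MAX */
	int len;                   /* length of the cached path */
	unsigned int flags;        /* type recorded for the cached path */
	unsigned int track_flags;  /* track_flags in force when it was stored */
};

static void reset_if_flags_changed(struct path_cache *c, unsigned int track_flags)
{
	if (c->track_flags == track_flags)
		return;                /* same kind of caller: entry stays valid */
	c->path[0] = '\0';
	c->len = 0;
	c->flags = 0;
	c->track_flags = track_flags;
}
```

lstat_cache() would call reset_if_flags_changed() on entry, so callers would no longer have to clear the cache by hand just because they switched to a wrapper with different track_flags.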
> --- a/cache.h
> +++ b/cache.h
> @@ -719,7 +719,29 @@ struct checkout {
> };
>
> extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
> -extern int has_symlink_leading_path(int len, const char *name);
> +
> +#define LSTAT_REG (1u << 0)
> +#define LSTAT_DIR (1u << 1)
> +#define LSTAT_NOENT (1u << 2)
> +#define LSTAT_SYMLINK (1u << 3)
> +#define LSTAT_LSTATERR (1u << 4)
> +#define LSTAT_ERR (1u << 5)
> +#define LSTAT_FULLPATH (1u << 6)
> +extern unsigned int lstat_cache(int len, const char *name,
> + unsigned int track_flags, int prefix_len_stat_func);
> +extern void clear_lstat_cache(void);
> +static inline unsigned int has_symlink_leading_path(int len, const char *name)
> +{
> + return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR, -1) &
> + LSTAT_SYMLINK;
> +}
> +#define clear_symlink_cache() clear_lstat_cache()
> +static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
> +{
> + return lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR, -1) &
> + (LSTAT_SYMLINK|LSTAT_NOENT);
> +}
> +#define clear_symlink_or_noent_cache() clear_lstat_cache()
What's the advantage of inlining the wrappers (expressed in units of
space and/or time)? The interface would be much nicer if you exported
only the wrappers, and not all those constants along with them.
And why define aliases for clear_lstat_cache()?
> diff --git a/entry.c b/entry.c
> index aa2ee46a84033585d8e07a585610c5a697af82c2..293400cf5be63fd66b797a68e17bf953c600fe99 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -8,35 +8,28 @@ static void create_directories(const char *path, const struct checkout *state)
> const char *slash = path;
>
> while ((slash = strchr(slash+1, '/')) != NULL) {
> - struct stat st;
> - int stat_status;
> -
> len = slash - path;
> memcpy(buf, path, len);
> buf[len] = 0;
>
> - if (len <= state->base_dir_len)
> - /*
> - * checkout-index --prefix=<dir>; <dir> is
> - * allowed to be a symlink to an existing
> - * directory.
> - */
> - stat_status = stat(buf, &st);
> - else
> - /*
> - * if there currently is a symlink, we would
> - * want to replace it with a real directory.
> - */
> - stat_status = lstat(buf, &st);
> -
> - if (!stat_status && S_ISDIR(st.st_mode))
> + /* For 'checkout-index --prefix=<dir>', <dir> is
> + * allowed to be a symlink to an existing directory,
> + * therefore we must give 'state->base_dir_len' to the
> + * cache, such that we test path components of the
> + * prefix with stat() instead of lstat()
> + *
> + * We must also tell the cache to test the complete
> + * length of the buffer (the '|LSTAT_FULLPATH' part).
> + */
> + if (lstat_cache(len, buf, LSTAT_DIR|LSTAT_FULLPATH,
> + state->base_dir_len) &
> + LSTAT_DIR)
> continue; /* ok, it is already a directory. */
I'd say this usage is worth another wrapper.
Also, it's probably worth splitting this patch up again. First switching
to your improved implementation of has_symlink_leading_path(), then
introducing has_symlink_or_noent_leading_path() and finally adding
LSTAT_FULLPATH and the fourth parameter of lstat_cache() etc. and using
this feature in entry.c seems like a nice incremental progression.
René
* Re: [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
2009-01-10 10:11 ` René Scharfe
@ 2009-01-11 0:48 ` Junio C Hamano
2009-01-11 8:26 ` Kjetil Barvik
From: Junio C Hamano @ 2009-01-11 0:48 UTC
To: René Scharfe; +Cc: Kjetil Barvik, git, Pete Harlan, Linus Torvalds
René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> Kjetil Barvik wrote:
>> - Also introduce a 'void clear_lstat_cache(void)' function, which
>> should be used to clean the cache before usage. If for instance,
>> you have changed the types of directories which should be cached,
>> the cache could contain a path which was not wanted.
>
> Is it possible to make the cache detect these situations automatically
> by saving track_flags along with the cache contents? Not having to
> clear the cache manually would be a major feature.
> Also, it's probably worth splitting this patch up again. First switching
> to your improved implementation of has_symlink_leading_path(), then
> introducing has_symlink_or_noent_leading_path() and finally adding
> LSTAT_FULLPATH and the fourth parameter of lstat_cache() etc. and using
> this feature in entry.c seems like a nice incremental progression.
Both are reasonable suggestions. Thanks.
* Re: [PATCH/RFC v5 1/1] more cache effective symlink/directory detection
2009-01-11 0:48 ` Junio C Hamano
@ 2009-01-11 8:26 ` Kjetil Barvik
From: Kjetil Barvik @ 2009-01-11 8:26 UTC
To: Junio C Hamano; +Cc: René Scharfe, git
Junio C Hamano <gitster@pobox.com> writes:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> Kjetil Barvik wrote:
>>> - Also introduce a 'void clear_lstat_cache(void)' function, which
>>> should be used to clean the cache before usage. If for instance,
>>> you have changed the types of directories which should be cached,
>>> the cache could contain a path which was not wanted.
>>
>> Is it possible to make the cache detect these situations automatically
>> by saving track_flags along with the cache contents? Not having to
>> clear the cache manually would be a major feature.
>
>> Also, it's probably worth splitting this patch up again. First switching
>> to your improved implementation of has_symlink_leading_path(), then
>> introducing has_symlink_or_noent_leading_path() and finally adding
>> LSTAT_FULLPATH and the fourth parameter of lstat_cache() etc. and using
>> this feature in entry.c seems like a nice incremental progression.
>
> Both are reasonable suggestions. Thanks.
OK! Thanks for the comments! Version 6 will follow shortly!
-- kjetil
Thread overview: 5+ messages
2009-01-09 19:05 [PATCH/RFC v5 0/1] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-09 19:05 ` [PATCH/RFC v5 1/1] more cache effective symlink/directory detection Kjetil Barvik
2009-01-10 10:11 ` René Scharfe
2009-01-11 0:48 ` Junio C Hamano
2009-01-11 8:26 ` Kjetil Barvik