* [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls
@ 2009-01-11 13:28 Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection Kjetil Barvik
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Kjetil Barvik @ 2009-01-11 13:28 UTC (permalink / raw)
To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
Kjetil Barvik
Changes since version 5:
- After comments from René Scharfe, the usage of the lstat_stat()
function inside entry.c has got it's own wrapper,
has_dirs_only_path().
- All wrapper functions inside cache.h is moved inside symlink.c, and
updated to be normal C extern functions (not inline functions).
Thanks, René!
- The clear_lstat_cache() function is deleted, and instead we do an
safeguard if-test to detect changes to track_flags or
prefix_len_stat_func.
- Introduce a 'static inline reset_lstat_cache()' function, and make
an struct of all the cache variables.
- The patch is split up into 3 parts again.
That's all changes this time! Then some questions:
* Inside commit message for commit c40641b77b0274186fd1b327d5dc3246f814aaaf
* ("Optimize symlink/directory detection") Linus Torvalds writes
| [...]
| This can - and should - probably be extended upon so that we
| eventually never do a bare 'lstat()' on any path entries at *all*
| when checking the index, but always check the full path
| carefully. Right now we do not generally check the whole path for
| all our normal quick index revalidation.
I am not quite sure what Linus is thinking of here, and in
particular what an (eventually) extended interface should look like.
I have been thinking of maybe introduce a "git_lstat()" like wrapper
with the same arguments as the lstat() function, and then when
possible, retrieve the contents of 'struct stat' from the cache.
Would this sort of "git_lstat()" like wrapper be useful?
I think it then could be used to simplify the first 7 lines of the
check_removed() function inside diff-lib.c, to in some cases be able
to do 1 instead of 2 calls to lstat(), and maybe similar things at
other places?
Should the wrapper return an error if it is able to discover an
symlink somewhere in the path, like has_symlink_leading_path()
already does today?
But, unless someone strongly ask for this "git_lstat()" like wrapper
to be included inside this patch series, I guess this should be
future work.
I hope that the now 3 patches is at least one step in the right
direction!
| We should also make sure that we're careful about all the
| invalidation, ie when we remove a link and replace it by a directory
| we should invalidate the symlink cache if it matches (and vice versa
| for the directory cache).
This is an argument for the clear_lstat_cache() function, I guess.
-----
I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.
After some more investigation and thinking, I have made 3 patches and
been able to optimise away over 40% of all lstat() calls in some cases
for the 'git checkout' command. Also, if you use a large path to the
'--prefix' argument to the 'git checkout-index' command, and you have
lots of files, the savings can be really huge!
The 3 patches is against git master, and the git 'make test' test
suite still passes after each patch. To document the improvement,
below is some numbers, which compares before and after the 3
patches. To reproduce the numbers:
- git clone the Linux git tree to be able to get the Linux tags
'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25
Then, when the current branch is 'my-v2.6.25', do:
strace -o strace_to27 -T git checkout -q my-v2.6.27
And then pretty print the 'strace_to27' file. Below is the numbers
from the current git version (before the 3 patches). Notice that we
do an lstat() call on the "arch" directory over 6000 times!
TOTAL 185151 100.000% OK:165544 NOT: 19607 11.136001 sec 60 usec/call
lstat64 120954 65.327% OK:107013 NOT: 13941 5.388727 sec 45 usec/call
strings 120954 tot 30163 uniq 4.010 /uniq 5.388727 sec 45 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 2.740520 sec 45 usec/call
dirs 45522 tot 1436 uniq 31.701 /uniq 1.994448 sec 44 usec/call
errors 13941 tot 5189 uniq 2.687 /uniq 0.653759 sec 47 usec/call
6297 5.206% OK: 6297 NOT: 0 "arch"
4544 3.757% OK: 4544 NOT: 0 "drivers"
1816 1.501% OK: 1816 NOT: 0 "arch/arm"
1499 1.239% OK: 1499 NOT: 0 "include"
912 0.754% OK: 912 NOT: 0 "arch/powerpc"
764 0.632% OK: 764 NOT: 0 "fs"
746 0.617% OK: 746 NOT: 0 "drivers/net"
662 0.547% OK: 662 NOT: 0 "net"
652 0.539% OK: 325 NOT: 327 "arch/sparc/include"
636 0.526% OK: 636 NOT: 0 "drivers/media"
606 0.501% OK: 606 NOT: 0 "include/linux"
533 0.441% OK: 533 NOT: 0 "arch/sh"
522 0.432% OK: 260 NOT: 262 "arch/powerpc/include"
488 0.403% OK: 243 NOT: 245 "arch/sh/include"
413 0.341% OK: 413 NOT: 0 "arch/sparc"
390 0.322% OK: 390 NOT: 0 "arch/x86"
383 0.317% OK: 383 NOT: 0 "Documentation"
370 0.306% OK: 184 NOT: 186 "arch/ia64/include"
366 0.303% OK: 366 NOT: 0 "drivers/media/video"
348 0.288% OK: 173 NOT: 175 "arch/arm/include"
Here is the numbers after applying the 3 patches. Notice how nice the
top 20 entries list now looks!
TOTAL 134155 100.000% OK:122102 NOT: 12053 11.069389 sec 83 usec/call
lstat64 69876 52.086% OK: 63491 NOT: 6385 3.410007 sec 49 usec/call
strings 69876 tot 30163 uniq 2.317 /uniq 3.410007 sec 49 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 3.023238 sec 49 usec/call
dirs 2000 tot 1436 uniq 1.393 /uniq 0.085953 sec 43 usec/call
errors 6385 tot 5189 uniq 1.230 /uniq 0.300816 sec 47 usec/call
4 0.006% OK: 4 NOT: 0 ".gitignore"
4 0.006% OK: 4 NOT: 0 ".mailmap"
4 0.006% OK: 4 NOT: 0 "CREDITS"
4 0.006% OK: 4 NOT: 0 "Documentation/00-INDEX"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-block"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-firmware-acpi"
4 0.006% OK: 4 NOT: 0 "Documentation/CodingStyle"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-API.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-mapping.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/Makefile"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/gadget.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-api.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-locking.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs-guide.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs_example.c"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/rapidio.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/s390-drivers.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/uio-howto.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/videobook.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/writing_usb_driver.tmpl"
Comments?
Kjetil Barvik (3):
lstat_cache(): more cache effective symlink/directory detection
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): introduce has_dirs_only_path() function
cache.h | 2 +
entry.c | 34 +++------
symlinks.c | 210 ++++++++++++++++++++++++++++++++++++++++++++-----------
unpack-trees.c | 4 +-
4 files changed, 183 insertions(+), 67 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection
2009-01-11 13:28 [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls Kjetil Barvik
@ 2009-01-11 13:29 ` Kjetil Barvik
2009-01-12 4:05 ` Junio C Hamano
2009-01-11 13:29 ` [PATCH/RFC v6 2/3] lstat_cache(): introduce has_symlink_or_noent_leading_path() function Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 3/3] lstat_cache(): introduce has_dirs_only_path() function Kjetil Barvik
2 siblings, 1 reply; 5+ messages in thread
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
Kjetil Barvik
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 7311 bytes --]
Make the cache functionality more effective. Previously when A/B/C/D
was in the cache and A/B/C/E/file.c was called for, there was no match
at all from the cache. Now we use the fact that the paths "A", "A/B"
and "A/B/C" are already tested, and we only need to do an lstat() call
on "A/B/C/E".
We only cache/store the last path regardless of its type. Since the
cache functionality is always used with alphabetically sorted names
(at least it seems so for me), there is no need to store both the last
symlink-leading path and the last real-directory path. Note that if
the cache is not called with (mostly) alphabetically sorted names,
neither the old, nor this new one, would be very effective.
Previously, when symlink A/B/C/S was cached/stored in the symlink-
leading path, and A/B/C/file.c was called for, it was not easy to use
the fact that we already knew that the paths "A", "A/B" and "A/B/C"
are real directories.
Avoid copying the first path components of the name 2 zillion times
when we test new path components. Since we always cache/store the
last path, we can copy each component as we test those directly into
the cache. Previously we ended up doing a memcpy() for the full
path/name right before each lstat() call, and when updating the cache
for each time we have tested a new path component.
We also use less memory, that is, PATH_MAX bytes less memory on the
stack and PATH_MAX bytes less memory on the heap.
Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 5a5e781... 1061072... M symlinks.c
symlinks.c | 152 ++++++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 112 insertions(+), 40 deletions(-)
diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..1061072e9b18b12ca22c643bf9a3ea977eeb9916 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,136 @@
#include "cache.h"
-struct pathname {
- int len;
+static struct cache_def {
char path[PATH_MAX];
-};
+ int len;
+ int flags;
+} cache;
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int greatest_match_lstat_cache(int len, const char *name)
{
- int match_len = match->len;
- return (len > match_len &&
- name[match_len] == '/' &&
- !memcmp(name, match->path, match_len)) ? match_len : 0;
+ int max_len, match_len = 0, i = 0;
+
+ max_len = len < cache.len ? len : cache.len;
+ while (i < max_len && name[i] == cache.path[i]) {
+ if (name[i] == '/') match_len = i;
+ i++;
+ }
+ if (i == cache.len && len > cache.len && name[cache.len] == '/')
+ match_len = cache.len;
+ else if (i == len && len < cache.len && cache.path[len] == '/')
+ match_len = len;
+ else if (i == len && i == cache.len)
+ match_len = len;
+ return match_len;
}
-static inline void set_pathname(int len, const char *name, struct pathname *match)
+static inline void reset_lstat_cache(void)
{
- if (len < PATH_MAX) {
- match->len = len;
- memcpy(match->path, name, len);
- match->path[len] = 0;
- }
+ cache.path[0] = '\0';
+ cache.len = 0;
+ cache.flags = 0;
}
-int has_symlink_leading_path(int len, const char *name)
+#define FL_DIR (1 << 0)
+#define FL_SYMLINK (1 << 1)
+#define FL_LSTATERR (1 << 2)
+#define FL_ERR (1 << 3)
+
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is can be indicated by the 'track_flags' argument.
+ */
+static int lstat_cache(int len, const char *name,
+ int track_flags)
{
- static struct pathname link, nonlink;
- char path[PATH_MAX];
+ int match_len, last_slash, last_slash_dir, max_len;
+ int match_flags, ret_flags, save_flags;
struct stat st;
- char *sp;
- int known_dir;
/*
- * See if the last known symlink cache matches.
+ * Check to see if we have a match from the cache for the
+ * symlink path type.
*/
- if (match_pathname(len, name, &link))
- return 1;
-
+ match_len = last_slash = greatest_match_lstat_cache(len, name);
+ match_flags = cache.flags & track_flags & FL_SYMLINK;
+ if (match_flags && match_len == cache.len)
+ return match_flags;
/*
- * Get rid of the last known directory part
+ * If 'name' is a substring of the cache on a path component
+ * basis, and a directory is cached, we can return
+ * immediately.
*/
- known_dir = match_pathname(len, name, &nonlink);
+ match_flags = cache.flags & track_flags & FL_DIR;
+ if (match_flags && match_len == len)
+ return match_flags;
- while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
- int thislen = sp - name ;
- memcpy(path, name, thislen);
- path[thislen] = 0;
+ /* Okay, no match from the cache so far, so now we have to
+ * check the rest of the path components.
+ */
+ ret_flags = FL_DIR;
+ last_slash_dir = last_slash;
+ max_len = len < PATH_MAX ? len : PATH_MAX;
+ while (match_len < max_len) {
+ do {
+ cache.path[match_len] = name[match_len];
+ match_len++;
+ } while (match_len < max_len && name[match_len] != '/');
+ if (match_len >= max_len)
+ break;
+ last_slash = match_len;
+ cache.path[last_slash] = '\0';
- if (lstat(path, &st))
- return 0;
- if (S_ISDIR(st.st_mode)) {
- set_pathname(thislen, path, &nonlink);
- known_dir = thislen;
+ if (lstat(cache.path, &st)) {
+ ret_flags = FL_LSTATERR;
+ } else if (S_ISDIR(st.st_mode)) {
+ last_slash_dir = last_slash;
continue;
- }
- if (S_ISLNK(st.st_mode)) {
- set_pathname(thislen, path, &link);
- return 1;
+ } else if (S_ISLNK(st.st_mode)) {
+ ret_flags = FL_SYMLINK;
+ } else {
+ ret_flags = FL_ERR;
}
break;
}
- return 0;
+
+ /* At the end update the cache. Note that max 2 different
+ * path types, FL_SYMLINK and FL_DIR, can be cached for the
+ * moment!
+ */
+ save_flags = ret_flags & track_flags & FL_SYMLINK;
+ if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
+ cache.path[last_slash] = '\0';
+ cache.len = last_slash;
+ cache.flags = save_flags;
+ } else if (track_flags & FL_DIR &&
+ last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
+ /* We have a separate test for the directory case,
+ * since it could be that we have found a symlink and
+ * the track_flags says that we can not cache this
+ * fact, so the cache would then have been left empty
+ * in this case.
+ *
+ * But, if we is allowed to track real directories, we
+ * can still cache the path components before the last
+ * one (the found symlink component).
+ */
+ cache.path[last_slash_dir] = '\0';
+ cache.len = last_slash_dir;
+ cache.flags = FL_DIR;
+ } else {
+ reset_lstat_cache();
+ }
+ return ret_flags;
+}
+
+/* Return non-zero if path 'name' has a leading symlink component.
+ */
+int has_symlink_leading_path(int len, const char *name)
+{
+ return lstat_cache(len, name,
+ FL_SYMLINK|FL_DIR) &
+ FL_SYMLINK;
}
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH/RFC v6 2/3] lstat_cache(): introduce has_symlink_or_noent_leading_path() function
2009-01-11 13:28 [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection Kjetil Barvik
@ 2009-01-11 13:29 ` Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 3/3] lstat_cache(): introduce has_dirs_only_path() function Kjetil Barvik
2 siblings, 0 replies; 5+ messages in thread
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
Kjetil Barvik
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 7998 bytes --]
In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also can be told to keep
track of none existing directories.
So we update the lstat_cache() function to handle this fact new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.
We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!
Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 231c06d... 11181aa... M cache.h
:100644 100644 1061072... 0004e97... M symlinks.c
:100644 100644 54f301d... a3fd383... M unpack-trees.c
cache.h | 1 +
symlinks.c | 87 ++++++++++++++++++++++++++++++++++++--------------------
unpack-trees.c | 4 +-
3 files changed, 59 insertions(+), 33 deletions(-)
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..11181aa0079ce94bfbdb2bba77205f49aa3cbcb3 100644
--- a/cache.h
+++ b/cache.h
@@ -720,6 +720,7 @@ struct checkout {
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
extern int has_symlink_leading_path(int len, const char *name);
+extern int has_symlink_or_noent_leading_path(int len, const char *name);
extern struct alternate_object_database {
struct alternate_object_database *next;
diff --git a/symlinks.c b/symlinks.c
index 1061072e9b18b12ca22c643bf9a3ea977eeb9916..0004e97d2547467564b146232f40cba4f5b04a3e 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -4,6 +4,7 @@ static struct cache_def {
char path[PATH_MAX];
int len;
int flags;
+ int track_flags;
} cache;
static inline int greatest_match_lstat_cache(int len, const char *name)
@@ -24,21 +25,23 @@ static inline int greatest_match_lstat_cache(int len, const char *name)
return match_len;
}
-static inline void reset_lstat_cache(void)
+static inline void reset_lstat_cache(int track_flags)
{
cache.path[0] = '\0';
cache.len = 0;
cache.flags = 0;
+ cache.track_flags = track_flags;
}
#define FL_DIR (1 << 0)
-#define FL_SYMLINK (1 << 1)
-#define FL_LSTATERR (1 << 2)
-#define FL_ERR (1 << 3)
+#define FL_NOENT (1 << 1)
+#define FL_SYMLINK (1 << 2)
+#define FL_LSTATERR (1 << 3)
+#define FL_ERR (1 << 4)
/*
* Check if name 'name' of length 'len' has a symlink leading
- * component, or if the directory exists and is real.
+ * component, or if the directory exists and is real, or not.
*
* To speed up the check, some information is allowed to be cached.
* This is can be indicated by the 'track_flags' argument.
@@ -50,22 +53,32 @@ static int lstat_cache(int len, const char *name,
int match_flags, ret_flags, save_flags;
struct stat st;
- /*
- * Check to see if we have a match from the cache for the
- * symlink path type.
- */
- match_len = last_slash = greatest_match_lstat_cache(len, name);
- match_flags = cache.flags & track_flags & FL_SYMLINK;
- if (match_flags && match_len == cache.len)
- return match_flags;
- /*
- * If 'name' is a substring of the cache on a path component
- * basis, and a directory is cached, we can return
- * immediately.
- */
- match_flags = cache.flags & track_flags & FL_DIR;
- if (match_flags && match_len == len)
- return match_flags;
+ if (cache.track_flags != track_flags) {
+ /*
+ * As a safeguard we clear the cache if the value of
+ * track_flags does not match with the last supplied
+ * value.
+ */
+ reset_lstat_cache(track_flags);
+ match_len = last_slash = 0;
+ } else {
+ /*
+ * Check to see if we have a match from the cache for
+ * the 2 "excluding" path types.
+ */
+ match_len = last_slash = greatest_match_lstat_cache(len, name);
+ match_flags = cache.flags & track_flags & (FL_NOENT|FL_SYMLINK);
+ if (match_flags && match_len == cache.len)
+ return match_flags;
+ /*
+ * If 'name' is a substring of the cache on a path
+ * component basis, and a directory is cached, we
+ * can return immediately.
+ */
+ match_flags = cache.flags & track_flags & FL_DIR;
+ if (match_flags && match_len == len)
+ return match_flags;
+ }
/* Okay, no match from the cache so far, so now we have to
* check the rest of the path components.
@@ -85,6 +98,8 @@ static int lstat_cache(int len, const char *name,
if (lstat(cache.path, &st)) {
ret_flags = FL_LSTATERR;
+ if (errno == ENOENT)
+ ret_flags |= FL_NOENT;
} else if (S_ISDIR(st.st_mode)) {
last_slash_dir = last_slash;
continue;
@@ -96,11 +111,11 @@ static int lstat_cache(int len, const char *name,
break;
}
- /* At the end update the cache. Note that max 2 different
- * path types, FL_SYMLINK and FL_DIR, can be cached for the
- * moment!
+ /* At the end update the cache. Note that max 3 different
+ * path types, FL_NOENT, FL_SYMLINK and FL_DIR, can be cached
+ * for the moment!
*/
- save_flags = ret_flags & track_flags & FL_SYMLINK;
+ save_flags = ret_flags & track_flags & (FL_NOENT|FL_SYMLINK);
if (save_flags && last_slash > 0 && last_slash <= PATH_MAX) {
cache.path[last_slash] = '\0';
cache.len = last_slash;
@@ -108,20 +123,20 @@ static int lstat_cache(int len, const char *name,
} else if (track_flags & FL_DIR &&
last_slash_dir > 0 && last_slash_dir <= PATH_MAX) {
/* We have a separate test for the directory case,
- * since it could be that we have found a symlink and
- * the track_flags says that we can not cache this
- * fact, so the cache would then have been left empty
- * in this case.
+ * since it could be that we have found a symlink or a
+ * none existing directory and the track_flags says
+ * that we can not cache this fact, so the cache would
+ * then have been left empty in this case.
*
* But, if we is allowed to track real directories, we
* can still cache the path components before the last
- * one (the found symlink component).
+ * one (the found symlink or none existing component).
*/
cache.path[last_slash_dir] = '\0';
cache.len = last_slash_dir;
cache.flags = FL_DIR;
} else {
- reset_lstat_cache();
+ reset_lstat_cache(track_flags);
}
return ret_flags;
}
@@ -134,3 +149,13 @@ int has_symlink_leading_path(int len, const char *name)
FL_SYMLINK|FL_DIR) &
FL_SYMLINK;
}
+
+/* Return non-zero if path 'name' has a leading symlink component or
+ * if some leading path component does not exists.
+ */
+int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+ return lstat_cache(len, name,
+ FL_SYMLINK|FL_NOENT|FL_DIR) &
+ (FL_SYMLINK|FL_NOENT);
+}
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..a3fd383afbe951fea8dbe4378cbe489657843c4a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
char *cp, *prev;
char *name = ce->name;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
return;
if (unlink(name))
return;
@@ -584,7 +584,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
if (o->index_only || o->reset || !o->update)
return 0;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
return 0;
if (!lstat(ce->name, &st)) {
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH/RFC v6 3/3] lstat_cache(): introduce has_dirs_only_path() function
2009-01-11 13:28 [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 2/3] lstat_cache(): introduce has_symlink_or_noent_leading_path() function Kjetil Barvik
@ 2009-01-11 13:29 ` Kjetil Barvik
2 siblings, 0 replies; 5+ messages in thread
From: Kjetil Barvik @ 2009-01-11 13:29 UTC (permalink / raw)
To: git; +Cc: Ren� Scharfe, Linus Torvalds, Junio C Hamano,
Kjetil Barvik
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 8518 bytes --]
Inside the create_directories() function inside entry.c we currently
do an stat() or lstat() call for each path component of the pathname
'path' each and every time called. For the 'git checkout' command,
this function is called on each file for which we must do an update
(ce->ce_flags & CE_UPDATE), so we get lots and lots of calls...
To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions. Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.
To support it we must update the lstat_cache() function to be able to
say that "pleace test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.
Thanks to Junio C Hamano, Linus Torvalds and René Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 11181aa... 7c8c8e4... M cache.h
:100644 100644 aa2ee46... b0295bd... M entry.c
:100644 100644 0004e97... c92fac7... M symlinks.c
cache.h | 1 +
entry.c | 34 +++++++++++-----------------------
symlinks.c | 59 ++++++++++++++++++++++++++++++++++++++++++++---------------
3 files changed, 56 insertions(+), 38 deletions(-)
diff --git a/cache.h b/cache.h
index 11181aa0079ce94bfbdb2bba77205f49aa3cbcb3..7c8c8e484259a69bdc2c75e9e820247b0df04d01 100644
--- a/cache.h
+++ b/cache.h
@@ -721,6 +721,7 @@ struct checkout {
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
extern int has_symlink_leading_path(int len, const char *name);
extern int has_symlink_or_noent_leading_path(int len, const char *name);
+extern int has_dirs_only_path(int len, const char *name, int prefix_len);
extern struct alternate_object_database {
struct alternate_object_database *next;
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..b0295bd5559d4a106d977da7a54b0cdd8399c208 100644
--- a/entry.c
+++ b/entry.c
@@ -8,35 +8,23 @@ static void create_directories(const char *path, const struct checkout *state)
const char *slash = path;
while ((slash = strchr(slash+1, '/')) != NULL) {
- struct stat st;
- int stat_status;
-
len = slash - path;
memcpy(buf, path, len);
buf[len] = 0;
- if (len <= state->base_dir_len)
- /*
- * checkout-index --prefix=<dir>; <dir> is
- * allowed to be a symlink to an existing
- * directory.
- */
- stat_status = stat(buf, &st);
- else
- /*
- * if there currently is a symlink, we would
- * want to replace it with a real directory.
- */
- stat_status = lstat(buf, &st);
-
- if (!stat_status && S_ISDIR(st.st_mode))
+ /* For 'checkout-index --prefix=<dir>', <dir> is
+ * allowed to be a symlink to an existing directory,
+ * and we set 'state->base_dir_len' below, such that
+ * we test the path components of the prefix with the
+ * stat() function instead of the lstat() function.
+ */
+ if (has_dirs_only_path(len, buf, state->base_dir_len))
continue; /* ok, it is already a directory. */
- /*
- * We know stat_status == 0 means something exists
- * there and this mkdir would fail, but that is an
- * error codepath; we do not care, as we unlink and
- * mkdir again in such a case.
+ /* If this mkdir() would fail, it could be that there
+ * is already a symlink or something else exists
+ * there, therefore we then try to unlink it and try
+ * one more time to create the directory.
*/
if (mkdir(buf, 0777)) {
if (errno == EEXIST && state->force &&
diff --git a/symlinks.c b/symlinks.c
index 0004e97d2547467564b146232f40cba4f5b04a3e..c92fac7506a702157e4b780d4331f3c7b412798d 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,10 +1,11 @@
#include "cache.h"
static struct cache_def {
- char path[PATH_MAX];
+ char path[PATH_MAX + 1];
int len;
int flags;
int track_flags;
+ int prefix_len_stat_func;
} cache;
static inline int greatest_match_lstat_cache(int len, const char *name)
@@ -25,12 +26,13 @@ static inline int greatest_match_lstat_cache(int len, const char *name)
return match_len;
}
-static inline void reset_lstat_cache(int track_flags)
+static inline void reset_lstat_cache(int track_flags, int prefix_len_stat_func)
{
cache.path[0] = '\0';
cache.len = 0;
cache.flags = 0;
cache.track_flags = track_flags;
+ cache.prefix_len_stat_func = prefix_len_stat_func;
}
#define FL_DIR (1 << 0)
@@ -38,28 +40,35 @@ static inline void reset_lstat_cache(int track_flags)
#define FL_SYMLINK (1 << 2)
#define FL_LSTATERR (1 << 3)
#define FL_ERR (1 << 4)
+#define FL_FULLPATH (1 << 5)
/*
* Check if name 'name' of length 'len' has a symlink leading
* component, or if the directory exists and is real, or not.
*
* To speed up the check, some information is allowed to be cached.
- * This is can be indicated by the 'track_flags' argument.
+ * This is can be indicated by the 'track_flags' argument, which also
+ * can be used to indicate that we should always check the full path.
+ *
+ * The 'prefix_len_stat_func' parameter can be used to set the length
+ * of the prefix, where the cache should use the stat() function
+ * instead of the lstat() function to test each path component.
*/
static int lstat_cache(int len, const char *name,
- int track_flags)
+ int track_flags, int prefix_len_stat_func)
{
- int match_len, last_slash, last_slash_dir, max_len;
+ int match_len, last_slash, last_slash_dir, max_len, ret;
int match_flags, ret_flags, save_flags;
struct stat st;
- if (cache.track_flags != track_flags) {
+ if (cache.track_flags != track_flags ||
+ cache.prefix_len_stat_func != prefix_len_stat_func) {
/*
- * As a safeguard we clear the cache if the value of
- * track_flags does not match with the last supplied
- * value.
+ * As a safeguard we clear the cache if the values of
+ * track_flags and/or prefix_len_stat_func does not
+ * match with the last supplied values.
*/
- reset_lstat_cache(track_flags);
+ reset_lstat_cache(track_flags, prefix_len_stat_func);
match_len = last_slash = 0;
} else {
/*
@@ -91,12 +100,17 @@ static int lstat_cache(int len, const char *name,
cache.path[match_len] = name[match_len];
match_len++;
} while (match_len < max_len && name[match_len] != '/');
- if (match_len >= max_len)
+ if (match_len >= max_len && !(track_flags & FL_FULLPATH))
break;
last_slash = match_len;
cache.path[last_slash] = '\0';
- if (lstat(cache.path, &st)) {
+ if (last_slash <= prefix_len_stat_func)
+ ret = stat(cache.path, &st);
+ else
+ ret = lstat(cache.path, &st);
+
+ if (ret) {
ret_flags = FL_LSTATERR;
if (errno == ENOENT)
ret_flags |= FL_NOENT;
@@ -136,17 +150,19 @@ static int lstat_cache(int len, const char *name,
cache.len = last_slash_dir;
cache.flags = FL_DIR;
} else {
- reset_lstat_cache(track_flags);
+ reset_lstat_cache(track_flags, prefix_len_stat_func);
}
return ret_flags;
}
+#define USE_ONLY_LSTAT 0
+
/* Return non-zero if path 'name' has a leading symlink component.
*/
int has_symlink_leading_path(int len, const char *name)
{
return lstat_cache(len, name,
- FL_SYMLINK|FL_DIR) &
+ FL_SYMLINK|FL_DIR, USE_ONLY_LSTAT) &
FL_SYMLINK;
}
@@ -156,6 +172,19 @@ int has_symlink_leading_path(int len, const char *name)
int has_symlink_or_noent_leading_path(int len, const char *name)
{
return lstat_cache(len, name,
- FL_SYMLINK|FL_NOENT|FL_DIR) &
+ FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT) &
(FL_SYMLINK|FL_NOENT);
}
+
+/* Return non-zero if all path components of 'name' exists as a
+ * directory. If prefix_len > 0, we will test with the stat()
+ * function instead of the lstat() function for a prefix length of
+ * 'prefix_len', thus we then allow for symlink(s) in the prefix part
+ * as long as those points to real existing directories.
+ */
+int has_dirs_only_path(int len, const char *name, int prefix_len)
+{
+ return lstat_cache(len, name,
+ FL_DIR|FL_FULLPATH, prefix_len) &
+ FL_DIR;
+}
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection
2009-01-11 13:29 ` [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection Kjetil Barvik
@ 2009-01-12 4:05 ` Junio C Hamano
0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2009-01-12 4:05 UTC (permalink / raw)
To: Kjetil Barvik; +Cc: git, René Scharfe, Linus Torvalds
Kjetil Barvik <barvik@broadpark.no> writes:
> -static inline void set_pathname(int len, const char *name, struct pathname *match)
> +static inline void reset_lstat_cache(void)
> {
> - if (len < PATH_MAX) {
> - match->len = len;
> - memcpy(match->path, name, len);
> - match->path[len] = 0;
> - }
> + cache.path[0] = '\0';
> + cache.len = 0;
> + cache.flags = 0;
> }
I see you made this internal to the caching code, but I suspect in the
long run there needs to be a way for callers that use the caching
mechanism to check and then create new paths in the work tree to
invalidate the cached code (namely, builtin-apply.c::write_out_results()
and entry.c::checkout_entry() codepaths).
I'll queue this round to 'pu' anyway, though.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-01-12 4:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-11 13:28 [PATCH/RFC v6 0/3] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 1/3] lstat_cache(): more cache effective symlink/directory detection Kjetil Barvik
2009-01-12 4:05 ` Junio C Hamano
2009-01-11 13:29 ` [PATCH/RFC v6 2/3] lstat_cache(): introduce has_symlink_or_noent_leading_path() function Kjetil Barvik
2009-01-11 13:29 ` [PATCH/RFC v6 3/3] lstat_cache(): introduce has_dirs_only_path() function Kjetil Barvik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).