* [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls
@ 2009-01-06 20:36 Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection Kjetil Barvik
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.
After some more investigation and thinking, I have made 4 patches and
been able to optimise away over 42% of all lstat() calls in some cases
for the 'git checkout' command. I have not tested other git porcelain
commands for reduced lstat() calls, but I would guess that the more
effective 'lstat_cache()' compared to 'has_leading_symlink_cache()',
should also give better numbers in other cases.
All the 4 patches is against git master, and the git 'make test'
test suite still passes after each patch.
To document the improvement, below is some numbers, which compares
before and after all 4 patches. To reproduce the numbers:
- git clone the Linux git tree to be able to get the Linux tags
'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25
Then, when the current branch is 'my-v2.6.25' do:
strace -o strace_to27 -T git checkout -q my-v2.6.27
And then you pretty print and collect stats from the 'strace_to27'
file. If someone wants a copy of the strace_stat.pl script, which I
made/used to do the pretty printing, then give me a hint.
Below is the stats/numbers from the current git version (before the 4
patches). Notice that we do an lstat() call on the "arch" directory
over 6000 times!
TOTAL 185151 100.000% OK:165544 NOT: 19607 11.136001 sec 60 usec/call
lstat64 120954 65.327% OK:107013 NOT: 13941 5.388727 sec 45 usec/call
strings 120954 tot 30163 uniq 4.010 /uniq 5.388727 sec 45 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 2.740520 sec 45 usec/call
dirs 45522 tot 1436 uniq 31.701 /uniq 1.994448 sec 44 usec/call
errors 13941 tot 5189 uniq 2.687 /uniq 0.653759 sec 47 usec/call
6297 5.206% OK: 6297 NOT: 0 "arch"
4544 3.757% OK: 4544 NOT: 0 "drivers"
1816 1.501% OK: 1816 NOT: 0 "arch/arm"
1499 1.239% OK: 1499 NOT: 0 "include"
912 0.754% OK: 912 NOT: 0 "arch/powerpc"
764 0.632% OK: 764 NOT: 0 "fs"
746 0.617% OK: 746 NOT: 0 "drivers/net"
662 0.547% OK: 662 NOT: 0 "net"
652 0.539% OK: 325 NOT: 327 "arch/sparc/include"
636 0.526% OK: 636 NOT: 0 "drivers/media"
606 0.501% OK: 606 NOT: 0 "include/linux"
533 0.441% OK: 533 NOT: 0 "arch/sh"
522 0.432% OK: 260 NOT: 262 "arch/powerpc/include"
488 0.403% OK: 243 NOT: 245 "arch/sh/include"
413 0.341% OK: 413 NOT: 0 "arch/sparc"
390 0.322% OK: 390 NOT: 0 "arch/x86"
383 0.317% OK: 383 NOT: 0 "Documentation"
370 0.306% OK: 184 NOT: 186 "arch/ia64/include"
366 0.303% OK: 366 NOT: 0 "drivers/media/video"
348 0.288% OK: 173 NOT: 175 "arch/arm/include"
Here is the stats/numbers after applying the 4 patches. Notice how
nice the top 20 entries list now looks!
TOTAL 133655 100.000% OK:121615 NOT: 12040 10.429999 sec 78 usec/call
lstat64 69603 52.077% OK: 63218 NOT: 6385 3.419920 sec 49 usec/call
strings 69603 tot 30163 uniq 2.308 /uniq 3.419920 sec 49 usec/call
files 61491 tot 28712 uniq 2.142 /uniq 3.034869 sec 49 usec/call
dirs 1727 tot 1164 uniq 1.484 /uniq 0.075681 sec 44 usec/call
errors 6385 tot 5189 uniq 1.230 /uniq 0.309370 sec 48 usec/call
4 0.006% OK: 4 NOT: 0 ".gitignore"
4 0.006% OK: 4 NOT: 0 ".mailmap"
4 0.006% OK: 4 NOT: 0 "CREDITS"
4 0.006% OK: 4 NOT: 0 "Documentation/00-INDEX"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-block"
4 0.006% OK: 4 NOT: 0 "Documentation/ABI/testing/sysfs-firmware-acpi"
4 0.006% OK: 4 NOT: 0 "Documentation/CodingStyle"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-API.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DMA-mapping.txt"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/Makefile"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/gadget.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-api.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/kernel-locking.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs-guide.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/procfs_example.c"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/rapidio.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/s390-drivers.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/uio-howto.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/videobook.tmpl"
4 0.006% OK: 4 NOT: 0 "Documentation/DocBook/writing_usb_driver.tmpl"
Note that the overall used system time as recorded from 'strace -T',
does not drop so much that the reduced lstat() time should indicate
for _this_ particular test run. This is because now each unlink()
call takes much more time, at least for me on an slow ide disk (using
ext3) on a laptop.
A simple test gives me an overall improvement of 2.937 seconds: real
time drops from 28.195s (best of 5 runs with 'time git ...'), to
25.381s (best of 5 runs).
I have also noticed that inside the unlink_entry() function in file
unpack-trees.c, one could often end up calling rmdir() lots and lots
of times on none-empty directories. Maybe one should schedule each
directory for removal by an appropriate function, and then at the end
call a new function to clean all the directories at once?
Comments?
----
Changes since v1:
= inside patch 1/4
- always null terminate the cache_path array
- added a paragraph to the commit message for this patch
- small cleanup on 2 comments, and a small line indentation change
[[Following is updates after comments from Junio C Hamano - Thanks!]]
- removed the 'static inline update_path_cache()' function
- replaced the else-part of the above inline function with a call
to the 'clear_lstat_cache()' function.
- deleted the '|| errno == ENOTDIR' part inside the big while-loop
inside check_lstat_cache(), and updated the named BIT-fields
accordingly
= inside patch 2/4
- moved a paragraph out from the commit message for this patch and
into this cover-letter
[[Following is updates after comments from Junio C Hamano - Thanks!]]
- Removed the '|LSTAT_NOTDIR' part from the call to lstat_cache()
inside function 'check_removed()' inside file diff-lib.c
Kjetil Barvik (4):
Optimised, faster, more effective symlink/directory detection
Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
create_directories() inside entry.c: only check each directory once!
remove the old 'has_symlink_leading_path()' function
Makefile | 2 +-
builtin-add.c | 5 ++-
builtin-apply.c | 5 ++-
builtin-update-index.c | 5 ++-
cache.h | 17 ++++++++-
diff-lib.c | 5 ++-
dir.c | 4 +-
entry.c | 86 +++++++++++++++++++++++++++++++++---------
lstat_cache.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
symlinks.c | 64 -------------------------------
unpack-trees.c | 10 ++++-
11 files changed, 211 insertions(+), 91 deletions(-)
create mode 100644 lstat_cache.c
delete mode 100644 symlinks.c
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection
2009-01-06 20:36 [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
@ 2009-01-06 20:36 ` Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()' Kjetil Barvik
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
This patch is work based on the following 2 commits:
Linus Torvalds: c40641b77b0274186fd1b327d5dc3246f814aaaf
Junio C Hamano: f859c846e90b385c7ef873df22403529208ade50
Changes includes the following:
- The cache functionality is more effective. Previously when A/B/C/D
was in the cache and A/B/C/E/file.c was called for, there was no
match at all from the cache. Now we use the fact that the paths
"A", "A/B" and "A/B/C" is already tested, and we only need to do an
lstat() call on "A/B/C/E".
- We only cache/store the last path regardless of it's type. Since the
cache functionality is always used with alphabetically sorted names
(at least it seams so for me), there is no need to store both the
last symlink-leading path and the last real-directory path. Note
that if the cache is not called with (mostly) alphabetically sorted
names, neither the old, nor this new one, would be very effective.
- We also can cache the fact that a directory does not exist.
Previously we could end up doing lots of lstat() calls for a removed
directory which previously contained lots of files. Since we
already have simplified the cache functionality and only store the
last path (see above), this new functionality was easy to add.
- Previously, when symlink A/B/C/S was cached/stored in the
symlink-leading path, and A/B/C/file.c was called for, it was not
easy to use the fact that we already known that the paths "A", "A/B"
and "A/B/C" is real directories. Since we now only store one single
path (the last one), we also get similar logic for free regarding
the new "non-exsisting-directory-cache".
- Avoid copying the first path components of the name 2 zillions times
when we tests new path components. Since we always cache/store the
last path, we can copy each component as we test those directly into
the cache. Previously we ended up doing a memcpy() for the full
path/name right before each lstat() call, and when updating the
cache for each time we have tested an new path component.
- We also use less memory, that is PATH_MAX bytes less memory on the
stack and PATH_MAX bytes less memory on the heap.
- Introduce a 3rd argument, 'unsigned int track_flags', to the
cache-test function, check_lstat_cache(). This new argument can be
used to tell the cache functionality which types of directories
should be cached.
- Also introduce a 'void clear_lstat_cache(void)' function, which
should be used to clean the cache before usage. If for instance,
you have changed the types of directories which should be cached,
the cache could contain a path which was not wanted.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 aabf013... 7449b10... M Makefile
:100644 100644 231c06d... 9adc4a2... M cache.h
:000000 100644 0000000... 2f025c8... A lstat_cache.c
Makefile | 1 +
cache.h | 15 +++++++++
lstat_cache.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 115 insertions(+), 0 deletions(-)
create mode 100644 lstat_cache.c
diff --git a/Makefile b/Makefile
index aabf0130b99bee5204c8e668ba8f40caea77dae2..7449b105b03e862d53244d50ed035b4ddabef028 100644
--- a/Makefile
+++ b/Makefile
@@ -446,6 +446,7 @@ LIB_OBJS += list-objects.o
LIB_OBJS += ll-merge.o
LIB_OBJS += lockfile.o
LIB_OBJS += log-tree.o
+LIB_OBJS += lstat_cache.o
LIB_OBJS += mailmap.o
LIB_OBJS += match-trees.o
LIB_OBJS += merge-file.o
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..9adc4a224648bb17a50e6792274225eeb6144451 100644
--- a/cache.h
+++ b/cache.h
@@ -721,6 +721,21 @@ struct checkout {
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
extern int has_symlink_leading_path(int len, const char *name);
+#define LSTAT_DIR (1u << 0)
+#define LSTAT_NOENT (1u << 1)
+#define LSTAT_SYMLINK (1u << 2)
+#define LSTAT_LSTATERR (1u << 3)
+#define LSTAT_ERR (1u << 4)
+extern unsigned int check_lstat_cache(int len, const char *name,
+ unsigned int track_flags);
+static inline unsigned int lstat_cache(int len, const char *name,
+ unsigned int track_flags,
+ unsigned int mask_flags)
+{
+ return check_lstat_cache(len, name, track_flags) & mask_flags;
+}
+extern void clear_lstat_cache(void);
+
extern struct alternate_object_database {
struct alternate_object_database *next;
char *name;
diff --git a/lstat_cache.c b/lstat_cache.c
new file mode 100644
index 0000000000000000000000000000000000000000..2f025c87b0be0e713af7b1400e71284a22c25e9f
--- /dev/null
+++ b/lstat_cache.c
@@ -0,0 +1,99 @@
+#include "cache.h"
+
+static char cache_path[PATH_MAX];
+static int cache_len = 0;
+static unsigned int cache_flags = 0;
+
+static inline int
+greatest_common_path_cache_prefix(int len, const char *name)
+{
+ int max_len, match_len = 0, i = 0;
+
+ max_len = len < cache_len ? len : cache_len;
+ while (i < max_len && name[i] == cache_path[i]) {
+ if (name[i] == '/') match_len = i;
+ i++;
+ }
+ if (i == cache_len && len > cache_len && name[cache_len] == '/')
+ match_len = cache_len;
+ return match_len;
+}
+
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is indicated by the 'track_flags' argument.
+ */
+unsigned int
+check_lstat_cache(int len, const char *name, unsigned int track_flags)
+{
+ int match_len, last_slash, max_len;
+ unsigned int match_flags, ret_flags, save_flags;
+ struct stat st;
+
+ /* Check if match from the cache for 2 "excluding" path types.
+ */
+ match_len = last_slash =
+ greatest_common_path_cache_prefix(len, name);
+ match_flags =
+ cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+ if (match_flags && match_len == cache_len)
+ return match_flags;
+
+ /* Okay, no match from the cache so far, so now we have to
+ * check the rest of the path components.
+ */
+ ret_flags = LSTAT_DIR;
+ max_len = len < PATH_MAX ? len : PATH_MAX;
+ while (match_len < max_len) {
+ do {
+ cache_path[match_len] = name[match_len];
+ match_len++;
+ } while (match_len < max_len && name[match_len] != '/');
+ if (match_len >= max_len)
+ break;
+ last_slash = match_len;
+ cache_path[last_slash] = '\0';
+
+ if (lstat(cache_path, &st)) {
+ ret_flags = LSTAT_LSTATERR;
+ if (errno == ENOENT)
+ ret_flags |= LSTAT_NOENT;
+ } else if (S_ISDIR(st.st_mode)) {
+ continue;
+ } else if (S_ISLNK(st.st_mode)) {
+ ret_flags = LSTAT_SYMLINK;
+ } else {
+ ret_flags = LSTAT_ERR;
+ }
+ break;
+ }
+
+ /* At the end update the cache. Note that max 3 different
+ * path types can be cached for the moment!
+ */
+ save_flags = ret_flags & track_flags &
+ (LSTAT_NOENT|LSTAT_SYMLINK|LSTAT_DIR);
+ if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
+ cache_path[last_slash] = '\0';
+ cache_len = last_slash;
+ cache_flags = save_flags;
+ } else {
+ clear_lstat_cache();
+ }
+ return ret_flags;
+}
+
+/*
+ * Before usage of the check_lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+ cache_path[0] = '\0';
+ cache_len = 0;
+ cache_flags = 0;
+}
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
2009-01-06 20:36 [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection Kjetil Barvik
@ 2009-01-06 20:36 ` Kjetil Barvik
2009-01-06 21:08 ` Linus Torvalds
2009-01-06 20:36 ` [PATCH/RFC v2 3/4] create_directories() inside entry.c: only check each directory once! Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 4/4] remove the old 'has_symlink_leading_path()' function Kjetil Barvik
3 siblings, 1 reply; 7+ messages in thread
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
Start using the optimised, faster and more effective symlink/directory
cache. The previously used call:
has_symlink_leading_path(len, name);
should be identically with the following call to lstat_cache():
lstat_cache(len, name,
LSTAT_SYMLINK|LSTAT_DIR,
LSTAT_SYMLINK);
The primary reason for the new name of the function (instead of using
the old name and add 2 extra arguments), is that it is now more
general, for instance, it now also can cache the fact that a directory
does not exists.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 152c52c... M builtin-add.c
:100644 100644 a8f75ed... 0eb2b21... M builtin-apply.c
:100644 100644 5604977... 2844ddd... M builtin-update-index.c
:100644 100644 ae96c64... b67ee06... M diff-lib.c
:100644 100644 0131983... 9f2a1b1... M dir.c
:100644 100644 54f301d... 0938b86... M unpack-trees.c
builtin-add.c | 5 ++++-
builtin-apply.c | 5 ++++-
builtin-update-index.c | 5 ++++-
diff-lib.c | 5 ++++-
dir.c | 4 +++-
unpack-trees.c | 9 +++++++--
6 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..152c52c0f22b3e71931b2a5629e1472602817785 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -121,7 +121,9 @@ static const char **validate_pathspec(int argc, const char **argv, const char *p
if (pathspec) {
const char **p;
for (p = pathspec; *p; p++) {
- if (has_symlink_leading_path(strlen(*p), *p)) {
+ if (lstat_cache(strlen(*p), *p,
+ LSTAT_SYMLINK|LSTAT_DIR,
+ LSTAT_SYMLINK)) {
int len = prefix ? strlen(prefix) : 0;
die("'%s' is beyond a symbolic link", *p + len);
}
@@ -225,6 +227,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
argc = parse_options(argc, argv, builtin_add_options,
builtin_add_usage, 0);
+ clear_lstat_cache();
if (patch_interactive)
add_interactive = 1;
if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..0eb2b21ee245919189c780a64cc494ca35f7934a 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2354,7 +2354,9 @@ static int check_to_create_blob(const char *new_name, int ok_if_exists)
* In such a case, path "new_name" does not exist as
* far as git is concerned.
*/
- if (has_symlink_leading_path(strlen(new_name), new_name))
+ if (lstat_cache(strlen(new_name), new_name,
+ LSTAT_SYMLINK|LSTAT_DIR,
+ LSTAT_SYMLINK))
return 0;
return error("%s: already exists in working directory", new_name);
@@ -3154,6 +3156,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
if (apply_default_whitespace)
parse_whitespace_option(apply_default_whitespace);
+ clear_lstat_cache();
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..2844ddd95d208b35d47c1cc2ca40af4b68bd94ab 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -195,7 +195,9 @@ static int process_path(const char *path)
struct stat st;
len = strlen(path);
- if (has_symlink_leading_path(len, path))
+ if (lstat_cache(len, path,
+ LSTAT_SYMLINK|LSTAT_DIR,
+ LSTAT_SYMLINK))
return error("'%s' is beyond a symbolic link", path);
/*
@@ -581,6 +583,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
if (entries < 0)
die("cache corrupted");
+ clear_lstat_cache();
for (i = 1 ; i < argc; i++) {
const char *path = argv[i];
const char *p;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..b67ee068e377e5aa1c7ed4abd88c0e307ba8fb4f 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -31,7 +31,9 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
return -1;
return 1;
}
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (lstat_cache(ce_namelen(ce), ce->name,
+ LSTAT_SYMLINK|LSTAT_DIR,
+ LSTAT_SYMLINK))
return 1;
if (S_ISDIR(st->st_mode)) {
unsigned char sub[20];
@@ -69,6 +71,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
diff_unmerged_stage = 2;
entries = active_nr;
symcache[0] = '\0';
+ clear_lstat_cache();
for (i = 0; i < entries; i++) {
struct stat st;
unsigned int oldmode, newmode;
diff --git a/dir.c b/dir.c
index 0131983dfbc143ce5dae77e067663bb2e7d5f126..9f2a1b1f245c3f1d4f12d54d45bb193c53fa15b5 100644
--- a/dir.c
+++ b/dir.c
@@ -719,7 +719,9 @@ int read_directory(struct dir_struct *dir, const char *path, const char *base, i
{
struct path_simplify *simplify;
- if (has_symlink_leading_path(strlen(path), path))
+ if (lstat_cache(strlen(path), path,
+ LSTAT_SYMLINK|LSTAT_DIR,
+ LSTAT_SYMLINK))
return dir->nr;
simplify = create_simplify(pathspec);
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..0938b86278f9b6aa14ef32fe15392f34b694d8cc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,9 @@ static void unlink_entry(struct cache_entry *ce)
char *cp, *prev;
char *name = ce->name;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (lstat_cache(ce_namelen(ce), ce->name,
+ LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR,
+ LSTAT_SYMLINK|LSTAT_NOENT))
return;
if (unlink(name))
return;
@@ -105,6 +107,7 @@ static int check_updates(struct unpack_trees_options *o)
cnt = 0;
}
+ clear_lstat_cache();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
@@ -584,7 +587,9 @@ static int verify_absent(struct cache_entry *ce, const char *action,
if (o->index_only || o->reset || !o->update)
return 0;
- if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+ if (lstat_cache(ce_namelen(ce), ce->name,
+ LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR,
+ LSTAT_SYMLINK|LSTAT_NOENT))
return 0;
if (!lstat(ce->name, &st)) {
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH/RFC v2 3/4] create_directories() inside entry.c: only check each directory once!
2009-01-06 20:36 [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()' Kjetil Barvik
@ 2009-01-06 20:36 ` Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 4/4] remove the old 'has_symlink_leading_path()' function Kjetil Barvik
3 siblings, 0 replies; 7+ messages in thread
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
When we do an 'git checkout' after some time we end up in the
'checkout_entry()' function inside entry.c, and from here we call the
'create_directories()' function to make sure the all the directories
exists for the possible new file or entry.
The 'create_directories()' function happily started to check that all
path component exists. This resulted in tons and tons of calls to
lstat() or stat() when we checkout files nested deep inside a
directory.
We try to avoid this by remembering the last checked and possible
newly created directory.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 9adc4a2... 4c56f3f... M cache.h
:100644 100644 aa2ee46... 666a8ce... M entry.c
:100644 100644 0938b86... 037678d... M unpack-trees.c
cache.h | 1 +
entry.c | 86 ++++++++++++++++++++++++++++++++++++++++++++------------
unpack-trees.c | 1 +
3 files changed, 70 insertions(+), 18 deletions(-)
diff --git a/cache.h b/cache.h
index 9adc4a224648bb17a50e6792274225eeb6144451..4c56f3faa0b87398e27346358e40af964a1828bb 100644
--- a/cache.h
+++ b/cache.h
@@ -718,6 +718,7 @@ struct checkout {
refresh_cache:1;
};
+extern void clear_created_dirs_cache(void);
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
extern int has_symlink_leading_path(int len, const char *name);
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..666a8ce3a132e85a45b0828521f3c2119c77833e 100644
--- a/entry.c
+++ b/entry.c
@@ -1,33 +1,76 @@
#include "cache.h"
#include "blob.h"
-static void create_directories(const char *path, const struct checkout *state)
+static char dirs_path[PATH_MAX];
+static int dirs_len = 0;
+
+static inline int
+greatest_common_created_dirs_prefix(int len, const char *name)
{
- int len = strlen(path);
- char *buf = xmalloc(len + 1);
- const char *slash = path;
+ int max_len, match_len = 0, i = 0;
- while ((slash = strchr(slash+1, '/')) != NULL) {
- struct stat st;
- int stat_status;
+ max_len = len < dirs_len ? len : dirs_len;
+ while (i < max_len && name[i] == dirs_path[i]) {
+ if (name[i] == '/') match_len = i;
+ i++;
+ }
+ if (i == dirs_len && len > dirs_len && name[dirs_len] == '/')
+ match_len = dirs_len;
+ return match_len;
+}
+
+static inline void
+update_created_dirs_cache(int last_slash)
+{
+ if (last_slash > 0 && last_slash < PATH_MAX) {
+ dirs_len = last_slash;
+ } else {
+ dirs_len = 0;
+ }
+}
- len = slash - path;
- memcpy(buf, path, len);
- buf[len] = 0;
+void clear_created_dirs_cache(void)
+{
+ dirs_len = 0;
+}
+
+static void
+create_directories(int len, const char *path, const struct checkout *state)
+{
+ int i, max_len, last_slash, stat_status;
+ struct stat st;
+
+ /* Check the cache for previously checked or created
+ * directories (and components) within this function. There
+ * is no need to check or re-create directory components more
+ * than once!
+ */
+ max_len = len < PATH_MAX ? len : PATH_MAX;
+ i = last_slash = greatest_common_created_dirs_prefix(max_len, path);
- if (len <= state->base_dir_len)
+ while (i < max_len) {
+ do {
+ dirs_path[i] = path[i];
+ i++;
+ } while (i < max_len && path[i] != '/');
+ if (i >= max_len)
+ break;
+ last_slash = i;
+ dirs_path[last_slash] = '\0';
+
+ if (last_slash <= state->base_dir_len)
/*
* checkout-index --prefix=<dir>; <dir> is
* allowed to be a symlink to an existing
* directory.
*/
- stat_status = stat(buf, &st);
+ stat_status = stat(dirs_path, &st);
else
/*
* if there currently is a symlink, we would
* want to replace it with a real directory.
*/
- stat_status = lstat(buf, &st);
+ stat_status = lstat(dirs_path, &st);
if (!stat_status && S_ISDIR(st.st_mode))
continue; /* ok, it is already a directory. */
@@ -38,14 +81,14 @@ static void create_directories(const char *path, const struct checkout *state)
* error codepath; we do not care, as we unlink and
* mkdir again in such a case.
*/
- if (mkdir(buf, 0777)) {
+ if (mkdir(dirs_path, 0777)) {
if (errno == EEXIST && state->force &&
- !unlink(buf) && !mkdir(buf, 0777))
+ !unlink(dirs_path) && !mkdir(dirs_path, 0777))
continue;
- die("cannot create directory at %s", buf);
+ die("cannot create directory at %s", dirs_path);
}
}
- free(buf);
+ update_created_dirs_cache(last_slash);
}
static void remove_subtree(const char *path)
@@ -55,6 +98,11 @@ static void remove_subtree(const char *path)
char pathbuf[PATH_MAX];
char *name;
+ /* To be utterly safe we invalidate the cache of the
+ * previously created directories.
+ */
+ clear_created_dirs_cache();
+
if (!dir)
die("cannot opendir %s (%s)", path, strerror(errno));
strcpy(pathbuf, path);
@@ -195,12 +243,14 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
static char path[PATH_MAX + 1];
struct stat st;
int len = state->base_dir_len;
+ int path_len;
if (topath)
return write_entry(ce, topath, state, 1);
memcpy(path, state->base_dir, len);
strcpy(path + len, ce->name);
+ path_len = len + ce_namelen(ce);
if (!lstat(path, &st)) {
unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
@@ -229,6 +279,6 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
return error("unable to unlink old '%s' (%s)", path, strerror(errno));
} else if (state->not_new)
return 0;
- create_directories(path, state);
+ create_directories(path_len, path, state);
return write_entry(ce, path, state, 0);
}
diff --git a/unpack-trees.c b/unpack-trees.c
index 0938b86278f9b6aa14ef32fe15392f34b694d8cc..037678dfd75b7ef8235105e072fb276fcc6c36a8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -121,6 +121,7 @@ static int check_updates(struct unpack_trees_options *o)
}
}
+ clear_created_dirs_cache();
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH/RFC v2 4/4] remove the old 'has_symlink_leading_path()' function
2009-01-06 20:36 [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
` (2 preceding siblings ...)
2009-01-06 20:36 ` [PATCH/RFC v2 3/4] create_directories() inside entry.c: only check each directory once! Kjetil Barvik
@ 2009-01-06 20:36 ` Kjetil Barvik
3 siblings, 0 replies; 7+ messages in thread
From: Kjetil Barvik @ 2009-01-06 20:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Kjetil Barvik
It has been replaced by the more cache effective 'lstat_cache()'
function. see lstat_cache.c
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 7449b10... edd4798... M Makefile
:100644 100644 4c56f3f... 8b8da85... M cache.h
:100644 000000 5a5e781... 0000000... D symlinks.c
Makefile | 1 -
cache.h | 1 -
symlinks.c | 64 ------------------------------------------------------------
3 files changed, 0 insertions(+), 66 deletions(-)
delete mode 100644 symlinks.c
diff --git a/Makefile b/Makefile
index 7449b105b03e862d53244d50ed035b4ddabef028..edd4798429ad3828f5b9dcff20f73c6a7269520e 100644
--- a/Makefile
+++ b/Makefile
@@ -483,7 +483,6 @@ LIB_OBJS += sha1_name.o
LIB_OBJS += shallow.o
LIB_OBJS += sideband.o
LIB_OBJS += strbuf.o
-LIB_OBJS += symlinks.o
LIB_OBJS += tag.o
LIB_OBJS += trace.o
LIB_OBJS += transport.o
diff --git a/cache.h b/cache.h
index 4c56f3faa0b87398e27346358e40af964a1828bb..8b8da85f32600050dddfcdf6b0c0e6ecb8f27072 100644
--- a/cache.h
+++ b/cache.h
@@ -720,7 +720,6 @@ struct checkout {
extern void clear_created_dirs_cache(void);
extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
#define LSTAT_DIR (1u << 0)
#define LSTAT_NOENT (1u << 1)
diff --git a/symlinks.c b/symlinks.c
deleted file mode 100644
index 5a5e781a15d7d9cb60797958433eca896b31ec85..0000000000000000000000000000000000000000
--- a/symlinks.c
+++ /dev/null
@@ -1,64 +0,0 @@
-#include "cache.h"
-
-struct pathname {
- int len;
- char path[PATH_MAX];
-};
-
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
-{
- int match_len = match->len;
- return (len > match_len &&
- name[match_len] == '/' &&
- !memcmp(name, match->path, match_len)) ? match_len : 0;
-}
-
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
- if (len < PATH_MAX) {
- match->len = len;
- memcpy(match->path, name, len);
- match->path[len] = 0;
- }
-}
-
-int has_symlink_leading_path(int len, const char *name)
-{
- static struct pathname link, nonlink;
- char path[PATH_MAX];
- struct stat st;
- char *sp;
- int known_dir;
-
- /*
- * See if the last known symlink cache matches.
- */
- if (match_pathname(len, name, &link))
- return 1;
-
- /*
- * Get rid of the last known directory part
- */
- known_dir = match_pathname(len, name, &nonlink);
-
- while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
- int thislen = sp - name ;
- memcpy(path, name, thislen);
- path[thislen] = 0;
-
- if (lstat(path, &st))
- return 0;
- if (S_ISDIR(st.st_mode)) {
- set_pathname(thislen, path, &nonlink);
- known_dir = thislen;
- continue;
- }
- if (S_ISLNK(st.st_mode)) {
- set_pathname(thislen, path, &link);
- return 1;
- }
- break;
- }
- return 0;
-}
--
1.6.1.rc1.49.g7f705
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
2009-01-06 20:36 ` [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()' Kjetil Barvik
@ 2009-01-06 21:08 ` Linus Torvalds
2009-01-06 23:25 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-01-06 21:08 UTC (permalink / raw)
To: Kjetil Barvik; +Cc: git, Junio C Hamano
On Tue, 6 Jan 2009, Kjetil Barvik wrote:
?
> Start using the optimised, faster and more effective symlink/directory
> cache. The previously used call:
>
> has_symlink_leading_path(len, name);
>
> should be identically with the following call to lstat_cache():
>
> lstat_cache(len, name,
> LSTAT_SYMLINK|LSTAT_DIR,
> LSTAT_SYMLINK);
I think the new interface looks worse.
Why don't you just do a new inline function that says
static inline int has_symlink_leading_path(int len, const char *name)
{
return lstat_cache(len, name,
LSTAT_SYMLINK|LSTAT_DIR,
LSTAT_SYMLINK);
}
and now you don't need this big patch, and people who don't care about
those magic flags don't need to have them. End result: more readable code.
Then, the new users that want _new_ semantics can use the extended
version.
This is how git has done pretty much all "generalized" versions. See the
whole ce_modified() vs ie_modified() thing: they're the same function,
it's just that 'ce_modified()' is the traditional simpler interface that
works on the default index, while ie_modified() is the "full" version that
takes all the details that most uses don't even want to know about.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()'
2009-01-06 21:08 ` Linus Torvalds
@ 2009-01-06 23:25 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2009-01-06 23:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kjetil Barvik, git
Linus Torvalds <torvalds@linux-foundation.org> writes:
>> ... The previously used call:
>>
>> has_symlink_leading_path(len, name);
>>
>> should be identically with the following call to lstat_cache():
>>
>> lstat_cache(len, name,
>> LSTAT_SYMLINK|LSTAT_DIR,
>> LSTAT_SYMLINK);
>
> I think the new interface looks worse.
>
> Why don't you just do a new inline function that says
>
> static inline int has_symlink_leading_path(int len, const char *name)
> {
> return lstat_cache(len, name,
> LSTAT_SYMLINK|LSTAT_DIR,
> LSTAT_SYMLINK);
> }
>
> and now you don't need this big patch, and people who don't care about
> those magic flags don't need to have them. End result: more readable code.
Excellent.
Not that I did not think a backward compatible macro is much easier to
read; after all, ce/ie you mention is a refactorizaton I did myself.
What I didn't think of was that posing the above question is a much better
way to extract a clear explanation why some of these lstat_cache() calls
have LSTAT_NOENT and some of them don't from the author. It is much
better way than my earlier attempt to do so.
> This is how git has done pretty much all "generalized" versions. See the
> whole ce_modified() vs ie_modified() thing: they're the same function,
> it's just that 'ce_modified()' is the traditional simpler interface that
> works on the default index, while ie_modified() is the "full" version that
> takes all the details that most uses don't even want to know about.
Yup, thanks for a praise ;-)
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-01-06 23:27 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-06 20:36 [PATCH/RFC v2 0/4] git checkout: optimise away lots of lstat() calls Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 1/4] Optimised, faster, more effective symlink/directory detection Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 2/4] Use 'lstat_cache()' instead of 'has_symlink_leading_path()' Kjetil Barvik
2009-01-06 21:08 ` Linus Torvalds
2009-01-06 23:25 ` Junio C Hamano
2009-01-06 20:36 ` [PATCH/RFC v2 3/4] create_directories() inside entry.c: only check each directory once! Kjetil Barvik
2009-01-06 20:36 ` [PATCH/RFC v2 4/4] remove the old 'has_symlink_leading_path()' function Kjetil Barvik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).