* git-add has gone lstat() mad @ 2007-03-30 19:55 Andy Parkins 2007-03-30 20:20 ` Andy Parkins 0 siblings, 1 reply; 10+ messages in thread From: Andy Parkins @ 2007-03-30 19:55 UTC (permalink / raw) To: git Hello, I was interested in trying out the GIT_WORK_DIR stuff, but ended up being unable to. The thing that stopped me is present in master as well: $ git --version git version 1.5.1.rc3.20.gaa453 $ cd $HOME $ git init $ git add .bashrc At this point the CPU pegs at 100% systime. An strace shows that git is calling lstat64() on every file in my home directory. I killed git before it scanned everything I've ever done. I only want to track one file; is git meant to scan every file in the directory even though I'm not adding any of them? Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-30 19:55 git-add has gone lstat() mad Andy Parkins @ 2007-03-30 20:20 ` Andy Parkins 2007-03-31 1:38 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Andy Parkins @ 2007-03-30 20:20 UTC (permalink / raw) To: git On Friday 2007, March 30, Andy Parkins wrote: > At this point the CPU pegs at 100% systime. An strace shows that git > is calling lstat64() on every file in my home directory. I killed > git before it scanned everything I've ever done. Okay. I've tracked down the culprit function, but I have no idea what the fix is. builtin-add.c:fill_directory() calls dir.c:read_directory() which calls dir.c:read_directory_recursive() I can't see why git feels that it has to recurse the entire subtree. It seems to be something to do with the gitignore stuff. Surely there is no need to use a recursive search when no directories are being added? If git-add were given file1 dir1/file2 dir2/dir3/file3 Then only the directories "."; "dir1/"; "dir2"; and "dir2/dir3" need checking for .gitignore files; and in none of those cases does the search need to be recursive. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-30 20:20 ` Andy Parkins @ 2007-03-31 1:38 ` Junio C Hamano 2007-03-31 3:39 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2007-03-31 1:38 UTC (permalink / raw) To: Andy Parkins; +Cc: git Andy Parkins <andyparkins@gmail.com> writes: > I can't see why git feels that it has to recurse the entire subtree. It > seems to be something to do with the gitignore stuff. Surely there is > no need to use a recursive search when no directories are being added? I do not think this is anything new. > If git-add were given > > file1 > dir1/file2 > dir2/dir3/file3 The thing is, you are not giving the above three pathnames, although you might think you are. You are giving three path *patterns* and asking git-add: "please run 'git ls-files --others' and add the ones that match these patterns". You can teach it to detect cases where you do not have wildcard (that is both shell glob wildcard and directory names; the latter means "grab everything in that named directory") to limit the set of directories to descend into. Patches are welcome, but applying them to 'master' needs to wait post 1.5.1. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 1:38 ` Junio C Hamano @ 2007-03-31 3:39 ` Linus Torvalds 2007-03-31 10:18 ` Andy Parkins 2007-04-01 0:39 ` Junio C Hamano 0 siblings, 2 replies; 10+ messages in thread From: Linus Torvalds @ 2007-03-31 3:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Andy Parkins, Git Mailing List On Fri, 30 Mar 2007, Junio C Hamano wrote: > Andy Parkins <andyparkins@gmail.com> writes: > > > I can't see why git feels that it has to recurse the entire subtree. It > > seems to be something to do with the gitignore stuff. Surely there is > > no need to use a recursive search when no directories are being added? > > I do not think this is anything new. Yeah, it's probably old. That said, it's still ugly. > Patches are welcome, but applying them to 'master' needs to wait > post 1.5.1. Here's a patch. It passes all tests. It's not that complex. But people should double-check. ESPECIALLY the list of special characters (currently '?' '*' '\\' and '['). Andy, does it work for you? NOTE! It changes the "const char **pathspec" into an array of "struct path_simplify", because that way it doesn't need to check for the magic shell expansion characters over and over and over and over again, and can just do it once up-front. All the real meat of the patch is really that conversion, and somebody should double-check that I actually got all the special characters.. The way things are set up, you can now pass a "pathspec" to the "read_directory()" function. If you pass NULL, it acts exactly like it used to do (read everything). If you pass a non-NULL pointer, it will simplify it into a "these are the prefixes without any special characters", and stop any readdir() early if the path in question doesn't match any of the prefixes. NOTE! This does *not* obviate the need for the caller to do the *exact* pathspec match later. It's a first-level filter on "read_directory()", but it does not do the full pathspec thing. Maybe it should. But in the meantime, builtin-add.c really does need to do first read_directory(dir, .., pathspec); if (pathspec) prune_directory(dir, pathspec, baselen); ie the "prune_directory()" part will do the *exact* pathspec pruning, while the "read_directory()" will use the pathspec just to do some quick high-level pruning of the directories it will recurse into. Does this matter? I can say that: git ls-files -o Makefile~ on the kernel took 0.110s for me before, and it now takes 0.014s. And maybe Andy's case more noticeable. Andy? Linus --- builtin-add.c | 2 +- builtin-ls-files.c | 2 +- dir.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++--- dir.h | 2 +- wt-status.c | 2 +- 5 files changed, 95 insertions(+), 9 deletions(-) diff --git a/builtin-add.c b/builtin-add.c index 9fcf514..871e23f 100644 --- a/builtin-add.c +++ b/builtin-add.c @@ -87,7 +87,7 @@ static void fill_directory(struct dir_struct *dir, const char **pathspec) } /* Read the directory and prune it */ - read_directory(dir, path, base, baselen); + read_directory(dir, path, base, baselen, pathspec); if (pathspec) prune_directory(dir, pathspec, baselen); } diff --git a/builtin-ls-files.c b/builtin-ls-files.c index 4e1d5af..74a6aca 100644 --- a/builtin-ls-files.c +++ b/builtin-ls-files.c @@ -216,7 +216,7 @@ static void show_files(struct dir_struct *dir, const char *prefix) if (baselen) path = base = prefix; - read_directory(dir, path, base, baselen); + read_directory(dir, path, base, baselen, pathspec); if (show_others) show_other_files(dir); if (show_killed) diff --git a/dir.c b/dir.c index b48e19d..f1cf278 100644 --- a/dir.c +++ b/dir.c @@ -8,6 +8,11 @@ #include "cache.h" #include "dir.h" +struct path_simplify { + int len; + const char *path; +}; + int common_prefix(const char **pathspec) { const char *path, *slash, *next; @@ -293,6 +298,31 @@ static int dir_exists(const char *dirname, int len) } /* + * This is an inexact early pruning of any recursive directory + * reading - if the path cannot possibly be in the pathspec, + * return true, and we'll skip it early. + */ +static int simplify_away(const char *path, int pathlen, const struct path_simplify *simplify) +{ + if (simplify) { + for (;;) { + const char *match = simplify->path; + int len = simplify->len; + + if (!match) + break; + if (len > pathlen) + len = pathlen; + if (!memcmp(path, match, len)) + return 0; + simplify++; + } + return 1; + } + return 0; +} + +/* * Read a directory tree. We currently ignore anything but * directories, regular files and symlinks. That's because git * doesn't handle them at all yet. Maybe that will change some @@ -301,7 +331,7 @@ static int dir_exists(const char *dirname, int len) * Also, we ignore the name ".git" (even if it is not a directory). * That likely will not change. */ -static int read_directory_recursive(struct dir_struct *dir, const char *path, const char *base, int baselen, int check_only) +static int read_directory_recursive(struct dir_struct *dir, const char *path, const char *base, int baselen, int check_only, const struct path_simplify *simplify) { DIR *fdir = opendir(path); int contents = 0; @@ -324,6 +354,8 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co continue; len = strlen(de->d_name); memcpy(fullname + baselen, de->d_name, len+1); + if (simplify_away(fullname, baselen + len, simplify)) + continue; if (excluded(dir, fullname) != dir->show_ignored) { if (!dir->show_ignored || DTYPE(de) != DT_DIR) { continue; @@ -350,13 +382,13 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co if (dir->hide_empty_directories && !read_directory_recursive(dir, fullname, fullname, - baselen + len, 1)) + baselen + len, 1, simplify)) continue; break; } contents += read_directory_recursive(dir, - fullname, fullname, baselen + len, 0); + fullname, fullname, baselen + len, 0, simplify); continue; case DT_REG: case DT_LNK: @@ -386,8 +418,61 @@ static int cmp_name(const void *p1, const void *p2) e2->name, e2->len); } -int read_directory(struct dir_struct *dir, const char *path, const char *base, int baselen) +/* + * Return the length of the "simple" part of a path match limiter. + */ +static int simple_length(const char *match) { + const char special[256] = { + [0] = 1, ['?'] = 1, + ['\\'] = 1, ['*'] = 1, + ['['] = 1 + }; + int len = -1; + + for (;;) { + unsigned char c = *match++; + len++; + if (special[c]) + return len; + } +} + +static struct path_simplify *create_simplify(const char **pathspec) +{ + int nr, alloc = 0; + struct path_simplify *simplify = NULL; + + if (!pathspec) + return NULL; + + for (nr = 0 ; ; nr++) { + const char *match; + if (nr >= alloc) { + alloc = alloc_nr(alloc); + simplify = xrealloc(simplify, alloc * sizeof(*simplify)); + } + match = *pathspec++; + if (!match) + break; + simplify[nr].path = match; + simplify[nr].len = simple_length(match); + } + simplify[nr].path = NULL; + simplify[nr].len = 0; + return simplify; +} + +static void free_simplify(struct path_simplify *simplify) +{ + if (simplify) + free(simplify); +} + +int read_directory(struct dir_struct *dir, const char *path, const char *base, int baselen, const char **pathspec) +{ + struct path_simplify *simplify = create_simplify(pathspec); + /* * Make sure to do the per-directory exclude for all the * directories leading up to our base. @@ -414,7 +499,8 @@ int read_directory(struct dir_struct *dir, const char *path, const char *base, i } } - read_directory_recursive(dir, path, base, baselen, 0); + read_directory_recursive(dir, path, base, baselen, 0, simplify); + free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); return dir->nr; } diff --git a/dir.h b/dir.h index 7233d65..33c31f2 100644 --- a/dir.h +++ b/dir.h @@ -48,7 +48,7 @@ extern int common_prefix(const char **pathspec); #define MATCHED_EXACTLY 3 extern int match_pathspec(const char **pathspec, const char *name, int namelen, int prefix, char *seen); -extern int read_directory(struct dir_struct *, const char *path, const char *base, int baselen); +extern int read_directory(struct dir_struct *, const char *path, const char *base, int baselen, const char **pathspec); extern int push_exclude_per_directory(struct dir_struct *, const char *, int); extern void pop_exclude_per_directory(struct dir_struct *, int); diff --git a/wt-status.c b/wt-status.c index a25632b..a055990 100644 --- a/wt-status.c +++ b/wt-status.c @@ -260,7 +260,7 @@ static void wt_status_print_untracked(struct wt_status *s) if (file_exists(x)) add_excludes_from_file(&dir, x); - read_directory(&dir, ".", "", 0); + read_directory(&dir, ".", "", 0, NULL); for(i = 0; i < dir.nr; i++) { /* check for matching entry, which is unmerged; lifted from * builtin-ls-files:show_other_files */ ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 3:39 ` Linus Torvalds @ 2007-03-31 10:18 ` Andy Parkins 2007-03-31 14:54 ` Randal L. Schwartz 2007-04-01 0:39 ` Junio C Hamano 1 sibling, 1 reply; 10+ messages in thread From: Andy Parkins @ 2007-03-31 10:18 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Junio C Hamano On Saturday 2007, March 31, Linus Torvalds wrote: > on the kernel took 0.110s for me before, and it now takes 0.014s. And > maybe Andy's case more noticeable. Andy? Blindingly fast. Previously, I never completed a git add .bashrc as it was taking so long. Now it's instant. This is back to true git form - I wasn't entirely sure git-add had done anything :-). git-status confirmed that it had worked successfully though. I've not done any extensive tests for regressions, but I've done cd $HOME git init git add .bashrc git add somedirectory/ And they work fine. So - it's works for me from me, and a big happy grin. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 10:18 ` Andy Parkins @ 2007-03-31 14:54 ` Randal L. Schwartz 2007-03-31 15:09 ` Andy Parkins 2007-03-31 19:28 ` Tom Prince 0 siblings, 2 replies; 10+ messages in thread From: Randal L. Schwartz @ 2007-03-31 14:54 UTC (permalink / raw) To: Andy Parkins; +Cc: git >>>>> "Andy" == Andy Parkins <andyparkins@gmail.com> writes: Andy> I've not done any extensive tests for regressions, but I've done Andy> cd $HOME Andy> git init Andy> git add .bashrc Andy> git add somedirectory/ Andy> And they work fine. So - it's works for me from me, and a big happy Andy> grin. Given that git currently doesn't maintain any metadata other than +x/-x, how are you maintaining the metadata for your homedir items? I know some schemes were discussed here in the past, but I'm curious as to what you settled on. For example, your .netrc file needs to be 600, but git won't track that. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 14:54 ` Randal L. Schwartz @ 2007-03-31 15:09 ` Andy Parkins 2007-03-31 19:28 ` Tom Prince 1 sibling, 0 replies; 10+ messages in thread From: Andy Parkins @ 2007-03-31 15:09 UTC (permalink / raw) To: git; +Cc: Randal L. Schwartz On Saturday 2007, March 31, Randal L. Schwartz wrote: > Given that git currently doesn't maintain any metadata other than > +x/-x, how are you maintaining the metadata for your homedir items? > I know some schemes were discussed here in the past, but I'm curious > as to what you settled on. For example, your .netrc file needs to be > 600, but git won't track that. I'm not maintaining anything as yet; the experiment so far consists of .bashrc only :-) Just storing my .inputrc, .bashrc, .vimrc and .gitconfig in a repository is probably going to give me all that I want. There aren't many other config files that I regularly update, and certainly not many that I update differently on different machines. However, I plan to use a script to wrap the git calls so that I don't have to type GIT_WORK_DIR and GIT_DIR for every file I want to track. I plan to make it so that if you run this script with a checkout or other working tree changing command then I will run a function within it that reads the metadata out of a file that is stored in the repository which will restore those settings. Ideally this functionality would be in hooks for pre-update-index and post-checkout-index or similar. But I have a feeling that the potential for conflicts makes it hard to do in reality. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 14:54 ` Randal L. Schwartz 2007-03-31 15:09 ` Andy Parkins @ 2007-03-31 19:28 ` Tom Prince 1 sibling, 0 replies; 10+ messages in thread From: Tom Prince @ 2007-03-31 19:28 UTC (permalink / raw) To: Randal L. Schwartz; +Cc: Andy Parkins, git On Sat, Mar 31, 2007 at 07:54:37AM -0700, Randal L. Schwartz wrote: > Given that git currently doesn't maintain any metadata other than +x/-x, how > are you maintaining the metadata for your homedir items? I know some > schemes > were discussed here in the past, but I'm curious as to what you settled on. > For example, your .netrc file needs to be 600, but git won't track that. I personally just don't track files that have sensitive data. I tend to have copies of my home dir on somewhat public servers, so I don't want that data copied around anyway. Tom ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-03-31 3:39 ` Linus Torvalds 2007-03-31 10:18 ` Andy Parkins @ 2007-04-01 0:39 ` Junio C Hamano 2007-04-01 8:25 ` Andy Parkins 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2007-04-01 0:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andy Parkins, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > Here's a patch. It passes all tests. It's not that complex. But people > should double-check. ESPECIALLY the list of special characters (currently > '?' '*' '\\' and '['). I think the above is a good set. This is an optimization different from what I was thinking about. I was hoping that we do not even need to call into read_directory() if all the pathspec[] elements succeeds to lstat() and they are not directories; in such a case we can just stuff them to dir structure by hand, and use the remainder for directory walk. But I like this patch better. We need to look at .gitignore to warn about adding ignored files, so we cannot just stuff what are found to dir without checking if they are ignored. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git-add has gone lstat() mad 2007-04-01 0:39 ` Junio C Hamano @ 2007-04-01 8:25 ` Andy Parkins 0 siblings, 0 replies; 10+ messages in thread From: Andy Parkins @ 2007-04-01 8:25 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Linus Torvalds On Sunday 2007, April 01, Junio C Hamano wrote: > But I like this patch better. We need to look at .gitignore to > warn about adding ignored files, so we cannot just stuff what > are found to dir without checking if they are ignored. I needed the following needed on top of current pu: diff --git a/unpack-trees.c b/unpack-trees.c index fa36495..d4f7589 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -521,7 +521,7 @@ static void verify_clean_subdirectory(const char *path, const char *action, memset(&d, 0, sizeof(d)); if (o->dir) d.exclude_per_dir = o->dir->exclude_per_dir; - i = read_directory(&d, path, pathbuf, namelen+1); + i = read_directory(&d, path, pathbuf, namelen+1, NULL); if (i) die("Updating '%s' would lose untracked files in it", path); ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-04-01 8:28 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-30 19:55 git-add has gone lstat() mad Andy Parkins 2007-03-30 20:20 ` Andy Parkins 2007-03-31 1:38 ` Junio C Hamano 2007-03-31 3:39 ` Linus Torvalds 2007-03-31 10:18 ` Andy Parkins 2007-03-31 14:54 ` Randal L. Schwartz 2007-03-31 15:09 ` Andy Parkins 2007-03-31 19:28 ` Tom Prince 2007-04-01 0:39 ` Junio C Hamano 2007-04-01 8:25 ` Andy Parkins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).