* Possible bug in git-completion.sh @ 2010-01-08 15:17 Jon Schewe 2010-01-08 15:40 ` Michael J Gruber 0 siblings, 1 reply; 28+ messages in thread From: Jon Schewe @ 2010-01-08 15:17 UTC (permalink / raw) To: spearce; +Cc: git If I create a directory "build" at the top of my git repository and then add it to .gitignore, git behaves as expected and ignores the build directory when checking status. Now git-completion.sh has some issues. I have GIT_PS1_SHOWUNTRACKEDFILES to "1", so that I will be notified when there are untracked files in my working directory. When I'm in the top-level directory my prompt looks like expected, no '%'. However if I change to the build directory I get a '%', even though git status shows no untracked files. I see that git-completion.sh is using git ls-files to check this and that function does indeed show output when in my build directory. So the question here: Is git-completion.sh using ls-files improperly or is ls-files behaving improperly? -- Jon Schewe | http://mtu.net/~jpschewe If you see an attachment named signature.asc, this is my digital signature. See http://www.gnupg.org for more information. For I am convinced that neither death nor life, neither angels nor demons, neither the present nor the future, nor any powers, neither height nor depth, nor anything else in all creation, will be able to separate us from the love of God that is in Christ Jesus our Lord. - Romans 8:38-39 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 15:17 Possible bug in git-completion.sh Jon Schewe @ 2010-01-08 15:40 ` Michael J Gruber 2010-01-08 16:24 ` Jeff King 0 siblings, 1 reply; 28+ messages in thread From: Michael J Gruber @ 2010-01-08 15:40 UTC (permalink / raw) To: Jon Schewe; +Cc: spearce, git Jon Schewe venit, vidit, dixit 08.01.2010 16:17: > If I create a directory "build" at the top of my git repository and then > add it to .gitignore, git behaves as expected and ignores the build > directory when checking status. Now git-completion.sh has some issues. I > have GIT_PS1_SHOWUNTRACKEDFILES to "1", so that I will be notified when > there are untracked files in my working directory. When I'm in the > top-level directory my prompt looks like expected, no '%'. However if I > change to the build directory I get a '%', even though git status shows > no untracked files. I see that git-completion.sh is using git ls-files > to check this and that function does indeed show output when in my build > directory. So the question here: Is git-completion.sh using ls-files > improperly or is ls-files behaving improperly? > Neither, but: output between status and ls-files is inconsistent. More specifically, different commands behave differently with respect to the treatment of subdirs. ls-files assumes "." implicitly, status does not. "git status ." should give you the same behavior is "git ls-files" in this regard. Michael ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 15:40 ` Michael J Gruber @ 2010-01-08 16:24 ` Jeff King 2010-01-08 16:38 ` Junio C Hamano 0 siblings, 1 reply; 28+ messages in thread From: Jeff King @ 2010-01-08 16:24 UTC (permalink / raw) To: Michael J Gruber; +Cc: Jon Schewe, spearce, git On Fri, Jan 08, 2010 at 04:40:26PM +0100, Michael J Gruber wrote: > Jon Schewe venit, vidit, dixit 08.01.2010 16:17: > > If I create a directory "build" at the top of my git repository and then > > add it to .gitignore, git behaves as expected and ignores the build > > directory when checking status. Now git-completion.sh has some issues. I > > have GIT_PS1_SHOWUNTRACKEDFILES to "1", so that I will be notified when > > there are untracked files in my working directory. When I'm in the > > top-level directory my prompt looks like expected, no '%'. However if I > > change to the build directory I get a '%', even though git status shows > > no untracked files. I see that git-completion.sh is using git ls-files > > to check this and that function does indeed show output when in my build > > directory. So the question here: Is git-completion.sh using ls-files > > improperly or is ls-files behaving improperly? > > > > Neither, but: output between status and ls-files is inconsistent. More > specifically, different commands behave differently with respect to the > treatment of subdirs. ls-files assumes "." implicitly, status does not. > "git status ." should give you the same behavior is "git ls-files" in > this regard. It doesn't, and I think there is a bug in ls-files. Try this: git init touch base-cruft mkdir subdir touch subdir/cruft echo subdir >.gitignore git status ;# shows gitignore and base-cruft git ls-files -o --exclude-standard ;# ditto cd subdir git status . ;# shows nothing, since everything in subdir is ignored git ls-files -o --exclude-standard ;# BUG: shows cruft Yes, ls-files operates in the subdirectory, which means it should not show cruft from the root (and it does not). But it should respect .gitignore directives going all the way back to the root, and it doesn't. -Peff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:24 ` Jeff King @ 2010-01-08 16:38 ` Junio C Hamano 2010-01-08 16:41 ` Jeff King 0 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 16:38 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Jon Schewe, spearce, git Jeff King <peff@peff.net> writes: > Try this: > > git init > touch base-cruft > mkdir subdir > touch subdir/cruft > echo subdir >.gitignore > git status ;# shows gitignore and base-cruft > git ls-files -o --exclude-standard ;# ditto > cd subdir > git status . ;# shows nothing, since everything in subdir is ignored > git ls-files -o --exclude-standard ;# BUG: shows cruft > > Yes, ls-files operates in the subdirectory, which means it should not > show cruft from the root (and it does not). But it should respect > .gitignore directives going all the way back to the root, and it > doesn't. Shouldn't the ls-files be run from the root with subdir/ as pathspec, if you wanted to do apples-to-apples comparison between it and status? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:38 ` Junio C Hamano @ 2010-01-08 16:41 ` Jeff King 2010-01-08 16:45 ` Junio C Hamano 0 siblings, 1 reply; 28+ messages in thread From: Jeff King @ 2010-01-08 16:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael J Gruber, Jon Schewe, spearce, git On Fri, Jan 08, 2010 at 08:38:39AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Try this: > > > > git init > > touch base-cruft > > mkdir subdir > > touch subdir/cruft > > echo subdir >.gitignore > > git status ;# shows gitignore and base-cruft > > git ls-files -o --exclude-standard ;# ditto > > cd subdir > > git status . ;# shows nothing, since everything in subdir is ignored > > git ls-files -o --exclude-standard ;# BUG: shows cruft > > > > Yes, ls-files operates in the subdirectory, which means it should not > > show cruft from the root (and it does not). But it should respect > > .gitignore directives going all the way back to the root, and it > > doesn't. > > Shouldn't the ls-files be run from the root with subdir/ as pathspec, if > you wanted to do apples-to-apples comparison between it and status? Well, yes, if you wanted to compare apples to apples. But actually, my point in showing "status" at all was to note that Michael's statement that they would be the same is wrong. But just looking at the ls-files output, do you not agree that there is a bug? Respecting gitignore means respecting _all_ gitignore files in the repository, not just the subset that you happen to be in. -Peff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:41 ` Jeff King @ 2010-01-08 16:45 ` Junio C Hamano 2010-01-08 16:56 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 16:45 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Jon Schewe, spearce, git Jeff King <peff@peff.net> writes: > Well, yes, if you wanted to compare apples to apples. But actually, my > point in showing "status" at all was to note that Michael's statement > that they would be the same is wrong. > > But just looking at the ls-files output, do you not agree that there is > a bug? If I agreed, I wouldn't have suggested _you_ to cd up and use pathspec, but instead would have suggested to patch ls-files to make it do so for you. You can see it as a feature that you can use to check what would happen if you stopped ignoring the directory from the higher level. With a patch to always cd-up and use pathspec, that will become impossible. Maybe nobody needs such a feature (I don't), in which case we can declare it as a bug. But I wasn't ready to do so myself when I wrote the message you are responding to, and I still am not. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:45 ` Junio C Hamano @ 2010-01-08 16:56 ` Junio C Hamano 2010-01-08 17:24 ` Jeff King 2010-01-08 17:21 ` Jeff King 2010-01-08 18:21 ` Junio C Hamano 2 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 16:56 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Jon Schewe, spearce, git Junio C Hamano <gitster@pobox.com> writes: > You can see it as a feature that you can use to check what would happen > if you stopped ignoring the directory from the higher level. With a patch > to always cd-up and use pathspec, that will become impossible. I hate to say this, but I have a feeling that status might be what needs to be fixed instead. It isn't hard to imagine a use case where you don't want to be bothered by crufts in lower level directories when you are looking at the bigger picture (e.g. at the top of the hierarchy) but when you go to individual subdirectory you would want to see them. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:56 ` Junio C Hamano @ 2010-01-08 17:24 ` Jeff King 0 siblings, 0 replies; 28+ messages in thread From: Jeff King @ 2010-01-08 17:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael J Gruber, Jon Schewe, spearce, git On Fri, Jan 08, 2010 at 08:56:38AM -0800, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > You can see it as a feature that you can use to check what would happen > > if you stopped ignoring the directory from the higher level. With a patch > > to always cd-up and use pathspec, that will become impossible. > > I hate to say this, but I have a feeling that status might be what needs > to be fixed instead. It isn't hard to imagine a use case where you don't > want to be bothered by crufts in lower level directories when you are > looking at the bigger picture (e.g. at the top of the hierarchy) but when > you go to individual subdirectory you would want to see them. For people who have status.relativepaths off, status is still a global command. So I would hope that behavior would come out only when doing "git status .", or at the very least be disabled if status.relativepaths is set. Personally, I think having ignored files show up as untracked based on heuristics like cwd is just going to end up confusing people. -Peff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:45 ` Junio C Hamano 2010-01-08 16:56 ` Junio C Hamano @ 2010-01-08 17:21 ` Jeff King 2010-01-08 18:21 ` Junio C Hamano 2 siblings, 0 replies; 28+ messages in thread From: Jeff King @ 2010-01-08 17:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael J Gruber, Jon Schewe, spearce, git On Fri, Jan 08, 2010 at 08:45:55AM -0800, Junio C Hamano wrote: > > But just looking at the ls-files output, do you not agree that there is > > a bug? > > If I agreed, I wouldn't have suggested _you_ to cd up and use pathspec, > but instead would have suggested to patch ls-files to make it do so for > you. Ah, I missed the subtlety there. > You can see it as a feature that you can use to check what would happen > if you stopped ignoring the directory from the higher level. With a patch > to always cd-up and use pathspec, that will become impossible. > > Maybe nobody needs such a feature (I don't), in which case we can declare > it as a bug. But I wasn't ready to do so myself when I wrote the message > you are responding to, and I still am not. That feature seems somewhat insane to me. If I wanted to know how things would look without gitignore, I would not have said --exclude-standard. However, I was wrong before that it ignores .gitignore. It doesn't. If you put "cruft" instead of "subdir" into gitignore in my previous example, it is correctly ignored. So it is sort of a "half-use gitignore", which you cannot accomplish any other way. I still think it's a bit crazy to have as the default behavior. But at least it's constrained to a plumbing command, which scripts can work around to get what they want. With the current behavior, that means the bash prompt code should be doing "git rev-parse --show-cdup --show-prefix" and moving to the toplevel. -Peff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 16:45 ` Junio C Hamano 2010-01-08 16:56 ` Junio C Hamano 2010-01-08 17:21 ` Jeff King @ 2010-01-08 18:21 ` Junio C Hamano 2010-01-08 19:58 ` Junio C Hamano 2 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 18:21 UTC (permalink / raw) To: Jeff King; +Cc: Michael J Gruber, Jon Schewe, spearce, git Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >> Well, yes, if you wanted to compare apples to apples. But actually, my >> point in showing "status" at all was to note that Michael's statement >> that they would be the same is wrong. >> >> But just looking at the ls-files output, do you not agree that there is >> a bug? > > If I agreed, I wouldn't have suggested _you_ to cd up and use pathspec, > but instead would have suggested to patch ls-files to make it do so for > you. Nah, I should have checked the code. The implementation of ls-files does cd up and uses pathspec, so the intent is to apply higher level gitignore. It however feeds paths from the already ignored directories, which _is_ the real bug. For example, if you have .gitignore (ignores t/) t/ t/junk in your work tree, it will read .gitignore at the top level, and eventually end up feeding "t/junk" to dir.c::excluded_1(), which does: for (i = el->nr - 1; 0 <= i; i--) { struct exclude *x = el->excludes[i]; const char *exclude = x->pattern; int to_exclude = x->to_exclude; if (x->flags & EXC_FLAG_MUSTBEDIR) { if (*dtype == DT_UNKNOWN) *dtype = get_dtype(NULL, pathname, pathlen); if (*dtype != DT_DIR) continue; } ... where *x has "'t/' that must be directory". but the path "t/junk" does not match with "t/" and is not excluded by this rule; this part notices "t/junk" is not a directory, and continues without giving the later part a chance to intervene. Of course, the later part also is not prepared to do a "leading path" match, as the function is not meant to be fed entries from ignored "t/" directory in the first place. I think a proper fix should be in dir.c::read_directory() where it calls read_directory_recursive() without first checking if the directory itself should be ignored. read_directory_recursive() itself has a logic to see if the dirent it found is worth recursing into and a similar logic should be in the toplevel caller. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Possible bug in git-completion.sh 2010-01-08 18:21 ` Junio C Hamano @ 2010-01-08 19:58 ` Junio C Hamano 2010-01-08 23:01 ` [PATCH] ls-files: fix overeager pathspec optimization Junio C Hamano 0 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 19:58 UTC (permalink / raw) To: Jeff King, Linus Torvalds; +Cc: Michael J Gruber, Jon Schewe, spearce, git Junio C Hamano <gitster@pobox.com> writes: > Nah, I should have checked the code. > > The implementation of ls-files does cd up and uses pathspec, so the intent > is to apply higher level gitignore. > > It however feeds paths from the already ignored directories, which _is_ > the real bug. > ... > I think a proper fix should be in dir.c::read_directory() where it calls > read_directory_recursive() without first checking if the directory itself > should be ignored. read_directory_recursive() itself has a logic to see > if the dirent it found is worth recursing into and a similar logic should > be in the toplevel caller. Actually doing less in fill_directory() turned out to be a simpler solution. builtin_add() cares about the return value from fill_directory() and performs prune_directory() optimization magic, and we may want to change it not to do so as well, but I didn't want to worry about too many things at once, so this version still runs the "common_prefix()" that forgets to take .gitignore at higher levels (or a $GIT_DIR/info/exclude entry that ignores the common prefix directory of given pathspecs) into account. Another possibility is to fix common_prefix() and make it walk the path it returns one level at a time from the top, making sure they are not ignored, and that would probably be a better fix, but at least this patch will give you a starting point and tests to check the result against. diff --git a/dir.c b/dir.c index d0999ba..56d3f60 100644 --- a/dir.c +++ b/dir.c @@ -53,7 +53,6 @@ static int common_prefix(const char **pathspec) int fill_directory(struct dir_struct *dir, const char **pathspec) { - const char *path; int len; /* @@ -61,13 +60,8 @@ int fill_directory(struct dir_struct *dir, const char **pathspec) * use that to optimize the directory walk */ len = common_prefix(pathspec); - path = ""; - - if (len) - path = xmemdupz(*pathspec, len); - /* Read the directory and prune it */ - read_directory(dir, path, len, pathspec); + read_directory(dir, "", 0, pathspec); return len; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index c65bca8..17d1764 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -153,4 +153,42 @@ test_expect_success 'negated exclude matches can override previous ones' ' grep "^a.1" output ' +test_expect_success 'subdirectory ignore (setup)' ' + mkdir -p top/l1/l2 && + ( + cd top && + git init && + echo /.gitignore >.gitignore && + echo l1 >>.gitignore && + echo l2 >l1/.gitignore + ) +' + +test_expect_success 'subdirectory ignore (toplevel)' ' + ( + cd top && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1/l2)' ' + ( + cd top/l1/l2 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1)' ' + ( + cd top/l1 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + test_done ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH] ls-files: fix overeager pathspec optimization 2010-01-08 19:58 ` Junio C Hamano @ 2010-01-08 23:01 ` Junio C Hamano 2010-01-08 23:24 ` Linus Torvalds 0 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 23:01 UTC (permalink / raw) To: Jeff King, Linus Torvalds; +Cc: Michael J Gruber, Jon Schewe, spearce, git Given pathspecs that share a common prefix, ls-files optimized its call into recursive directory reader by starting at the common prefix directory. If you have a directory "t" with an untracked file "t/junk" in it, but the top-level .gitignore file told us to ignore "t/", this resulted in an unexpected behaviour: $ git ls-files -o --exclude-standard $ cd t && git ls-files -o --exclude-standard junk Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Junio C Hamano <gitster@pobox.com> writes: >> I think a proper fix should be in dir.c::read_directory() where it calls >> read_directory_recursive() without first checking if the directory itself >> should be ignored. read_directory_recursive() itself has a logic to see >> if the dirent it found is worth recursing into and a similar logic should >> be in the toplevel caller. So here it is. dir.c | 37 ++++++++++++++++++++++++++++++++++ t/t3001-ls-files-others-exclude.sh | 39 ++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+), 0 deletions(-) diff --git a/dir.c b/dir.c index f9ac454..9316eb6 100644 --- a/dir.c +++ b/dir.c @@ -787,6 +787,40 @@ static void free_simplify(struct path_simplify *simplify) free(simplify); } +static int has_leading_ignored_dir(struct dir_struct *dir, + const char *path_, int len, + const struct path_simplify *simplify) +{ + int at, dtype = DT_DIR; + char path[PATH_MAX]; + + at = 0; + memcpy(path, path_, len); + while (1) { + char *cp; + path[at] = '\0'; + /* + * NOTE! NOTE! NOTE!: we might want to actually lstat(2) + * path[] to make sure it is a directory. + */ + if (excluded(dir, path, &dtype)) + return 1; + if (at < len) { + path[at] = path_[at]; + if (at < len && path[at] == '/') + at++; + } + if (len <= at) + break; + cp = memchr(path + at, '/', len - at); + if (!cp) + at = len; + else + at = cp - path; + } + return 0; +} + int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec) { struct path_simplify *simplify; @@ -795,6 +829,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char return dir->nr; simplify = create_simplify(pathspec); + if (has_leading_ignored_dir(dir, path, len, simplify)) + return dir->nr; + read_directory_recursive(dir, path, len, 0, simplify); free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index c65bca8..9e71260 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -153,4 +153,43 @@ test_expect_success 'negated exclude matches can override previous ones' ' grep "^a.1" output ' +test_expect_success 'subdirectory ignore (setup)' ' + mkdir -p top/l1/l2 && + ( + cd top && + git init && + echo /.gitignore >.gitignore && + echo l1 >>.gitignore && + echo l2 >l1/.gitignore && + >l1/l2/l1 + ) +' + +test_expect_success 'subdirectory ignore (toplevel)' ' + ( + cd top && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1/l2)' ' + ( + cd top/l1/l2 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1)' ' + ( + cd top/l1 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + test_done -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-08 23:01 ` [PATCH] ls-files: fix overeager pathspec optimization Junio C Hamano @ 2010-01-08 23:24 ` Linus Torvalds 2010-01-08 23:31 ` Junio C Hamano 2010-01-09 0:06 ` Junio C Hamano 0 siblings, 2 replies; 28+ messages in thread From: Linus Torvalds @ 2010-01-08 23:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git On Fri, 8 Jan 2010, Junio C Hamano wrote: > > Given pathspecs that share a common prefix, ls-files optimized its call > into recursive directory reader by starting at the common prefix > directory. > > If you have a directory "t" with an untracked file "t/junk" in it, but the > top-level .gitignore file told us to ignore "t/", this resulted in an > unexpected behaviour: Ok, I'm not sure how "unexpected" this is, since arguably you are overriding the ignore file by _being_ in that directory (the same way index contents override ignore files), but I could go either way on that. Your patch looks fine, although I think you did this in a very odd way. > + at = 0; > + memcpy(path, path_, len); > + while (1) { > + char *cp; > + path[at] = '\0'; > + /* > + * NOTE! NOTE! NOTE!: we might want to actually lstat(2) > + * path[] to make sure it is a directory. > + */ > + if (excluded(dir, path, &dtype)) > + return 1; The above starts by testing the empty string, and then after that test it goes on to the next directory component. That is just _odd_. Wouldn't it be more natural to write the loop the other way around, ie _first_ look up the next directory component, and _then_ do the exclude processing for thoose components? Or is there some subtle reason I'm missing for actually checking the empty name? Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-08 23:24 ` Linus Torvalds @ 2010-01-08 23:31 ` Junio C Hamano 2010-01-09 0:06 ` Junio C Hamano 1 sibling, 0 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-08 23:31 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, Jeff King, Michael J Gruber, Jon Schewe, spearce, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, 8 Jan 2010, Junio C Hamano wrote: >> >> Given pathspecs that share a common prefix, ls-files optimized its call >> into recursive directory reader by starting at the common prefix >> directory. >> >> If you have a directory "t" with an untracked file "t/junk" in it, but the >> top-level .gitignore file told us to ignore "t/", this resulted in an >> unexpected behaviour: > > Ok, I'm not sure how "unexpected" this is, since arguably you are > overriding the ignore file by _being_ in that directory (the same way > index contents override ignore files), but I could go either way on that. > > Your patch looks fine, although I think you did this in a very odd way. > >> + at = 0; >> + memcpy(path, path_, len); >> + while (1) { >> + char *cp; >> + path[at] = '\0'; >> + /* >> + * NOTE! NOTE! NOTE!: we might want to actually lstat(2) >> + * path[] to make sure it is a directory. >> + */ >> + if (excluded(dir, path, &dtype)) >> + return 1; > > The above starts by testing the empty string, and then after that test it > goes on to the next directory component. That is just _odd_. > > Wouldn't it be more natural to write the loop the other way around, ie > _first_ look up the next directory component, and _then_ do the exclude > processing for thoose components? > > Or is there some subtle reason I'm missing for actually checking the empty > name? No, just being paranoid in case somebody managed to .gitignore the top-level of the working tree ;-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-08 23:24 ` Linus Torvalds 2010-01-08 23:31 ` Junio C Hamano @ 2010-01-09 0:06 ` Junio C Hamano 2010-01-09 0:24 ` Linus Torvalds 1 sibling, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 0:06 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, Jeff King, Michael J Gruber, Jon Schewe, spearce, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, 8 Jan 2010, Junio C Hamano wrote: >> >> Given pathspecs that share a common prefix, ls-files optimized its call >> into recursive directory reader by starting at the common prefix >> directory. >> >> If you have a directory "t" with an untracked file "t/junk" in it, but the >> top-level .gitignore file told us to ignore "t/", this resulted in an >> unexpected behaviour: > > Ok, I'm not sure how "unexpected" this is, since arguably you are > overriding the ignore file by _being_ in that directory (the same way > index contents override ignore files), but I could go either way on that. > > Your patch looks fine, although I think you did this in a very odd way. Actually, there is some funny interaction with "git add" I ran out of time to figure out what the right fix should be: $ git init new $ cd new $ mkdir t $ >kuzu $ >t/gomi $ echo t >.gitignore $ echo kuzu >>.gitignore $ git add kuzu The following paths are ignored... kuzu Use -f if you really want to add them. $ git add t/gomi Since "git add" uses the exact same codepath to find the untracked files that match the pathspecs, I expected the "fix" will make it complain about "t/gomi" being ignored. Not so. Actually, it does start ignoring t/gomi (the index does not have t/gomi after the above sequence with the patch), but lack of the error message makes it a rather unfortunate regression---it works as specified in the sense that ignored paths are not added to the index unless --forced, but it does so without telling the user about it. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 0:06 ` Junio C Hamano @ 2010-01-09 0:24 ` Linus Torvalds 2010-01-09 0:54 ` Junio C Hamano 0 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2010-01-09 0:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git On Fri, 8 Jan 2010, Junio C Hamano wrote: > > Since "git add" uses the exact same codepath to find the untracked files > that match the pathspecs, I expected the "fix" will make it complain about > "t/gomi" being ignored. Not so. No. Since 't' is ignored, it won't even do a readdir() in there. And since it didn't do a readdir() in there, it won't find any files in there to add to the ignored list. And that's how the "git add" logic for "apparently ignored" files works: it looks at whether the list of ignored files is empty or not. > Actually, it does start ignoring t/gomi (the index does not have t/gomi > after the above sequence with the patch), but lack of the error message > makes it a rather unfortunate regression---it works as specified in the > sense that ignored paths are not added to the index unless --forced, but > it does so without telling the user about it. I have this memory that _used_ to have a per-filename flag in "git add" that checked if that particular filename component was used or not. But now it just looks at 'dir->ignored_nr' and 'dir->ignored[]'. [ Digging back in history.. Yes: commit e96980ef ] Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 0:24 ` Linus Torvalds @ 2010-01-09 0:54 ` Junio C Hamano 2010-01-09 1:07 ` Linus Torvalds 0 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 0:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git Linus Torvalds <torvalds@linux-foundation.org> writes: > I have this memory that _used_ to have a per-filename flag in "git add" > that checked if that particular filename component was used or not. But > now it just looks at 'dir->ignored_nr' and 'dir->ignored[]'. Yes, and the previous patch wasn't adding what is ignored to the array, so here is a re-roll to fix that in addition to the fix to "should the loop start from checking an empty path?" issue you noticed. But I am starting to wonder if we might be better off restructuring read_directory_recursive(). Currently it assumes that the path it was given _must_ be of interest (i.e. not ignored) and runs excluded() on subdirectories it finds to make that same decision before recursing into them or skipping them. It might make more sense if it first checked if the path given by the caller should be ignored and act accordingly. If it is to be ignored, perhaps it will simply return without doing anything (normal case), or it will return but adds the path to ignored[] (DIR_COLLECT_IGNORED case), or it will recurse into itself but tell it that everything it finds is to be ignored. I dunno... -- >8 -- Subject: [PATCH (v3)] ls-files: fix overeager pathspec optimization Given pathspecs that share a common prefix, ls-files optimized its call into recursive directory reader by starting at the common prefix directory. If you have a directory "t" with an untracked file "t/junk" in it, but the top-level .gitignore file told us to ignore "t/", this resulted in: $ git ls-files -o --exclude-standard $ git ls-files -o --exclude-standard t/ t/junk $ git ls-files -o --exclude-standard t/junk t/junk $ cd t && git ls-files -o --exclude-standard junk We could argue that you are overriding the ignore file by giving a patchspec that matches or being in that directory, but it is somewhat unexpected. Worse yet, these behave differently: $ git ls-files -o --exclude-standard t/ . $ git ls-files -o --exclude-standard t/ t/junk This patch changes the optimization so that it notices when the common prefix directory that it starts reading from is an ignored one. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 36 +++++++++++++++++++++++++++++++++ t/t3001-ls-files-others-exclude.sh | 39 ++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+), 0 deletions(-) diff --git a/dir.c b/dir.c index d0999ba..e8be909 100644 --- a/dir.c +++ b/dir.c @@ -778,6 +778,39 @@ static void free_simplify(struct path_simplify *simplify) free(simplify); } +static int has_leading_ignored_dir(struct dir_struct *dir, + const char *path_, int len, + const struct path_simplify *simplify) +{ + int dtype = DT_DIR; + char path[PATH_MAX], *cp = path; + + memcpy(path, path_, len); + while (1) { + char *next = memchr(cp, '/', path + len - cp); + int exclude; + + /* + * NOTE! NOTE! NOTE!: we might want to actually lstat(2) + * path[] to make sure it is a directory. + */ + if (next) + *next = '\0'; + exclude = excluded(dir, path, &dtype); + if (next) + *next = '/'; + if (exclude) { + if (dir->flags & DIR_COLLECT_IGNORED) + dir_add_ignored(dir, path, len); + return 1; + } + if (!next) + break; + cp = next + 1; + } + return 0; +} + int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec) { struct path_simplify *simplify; @@ -786,6 +819,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char return dir->nr; simplify = create_simplify(pathspec); + if (has_leading_ignored_dir(dir, path, len, simplify)) + return dir->nr; + read_directory_recursive(dir, path, len, 0, simplify); free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index c65bca8..9e71260 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -153,4 +153,43 @@ test_expect_success 'negated exclude matches can override previous ones' ' grep "^a.1" output ' +test_expect_success 'subdirectory ignore (setup)' ' + mkdir -p top/l1/l2 && + ( + cd top && + git init && + echo /.gitignore >.gitignore && + echo l1 >>.gitignore && + echo l2 >l1/.gitignore && + >l1/l2/l1 + ) +' + +test_expect_success 'subdirectory ignore (toplevel)' ' + ( + cd top && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1/l2)' ' + ( + cd top/l1/l2 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1)' ' + ( + cd top/l1 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + test_done -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 0:54 ` Junio C Hamano @ 2010-01-09 1:07 ` Linus Torvalds 2010-01-09 5:42 ` Jeff King 2010-01-09 7:16 ` Junio C Hamano 0 siblings, 2 replies; 28+ messages in thread From: Linus Torvalds @ 2010-01-09 1:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git On Fri, 8 Jan 2010, Junio C Hamano wrote: > > Yes, and the previous patch wasn't adding what is ignored to the array, so > here is a re-roll to fix that in addition to the fix to "should the loop > start from checking an empty path?" issue you noticed. Ack. Looks ok to me, and I think it's a lot more obvious. > But I am starting to wonder if we might be better off restructuring > read_directory_recursive(). Currently it assumes that the path it was > given _must_ be of interest (i.e. not ignored) and runs excluded() on > subdirectories it finds to make that same decision before recursing into > them or skipping them. It might make more sense if it first checked if > the path given by the caller should be ignored and act accordingly. Hmm. I can't make myself care one way or the other, I have to admit. I assume you mean basically taking the path and using the first component of it _instead_ of doing a readdir() - and getting rid of the simplification up front? I agree that that should work. Would it be simpler and cleaner? Perhaps. I'd have to see both patches to be able to tell. I do admit that while I acked your patch, it sure ain't _pretty_ to do that special odd "has_leading_ignored_dir()" thing. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 1:07 ` Linus Torvalds @ 2010-01-09 5:42 ` Jeff King 2010-01-09 7:16 ` Junio C Hamano 1 sibling, 0 replies; 28+ messages in thread From: Jeff King @ 2010-01-09 5:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, Michael J Gruber, Jon Schewe, spearce, git On Fri, Jan 08, 2010 at 05:07:46PM -0800, Linus Torvalds wrote: > > But I am starting to wonder if we might be better off restructuring > > read_directory_recursive(). Currently it assumes that the path it was > > given _must_ be of interest (i.e. not ignored) and runs excluded() on > > subdirectories it finds to make that same decision before recursing into > > them or skipping them. It might make more sense if it first checked if > > the path given by the caller should be ignored and act accordingly. > > Hmm. I can't make myself care one way or the other, I have to admit. I > assume you mean basically taking the path and using the first component of > it _instead_ of doing a readdir() - and getting rid of the simplification > up front? > > I agree that that should work. Would it be simpler and cleaner? Perhaps. > I'd have to see both patches to be able to tell. I do admit that while I > acked your patch, it sure ain't _pretty_ to do that special odd > "has_leading_ignored_dir()" thing. It would look something like this: diff --git a/dir.c b/dir.c index 3a8d3e6..306d354 100644 --- a/dir.c +++ b/dir.c @@ -811,12 +811,19 @@ static void free_simplify(struct path_simplify *simplify) int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec) { struct path_simplify *simplify; + int d_type = DT_DIR; + int exclude; if (has_symlink_leading_path(path, len)) return dir->nr; simplify = create_simplify(pathspec); - read_directory_recursive(dir, path, len, 0, simplify); + exclude = excluded(dir, path, &d_type); + if (exclude && (dir->flags & DIR_COLLECT_IGNORED) && + in_pathspec(path, len, simplify)) + dir_add_ignored(dir, path, len); + if (!exclude || (dir->flags & DIR_SHOW_IGNORED)) + read_directory_recursive(dir, path, len, 0, simplify); free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name); But unfortunately excluded() is not happy with the trailing slash on the path given to read_directory, so we also need on top: diff --git a/dir.c b/dir.c index 306d354..6045a84 100644 --- a/dir.c +++ b/dir.c @@ -813,12 +813,17 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char struct path_simplify *simplify; int d_type = DT_DIR; int exclude; + char *path_without_slash; if (has_symlink_leading_path(path, len)) return dir->nr; simplify = create_simplify(pathspec); - exclude = excluded(dir, path, &d_type); + path_without_slash = xstrdup(path); + if (path_without_slash[strlen(path_without_slash)-1] == '/') + path_without_slash[strlen(path_without_slash)-1] = '\0'; + exclude = excluded(dir, path_without_slash, &d_type); + free(path_without_slash); if (exclude && (dir->flags & DIR_COLLECT_IGNORED) && in_pathspec(path, len, simplify)) dir_add_ignored(dir, path, len); And that does fix the case that triggered this whole discussion, but I haven't tested thoroughly to make sure we are not adversely affecting other cases. -Peff ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 1:07 ` Linus Torvalds 2010-01-09 5:42 ` Jeff King @ 2010-01-09 7:16 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano 2010-01-09 8:07 ` [PATCH] " Junio C Hamano 1 sibling, 2 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 7:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git Linus Torvalds <torvalds@linux-foundation.org> writes: > I'd have to see both patches to be able to tell. I do admit that while I > acked your patch, it sure ain't _pretty_ to do that special odd > "has_leading_ignored_dir()" thing. Revised patch (v4) series is coming shortly. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 1/4] t3001: test ls-files -o ignored/dir 2010-01-09 7:16 ` Junio C Hamano @ 2010-01-09 7:35 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function Junio C Hamano ` (2 more replies) 2010-01-09 8:07 ` [PATCH] " Junio C Hamano 1 sibling, 3 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 7:35 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Jeff King, Michael J Gruber, Jon Schewe, spearce When you have "t" directory that is marked as ignored in the top-level .gitignore file (or $GIT_DIR/info/exclude), running $ git ls-files -o --exclude-standard from the top-level correctly excludes files in "t" directory, but any of the following: $ git ls-files -o --exclude-standard t/ $ cd t && git ls-files -o --exclude-standard would show untracked files in that directory. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- t/t3001-ls-files-others-exclude.sh | 39 ++++++++++++++++++++++++++++++++++++ 1 files changed, 39 insertions(+), 0 deletions(-) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index c65bca8..e3e4d71 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -153,4 +153,43 @@ test_expect_success 'negated exclude matches can override previous ones' ' grep "^a.1" output ' +test_expect_success 'subdirectory ignore (setup)' ' + mkdir -p top/l1/l2 && + ( + cd top && + git init && + echo /.gitignore >.gitignore && + echo l1 >>.gitignore && + echo l2 >l1/.gitignore && + >l1/l2/l1 + ) +' + +test_expect_success 'subdirectory ignore (toplevel)' ' + ( + cd top && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_success 'subdirectory ignore (l1/l2)' ' + ( + cd top/l1/l2 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + +test_expect_failure 'subdirectory ignore (l1)' ' + ( + cd top/l1 && + git ls-files -o --exclude-standard + ) >actual && + >expect && + test_cmp expect actual +' + test_done -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano @ 2010-01-09 7:35 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 3/4] read_directory(): further split treat_path() Junio C Hamano 2010-01-09 7:35 ` [PATCH 4/4] ls-files: fix overeager pathspec optimization Junio C Hamano 2 siblings, 0 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 7:35 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Jeff King, Michael J Gruber, Jon Schewe, spearce Primarily because I want to reuse it in a separate function later, but this de-dents a huge function by one tabstop which by itself is an improvement as well. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 153 ++++++++++++++++++++++++++++++++++++++--------------------------- 1 files changed, 90 insertions(+), 63 deletions(-) diff --git a/dir.c b/dir.c index d0999ba..dec8365 100644 --- a/dir.c +++ b/dir.c @@ -625,6 +625,84 @@ static int get_dtype(struct dirent *de, const char *path, int len) return dtype; } +enum path_treatment { + path_ignored, + path_handled, + path_recurse, +}; + +static enum path_treatment treat_path(struct dir_struct *dir, + struct dirent *de, + char *path, int path_max, + int baselen, + const struct path_simplify *simplify, + int *len) +{ + int dtype, exclude; + + if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git")) + return path_ignored; + *len = strlen(de->d_name); + /* Ignore overly long pathnames! */ + if (*len + baselen + 8 > path_max) + return path_ignored; + memcpy(path + baselen, de->d_name, *len + 1); + *len += baselen; + if (simplify_away(path, *len, simplify)) + return path_ignored; + + dtype = DTYPE(de); + exclude = excluded(dir, path, &dtype); + if (exclude && (dir->flags & DIR_COLLECT_IGNORED) + && in_pathspec(path, *len, simplify)) + dir_add_ignored(dir, path, *len); + + /* + * Excluded? If we don't explicitly want to show + * ignored files, ignore it + */ + if (exclude && !(dir->flags & DIR_SHOW_IGNORED)) + return path_ignored; + + if (dtype == DT_UNKNOWN) + dtype = get_dtype(de, path, *len); + + /* + * Do we want to see just the ignored files? + * We still need to recurse into directories, + * even if we don't ignore them, since the + * directory may contain files that we do.. + */ + if (!exclude && (dir->flags & DIR_SHOW_IGNORED)) { + if (dtype != DT_DIR) + return path_ignored; + } + + switch (dtype) { + default: + return path_ignored; + case DT_DIR: + memcpy(path + *len, "/", 2); + (*len)++; + switch (treat_directory(dir, path, *len, simplify)) { + case show_directory: + if (exclude != !!(dir->flags + & DIR_SHOW_IGNORED)) + return path_ignored; + break; + case recurse_into_directory: + return path_recurse; + case ignore_directory: + return path_ignored; + } + break; + case DT_REG: + case DT_LNK: + break; + } + return path_handled; +} + /* * Read a directory tree. We currently ignore anything but * directories, regular files and symlinks. That's because git @@ -634,7 +712,10 @@ static int get_dtype(struct dirent *de, const char *path, int len) * Also, we ignore the name ".git" (even if it is not a directory). * That likely will not change. */ -static int read_directory_recursive(struct dir_struct *dir, const char *base, int baselen, int check_only, const struct path_simplify *simplify) +static int read_directory_recursive(struct dir_struct *dir, + const char *base, int baselen, + int check_only, + const struct path_simplify *simplify) { DIR *fdir = opendir(*base ? base : "."); int contents = 0; @@ -645,70 +726,16 @@ static int read_directory_recursive(struct dir_struct *dir, const char *base, in memcpy(path, base, baselen); while ((de = readdir(fdir)) != NULL) { - int len, dtype; - int exclude; - - if (is_dot_or_dotdot(de->d_name) || - !strcmp(de->d_name, ".git")) - continue; - len = strlen(de->d_name); - /* Ignore overly long pathnames! */ - if (len + baselen + 8 > sizeof(path)) + int len; + switch (treat_path(dir, de, path, sizeof(path), + baselen, simplify, &len)) { + case path_recurse: + contents += read_directory_recursive + (dir, path, len, 0, simplify); continue; - memcpy(path + baselen, de->d_name, len+1); - len = baselen + len; - if (simplify_away(path, len, simplify)) + case path_ignored: continue; - - dtype = DTYPE(de); - exclude = excluded(dir, path, &dtype); - if (exclude && (dir->flags & DIR_COLLECT_IGNORED) - && in_pathspec(path, len, simplify)) - dir_add_ignored(dir, path,len); - - /* - * Excluded? If we don't explicitly want to show - * ignored files, ignore it - */ - if (exclude && !(dir->flags & DIR_SHOW_IGNORED)) - continue; - - if (dtype == DT_UNKNOWN) - dtype = get_dtype(de, path, len); - - /* - * Do we want to see just the ignored files? - * We still need to recurse into directories, - * even if we don't ignore them, since the - * directory may contain files that we do.. - */ - if (!exclude && (dir->flags & DIR_SHOW_IGNORED)) { - if (dtype != DT_DIR) - continue; - } - - switch (dtype) { - default: - continue; - case DT_DIR: - memcpy(path + len, "/", 2); - len++; - switch (treat_directory(dir, path, len, simplify)) { - case show_directory: - if (exclude != !!(dir->flags - & DIR_SHOW_IGNORED)) - continue; - break; - case recurse_into_directory: - contents += read_directory_recursive(dir, - path, len, 0, simplify); - continue; - case ignore_directory: - continue; - } - break; - case DT_REG: - case DT_LNK: + case path_handled: break; } contents++; -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 3/4] read_directory(): further split treat_path() 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano 2010-01-09 7:35 ` [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function Junio C Hamano @ 2010-01-09 7:35 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 4/4] ls-files: fix overeager pathspec optimization Junio C Hamano 2 siblings, 0 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 7:35 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Jeff King, Michael J Gruber, Jon Schewe, spearce The next caller I'll be adding won't have an access to struct dirent because it won't be reading from a directory stream. Split the main part of the function further into a separate function to make it usable by a caller without passing a dirent as long as it knows what type is feeding the function. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 50 +++++++++++++++++++++++++++++--------------------- 1 files changed, 29 insertions(+), 21 deletions(-) diff --git a/dir.c b/dir.c index dec8365..35cc89b 100644 --- a/dir.c +++ b/dir.c @@ -631,28 +631,12 @@ enum path_treatment { path_recurse, }; -static enum path_treatment treat_path(struct dir_struct *dir, - struct dirent *de, - char *path, int path_max, - int baselen, - const struct path_simplify *simplify, - int *len) +static enum path_treatment treat_one_path(struct dir_struct *dir, + char *path, int *len, + const struct path_simplify *simplify, + int dtype, struct dirent *de) { - int dtype, exclude; - - if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git")) - return path_ignored; - *len = strlen(de->d_name); - /* Ignore overly long pathnames! */ - if (*len + baselen + 8 > path_max) - return path_ignored; - memcpy(path + baselen, de->d_name, *len + 1); - *len += baselen; - if (simplify_away(path, *len, simplify)) - return path_ignored; - - dtype = DTYPE(de); - exclude = excluded(dir, path, &dtype); + int exclude = excluded(dir, path, &dtype); if (exclude && (dir->flags & DIR_COLLECT_IGNORED) && in_pathspec(path, *len, simplify)) dir_add_ignored(dir, path, *len); @@ -703,6 +687,30 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_handled; } +static enum path_treatment treat_path(struct dir_struct *dir, + struct dirent *de, + char *path, int path_max, + int baselen, + const struct path_simplify *simplify, + int *len) +{ + int dtype; + + if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git")) + return path_ignored; + *len = strlen(de->d_name); + /* Ignore overly long pathnames! */ + if (*len + baselen + 8 > path_max) + return path_ignored; + memcpy(path + baselen, de->d_name, *len + 1); + *len += baselen; + if (simplify_away(path, *len, simplify)) + return path_ignored; + + dtype = DTYPE(de); + return treat_one_path(dir, path, len, simplify, dtype, de); +} + /* * Read a directory tree. We currently ignore anything but * directories, regular files and symlinks. That's because git -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 4/4] ls-files: fix overeager pathspec optimization 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano 2010-01-09 7:35 ` [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function Junio C Hamano 2010-01-09 7:35 ` [PATCH 3/4] read_directory(): further split treat_path() Junio C Hamano @ 2010-01-09 7:35 ` Junio C Hamano 2010-01-12 16:33 ` Jeff King 2 siblings, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 7:35 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Jeff King, Michael J Gruber, Jon Schewe, spearce Given pathspecs that share a common prefix, ls-files optimized its call into recursive directory reader by starting at the common prefix directory. If you have a directory "t" with an untracked file "t/junk" in it, but the top-level .gitignore file told us to ignore "t/", this resulted in: $ git ls-files -o --exclude-standard $ git ls-files -o --exclude-standard t/ t/junk $ git ls-files -o --exclude-standard t/junk t/junk $ cd t && git ls-files -o --exclude-standard junk We could argue that you are overriding the ignore file by giving a patchspec that matches or being in that directory, but it is somewhat unexpected. Worse yet, these behave differently: $ git ls-files -o --exclude-standard t/ . $ git ls-files -o --exclude-standard t/ t/junk This patch changes the optimization so that it notices when the common prefix directory that it starts reading from is an ignored one. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 38 +++++++++++++++++++++++++++++++++++- t/t3001-ls-files-others-exclude.sh | 2 +- 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/dir.c b/dir.c index 35cc89b..00d698d 100644 --- a/dir.c +++ b/dir.c @@ -813,6 +813,41 @@ static void free_simplify(struct path_simplify *simplify) free(simplify); } +static int treat_leading_path(struct dir_struct *dir, + const char *path, int len, + const struct path_simplify *simplify) +{ + char pathbuf[PATH_MAX]; + int baselen, blen; + const char *cp; + + while (len && path[len - 1] == '/') + len--; + if (!len) + return 1; + baselen = 0; + while (1) { + cp = path + baselen + !!baselen; + cp = memchr(cp, '/', path + len - cp); + if (!cp) + baselen = len; + else + baselen = cp - path; + memcpy(pathbuf, path, baselen); + pathbuf[baselen] = '\0'; + if (!is_directory(pathbuf)) + return 0; + if (simplify_away(pathbuf, baselen, simplify)) + return 0; + blen = baselen; + if (treat_one_path(dir, pathbuf, &blen, simplify, + DT_DIR, NULL) == path_ignored) + return 0; /* do not recurse into it */ + if (len <= baselen) + return 1; /* finished checking */ + } +} + int read_directory(struct dir_struct *dir, const char *path, int len, const char **pathspec) { struct path_simplify *simplify; @@ -821,7 +856,8 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char return dir->nr; simplify = create_simplify(pathspec); - read_directory_recursive(dir, path, len, 0, simplify); + if (!len || treat_leading_path(dir, path, len, simplify)) + read_directory_recursive(dir, path, len, 0, simplify); free_simplify(simplify); qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name); qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name); diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index e3e4d71..9e71260 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -183,7 +183,7 @@ test_expect_success 'subdirectory ignore (l1/l2)' ' test_cmp expect actual ' -test_expect_failure 'subdirectory ignore (l1)' ' +test_expect_success 'subdirectory ignore (l1)' ' ( cd top/l1 && git ls-files -o --exclude-standard -- 1.6.6.209.g52296.dirty ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] ls-files: fix overeager pathspec optimization 2010-01-09 7:35 ` [PATCH 4/4] ls-files: fix overeager pathspec optimization Junio C Hamano @ 2010-01-12 16:33 ` Jeff King 0 siblings, 0 replies; 28+ messages in thread From: Jeff King @ 2010-01-12 16:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Linus Torvalds, Michael J Gruber, Jon Schewe, spearce On Fri, Jan 08, 2010 at 11:35:35PM -0800, Junio C Hamano wrote: > This patch changes the optimization so that it notices when the common > prefix directory that it starts reading from is an ignored one. Having just produced the similar but more ugly and messy patch earlier in the thread, this series looks right to me. -Peff ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 7:16 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano @ 2010-01-09 8:07 ` Junio C Hamano 2010-01-09 18:05 ` Linus Torvalds 1 sibling, 1 reply; 28+ messages in thread From: Junio C Hamano @ 2010-01-09 8:07 UTC (permalink / raw) To: Linus Torvalds, Jeff King; +Cc: Michael J Gruber, Jon Schewe, spearce, git Junio C Hamano <gitster@pobox.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> writes: > >> I'd have to see both patches to be able to tell. I do admit that while I >> acked your patch, it sure ain't _pretty_ to do that special odd >> "has_leading_ignored_dir()" thing. > > Revised patch (v4) series is coming shortly. Having sent these patches, I am wondering if the simplest fix might be this one-liner. diff --git a/dir.c b/dir.c index d0999ba..7fba335 100644 --- a/dir.c +++ b/dir.c @@ -788,3 +788,3 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char simplify = create_simplify(pathspec); - read_directory_recursive(dir, path, len, 0, simplify); + read_directory_recursive(dir, "", 0, 0, simplify); free_simplify(simplify); What my series does is to keep all the "depending on dir->flags and what excluded() says for path, decide to recurse, add it to dir->ignored[], etc." logic and optimize only the readdir() loop, pretending as if it returned only the entry on the "common prefix" path and nothing else, as we know all other paths will be skipped by either simplified away or filtered by in_pathspec() check. If the directories we are reading are not humongous, maybe using this much simpler patch might be preferrable (although it is completely untested). ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 8:07 ` [PATCH] " Junio C Hamano @ 2010-01-09 18:05 ` Linus Torvalds 2010-01-10 6:31 ` Junio C Hamano 0 siblings, 1 reply; 28+ messages in thread From: Linus Torvalds @ 2010-01-09 18:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git On Sat, 9 Jan 2010, Junio C Hamano wrote: > > Having sent these patches, I am wondering if the simplest fix might be this > one-liner. > > diff --git a/dir.c b/dir.c > index d0999ba..7fba335 100644 > --- a/dir.c > +++ b/dir.c > @@ -788,3 +788,3 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char > simplify = create_simplify(pathspec); > - read_directory_recursive(dir, path, len, 0, simplify); > + read_directory_recursive(dir, "", 0, 0, simplify); > free_simplify(simplify); That one-liner doesn't work at all for me. Lookie here: [torvalds@nehalem linux]$ touch drivers/char/hello.c [torvalds@nehalem linux]$ ~/git/git ls-files --exclude-standard -o drivers/char [torvalds@nehalem linux]$ git ls-files --exclude-standard -o drivers/char drivers/char/hello.c where that ~/git/git is the version with the one-liner. IOW, it now ignores _everything_, because the dir and the path don't match. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] ls-files: fix overeager pathspec optimization 2010-01-09 18:05 ` Linus Torvalds @ 2010-01-10 6:31 ` Junio C Hamano 0 siblings, 0 replies; 28+ messages in thread From: Junio C Hamano @ 2010-01-10 6:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Michael J Gruber, Jon Schewe, spearce, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Sat, 9 Jan 2010, Junio C Hamano wrote: >> >> Having sent these patches, I am wondering if the simplest fix might be this >> one-liner. >> >> diff --git a/dir.c b/dir.c >> index d0999ba..7fba335 100644 >> --- a/dir.c >> +++ b/dir.c >> @@ -788,3 +788,3 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char >> simplify = create_simplify(pathspec); >> - read_directory_recursive(dir, path, len, 0, simplify); >> + read_directory_recursive(dir, "", 0, 0, simplify); >> free_simplify(simplify); > > That one-liner doesn't work at all for me. > ... > IOW, it now ignores _everything_, because the dir and the path don't > match. Hmph, you are right. The real series is not equivalent to the one-liner at all. ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2010-01-12 16:33 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-01-08 15:17 Possible bug in git-completion.sh Jon Schewe 2010-01-08 15:40 ` Michael J Gruber 2010-01-08 16:24 ` Jeff King 2010-01-08 16:38 ` Junio C Hamano 2010-01-08 16:41 ` Jeff King 2010-01-08 16:45 ` Junio C Hamano 2010-01-08 16:56 ` Junio C Hamano 2010-01-08 17:24 ` Jeff King 2010-01-08 17:21 ` Jeff King 2010-01-08 18:21 ` Junio C Hamano 2010-01-08 19:58 ` Junio C Hamano 2010-01-08 23:01 ` [PATCH] ls-files: fix overeager pathspec optimization Junio C Hamano 2010-01-08 23:24 ` Linus Torvalds 2010-01-08 23:31 ` Junio C Hamano 2010-01-09 0:06 ` Junio C Hamano 2010-01-09 0:24 ` Linus Torvalds 2010-01-09 0:54 ` Junio C Hamano 2010-01-09 1:07 ` Linus Torvalds 2010-01-09 5:42 ` Jeff King 2010-01-09 7:16 ` Junio C Hamano 2010-01-09 7:35 ` [PATCH 1/4] t3001: test ls-files -o ignored/dir Junio C Hamano 2010-01-09 7:35 ` [PATCH 2/4] read_directory_recursive(): refactor handling of a single path into a separate function Junio C Hamano 2010-01-09 7:35 ` [PATCH 3/4] read_directory(): further split treat_path() Junio C Hamano 2010-01-09 7:35 ` [PATCH 4/4] ls-files: fix overeager pathspec optimization Junio C Hamano 2010-01-12 16:33 ` Jeff King 2010-01-09 8:07 ` [PATCH] " Junio C Hamano 2010-01-09 18:05 ` Linus Torvalds 2010-01-10 6:31 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).