* git stash takes excessively long when many untracked files present @ 2013-08-10 21:44 Josh Triplett 2013-08-13 10:11 ` Anders Darander 0 siblings, 1 reply; 15+ messages in thread From: Josh Triplett @ 2013-08-10 21:44 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Junio C Hamano [CCing folks involved in the recent "stash-refuse-to-kill" merge.] I keep portions of my home directory in git. I tried to "git stash" some local changes, and it ran for several minutes with no progress. ps showed that it was running "git ls-files --killed", which was taking 100% CPU, and occasionally reading the disk very slowly. strace shows that git ls-files --killed is doing a full recursive enumeration of my entire home directory. That's a Really Bad Idea: ~$ find | wc -l 3248997 ~$ find -type d | wc -l 350680 Not only that, but it also appears to be attempting to stat and open several files in every single directory; for instance: stat(".ccache/1/3/.git", 0x7fff254bc7a0) = -1 ENOENT (No such file or directory) open(".ccache/1/3/.git/HEAD", O_RDONLY) = -1 ENOENT (No such file or directory) stat(".ccache/1/3/.git", 0x7fff254bc770) = -1 ENOENT (No such file or directory) open(".ccache/1/3/.git/packed-refs", O_RDONLY) = -1 ENOENT (No such file or directory) (Yes, in that order.) I see a lot of room for optimization here. Most importantly, git ls-files --killed really doesn't need to look at any directory entry unless something in the index would conflict with it. - Josh Triplett ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-10 21:44 git stash takes excessively long when many untracked files present Josh Triplett @ 2013-08-13 10:11 ` Anders Darander 2013-08-13 17:07 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Anders Darander @ 2013-08-13 10:11 UTC (permalink / raw) To: git Josh Triplett <josh <at> joshtriplett.org> writes: > [CCing folks involved in the recent "stash-refuse-to-kill" merge.] > > I keep portions of my home directory in git. I tried to "git stash" > some local changes, and it ran for several minutes with no progress. ps > showed that it was running "git ls-files --killed", which was taking > 100% CPU, and occasionally reading the disk very slowly. I've recently got the same problem, though in this case it's my openembedded directory that's giving me those problems. (Having an untracked build-directory of quiet a few GB takes some time). I worked around it by locally patching git-stash: ------------------------------------------- diff --git a/git-stash.sh b/git-stash.sh index 85c9e2c..e5a2043 100755 --- a/git-stash.sh +++ b/git-stash.sh @@ -263,7 +263,7 @@ save_stash () { exit 0 fi if test -z "$untracked$force" && - test -n "$(git ls-files --killed | head -n 1)" + test -n "$(git ls-files --killed --directory | head -n 1)" then say "$(gettext "The following untracked files would NOT be saved test -n "$GIT_QUIET" || git ls-files --killed | sed 's/^/\t/' ------------------------------------------- It seems to work in my extremely limited testing. Though, I'm pretty sure that there'll be quite a few error cases... (Especially, as I just made a naive attempt at patching git-stash, so I could go on with a few other things). Do anyone have any better idea on how to approach this? > strace shows that git ls-files --killed is doing a full recursive > enumeration of my entire home directory. That's a Really Bad Idea: > > ~$ find | wc -l > 3248997 > ~$ find -type d | wc -l > 350680 > > Not only that, but it also appears to be attempting to stat and open > several files in every single directory; for instance: > > stat(".ccache/1/3/.git", 0x7fff254bc7a0) = -1 ENOENT (No such file or directory) > open(".ccache/1/3/.git/HEAD", O_RDONLY) = -1 ENOENT (No such file or directory) > stat(".ccache/1/3/.git", 0x7fff254bc770) = -1 ENOENT (No such file or directory) > open(".ccache/1/3/.git/packed-refs", O_RDONLY) = -1 ENOENT (No such file or directory) > > (Yes, in that order.) > > I see a lot of room for optimization here. Most importantly, git > ls-files --killed really doesn't need to look at any directory entry > unless something in the index would conflict with it. I guess that this would be a good optimization. Or, are ls-files --killed used in other cases where the current behaviour would be requiered? Cheers, Anders ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-13 10:11 ` Anders Darander @ 2013-08-13 17:07 ` Junio C Hamano 2013-08-13 17:36 ` Anders Darander 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-08-13 17:07 UTC (permalink / raw) To: Anders Darander; +Cc: git Anders Darander <anders.darander@gmail.com> writes: > diff --git a/git-stash.sh b/git-stash.sh > index 85c9e2c..e5a2043 100755 > --- a/git-stash.sh > +++ b/git-stash.sh > @@ -263,7 +263,7 @@ save_stash () { > exit 0 > fi > if test -z "$untracked$force" && > - test -n "$(git ls-files --killed | head -n 1)" > + test -n "$(git ls-files --killed --directory | head -n 1)" > then > say "$(gettext "The following untracked files would NOT be > saved > test -n "$GIT_QUIET" || git ls-files --killed | sed > 's/^/\t/' > ------------------------------------------- > > It seems to work in my extremely limited testing. Though, I'm pretty sure > that there'll be quite a few error cases... (Especially, as I just made > a naive attempt at patching git-stash, so I could go on with a few other > things). I am not sure adding "--directory" there is safe. Aren't there cases where saving a stash and going back to the committed state will involve killing no directories, but some files? If your local change is to remove a directory and files in it from your working tree and then deposit a newly created file at the path where the directory was in the HEAD, stashing that local change and then going back to the HEAD will involve removing the new file from the working tree, and unless you have "git add"ed the new file, it will be lost. > Do anyone have any better idea on how to approach this? Teaching "ls-files" to leave early once it seens even a single output is probably a possibility. >> I see a lot of room for optimization here. Most importantly, git >> ls-files --killed really doesn't need to look at any directory entry >> unless something in the index would conflict with it. This observation probably is correct, even though I didn't think about it long enough. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-13 17:07 ` Junio C Hamano @ 2013-08-13 17:36 ` Anders Darander 2013-08-13 17:52 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Anders Darander @ 2013-08-13 17:36 UTC (permalink / raw) To: Junio C Hamano; +Cc: git (I'm resending this as Gmail added some html parts...) Junio C Hamano <gitster@pobox.com> wrote: >Anders Darander <anders.darander@gmail.com> writes: > >> diff --git a/git-stash.sh b/git-stash.sh >> index 85c9e2c..e5a2043 100755 >> --- a/git-stash.sh >> +++ b/git-stash.sh >> @@ -263,7 +263,7 @@ save_stash () { >> exit 0 >> fi >> if test -z "$untracked$force" && >> - test -n "$(git ls-files --killed | head -n 1)" >> + test -n "$(git ls-files --killed --directory | head -n 1)" >> then >> say "$(gettext "The following untracked files would >NOT be >> saved >> test -n "$GIT_QUIET" || git ls-files --killed | sed >> 's/^/\t/' >> ------------------------------------------- >> >> It seems to work in my extremely limited testing. Though, I'm pretty >sure >> that there'll be quite a few error cases... (Especially, as I just >made >> a naive attempt at patching git-stash, so I could go on with a few >other >> things). > >I am not sure adding "--directory" there is safe. Aren't there >cases where saving a stash and going back to the committed state >will involve killing no directories, but some files? If your local >change is to remove a directory and files in it from your working >tree and then deposit a newly created file at the path where the >directory was in the HEAD, stashing that local change and then going >back to the HEAD will involve removing the new file from the working >tree, and unless you have "git add"ed the new file, it will be lost. Yes, it's more than likely that there are some real issues with adding - -directory here. I just realised that in the specific case I needed to run stash, I could do that by adding either of -u or -f as options. Obviously, >> Do anyone have any better idea on how to approach this? > >Teaching "ls-files" to leave early once it seens even a single >output is probably a possibility. Would that mean that we're able to fail early? That's certainly an improvement, but not a working situation. In my case, running stash in an OpenEmbedded checkout (including untracked directories for builds and caches), I gave up waiting on stash running ls-files after running at 100% for more than 12 minutes. Git status returned after a couple of seconds. >>> I see a lot of room for optimization here. Most importantly, git >>> ls-files --killed really doesn't need to look at any directory entry >>> unless something in the index would conflict with it. > >This observation probably is correct, even though I didn't think >about it long enough. I'd think that something like this is needed. As the index should be rather small, compared to e.g. some large untracked directory. Cheers, Anders . ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-13 17:36 ` Anders Darander @ 2013-08-13 17:52 ` Junio C Hamano 2013-08-13 21:47 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-08-13 17:52 UTC (permalink / raw) To: Anders Darander; +Cc: git Anders Darander <anders.darander@gmail.com> writes: >>> Do anyone have any better idea on how to approach this? >> >>Teaching "ls-files" to leave early once it seens even a single >>output is probably a possibility. > > Would that mean that we're able to fail early? Heh, good point. "Leave once you find one path" does not help the most common "sane" case where you do not kill any path, so it does not help us at all. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-13 17:52 ` Junio C Hamano @ 2013-08-13 21:47 ` Junio C Hamano 2013-08-15 17:52 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-08-13 21:47 UTC (permalink / raw) To: Anders Darander; +Cc: git, Petr Baudis, Josh Triplett [administrivia: people on the original thread added back on CC] Junio C Hamano <gitster@pobox.com> writes: > Anders Darander <anders.darander@gmail.com> writes: > >>>> Do anyone have any better idea on how to approach this? >>> >>>Teaching "ls-files" to leave early once it seens even a single >>>output is probably a possibility. >> >> Would that mean that we're able to fail early? > > Heh, good point. "Leave once you find one path" does not help the > most common "sane" case where you do not kill any path, so it does > not help us at all. In any case, this is a regression introduced in 'master' since the last release, and the attempted fix was for an issue that has long been with us, so I'll revert a7365313 (git stash: avoid data loss when "git stash save" kills a directory, 2013-06-28) soon. For today's -rc3, I'm already deep into the integration cycle, so it is too late to do the revert it and then redo everything. Then we will plan to re-apply the patch once "ls-files --killed" gets fixed not to waste too much cycles needlessly, after the coming release. Thanks for a report and discussion. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-13 21:47 ` Junio C Hamano @ 2013-08-15 17:52 ` Junio C Hamano 2013-08-15 18:07 ` Josh Triplett 2013-08-15 19:47 ` Junio C Hamano 0 siblings, 2 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 17:52 UTC (permalink / raw) To: git; +Cc: Anders Darander, Petr Baudis, Josh Triplett Junio C Hamano <gitster@pobox.com> writes: > In any case, this is a regression introduced in 'master' since the > last release, and the attempted fix was for an issue that has long > been with us, so I'll revert a7365313 (git stash: avoid data loss > when "git stash save" kills a directory, 2013-06-28) soon. For > today's -rc3, I'm already deep into the integration cycle, so it is > too late to do the revert it and then redo everything. > > Then we will plan to re-apply the patch once "ls-files --killed" > gets fixed not to waste too much cycles needlessly, after the coming > release. I've already reverted the problematic patch to "git stash" and it will not be part of the upcoming release. Here is a quick attempt to see if we can do better in "ls-files -k". We have an existing test t3010.3 that tries all the combinations of directory turning into a regular file, symlink, etc. and vice versa, and it seems to pass. The test has a directory path6 in the working tree without any paths in it in the index, and the added bypass code seems to correctly trigger and prevents us from digging into that directory, so this patch may be sufficient to improve "ls-files -k". By the way, regarding the reverted commit, I do not think it is enough to ask "ls-files -k" to see if the state recorded in the current index is sufficient. Imagine your HEAD records "path" as a file and then you did this: $ git reset --hard ;# "path" is now a regular file $ mv path path.bak $ mkdir path $ mv path.bak path/file $ git add -A ;# "path/file" in the index and in the working tree $ >path/cruft ;# "path/cruft" in the working tree Then call "save_stash" without saving untracked. The resulting stash will save the contents of "path/file" but "path/cruft" is not recorded anywhere, and then we would need to bring the state in the working tree and the index back to the state recorded in HEAD, hence "path" needs to be turned back to a directory. But "ls-files -k" is asked to check with the index, which has the path as a directory, so this case is missed. So instead of test -n "$(git ls-files --killed | head -n 1)" in Pasky's patch, which probably is a right thing to do if you are running "git stash save --keep-index", you would need something like this if you are not running with "--keep-index": test -n "$( GIT_INDEX_FILE=tmp_index export GIT_INDEX_FILE git read-tree HEAD git ls-files -k )" in order to make sure that the result of going back to the state in the HEAD will not clobber leftover "path/cruft". builtin/ls-files.c | 2 ++ dir.c | 9 +++++++++ dir.h | 3 ++- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 5cf3e31..8500446 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -219,6 +219,8 @@ static void show_files(struct dir_struct *dir) /* For cached/deleted files we don't need to even do the readdir */ if (show_others || show_killed) { + if (!show_others) + dir->flags |= DIR_COLLECT_KILLED_ONLY; fill_directory(dir, pathspec); if (show_others) show_other_files(dir); diff --git a/dir.c b/dir.c index 910bfcd..02939e2 100644 --- a/dir.c +++ b/dir.c @@ -1183,6 +1183,15 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, cache_name_exists(path->buf, path->len, ignore_case)) return path_none; + /* + * A directory can only contain killed files if the index + * has a path that wants it to be a non-directory. + */ + if ((dir->flags & DIR_COLLECT_KILLED_ONLY) && + (dtype == DT_DIR) && + !cache_name_exists(path->buf, path->len, ignore_case)) + return path_none; + exclude = is_excluded(dir, path->buf, &dtype); /* diff --git a/dir.h b/dir.h index 3d6b80c..4677b86 100644 --- a/dir.h +++ b/dir.h @@ -80,7 +80,8 @@ struct dir_struct { DIR_HIDE_EMPTY_DIRECTORIES = 1<<2, DIR_NO_GITLINKS = 1<<3, DIR_COLLECT_IGNORED = 1<<4, - DIR_SHOW_IGNORED_TOO = 1<<5 + DIR_SHOW_IGNORED_TOO = 1<<5, + DIR_COLLECT_KILLED_ONLY = 1<<6 } flags; struct dir_entry **entries; struct dir_entry **ignored; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-15 17:52 ` Junio C Hamano @ 2013-08-15 18:07 ` Josh Triplett 2013-08-15 18:58 ` Junio C Hamano 2013-08-15 19:47 ` Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: Josh Triplett @ 2013-08-15 18:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Anders Darander, Petr Baudis On Thu, Aug 15, 2013 at 10:52:39AM -0700, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > In any case, this is a regression introduced in 'master' since the > > last release, and the attempted fix was for an issue that has long > > been with us, so I'll revert a7365313 (git stash: avoid data loss > > when "git stash save" kills a directory, 2013-06-28) soon. For > > today's -rc3, I'm already deep into the integration cycle, so it is > > too late to do the revert it and then redo everything. > > > > Then we will plan to re-apply the patch once "ls-files --killed" > > gets fixed not to waste too much cycles needlessly, after the coming > > release. > > I've already reverted the problematic patch to "git stash" and it > will not be part of the upcoming release. Thanks! > Here is a quick attempt to see if we can do better in "ls-files -k". > > We have an existing test t3010.3 that tries all the combinations of > directory turning into a regular file, symlink, etc. and vice versa, > and it seems to pass. The test has a directory path6 in the working > tree without any paths in it in the index, and the added bypass code > seems to correctly trigger and prevents us from digging into that > directory, so this patch may be sufficient to improve "ls-files -k". > > By the way, regarding the reverted commit, I do not think it is > enough to ask "ls-files -k" to see if the state recorded in the > current index is sufficient. Imagine your HEAD records "path" as a > file and then you did this: > > $ git reset --hard ;# "path" is now a regular file > $ mv path path.bak > $ mkdir path > $ mv path.bak path/file > $ git add -A ;# "path/file" in the index and in the working tree > $ >path/cruft ;# "path/cruft" in the working tree > > Then call "save_stash" without saving untracked. The resulting > stash will save the contents of "path/file" but "path/cruft" is not > recorded anywhere, and then we would need to bring the state in the > working tree and the index back to the state recorded in HEAD, hence > "path" needs to be turned back to a directory. > > But "ls-files -k" is asked to check with the index, which has the > path as a directory, so this case is missed. Since git stash resets to the state in HEAD, whatever --killed check it does needs to check against HEAD, yes. It still doesn't need to check any path that doesn't exist in HEAD, though; it makes more sense to drive this from the list of files in HEAD rather than from the list of files in the working directory, even with a filter applied to the latter to prune bits not in HEAD. > So instead of > > test -n "$(git ls-files --killed | head -n 1)" > > in Pasky's patch, which probably is a right thing to do if you are > running "git stash save --keep-index", you would need something like > this if you are not running with "--keep-index": > > test -n "$( > GIT_INDEX_FILE=tmp_index > export GIT_INDEX_FILE > git read-tree HEAD > git ls-files -k > )" > > in order to make sure that the result of going back to the state in > the HEAD will not clobber leftover "path/cruft". Sure, that works. However, wouldn't it make sense to just directly let git ls-files output to the screen, then test its return value (after adding some ls-files option to set the return value)? Since ls-files --killed will have no output if git stash can proceed, and since git stash should show the list of files that'd be killed before it fails, using the output directly makes sense. - Josh Triplett ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-15 18:07 ` Josh Triplett @ 2013-08-15 18:58 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 18:58 UTC (permalink / raw) To: Josh Triplett; +Cc: git, Anders Darander, Petr Baudis Josh Triplett <josh@joshtriplett.org> writes: >> I've already reverted the problematic patch to "git stash" and it >> will not be part of the upcoming release. > > Thanks! > >> Here is a quick attempt to see if we can do better in "ls-files -k". Having said that, I am curious if the result of applying the patch you are responding to, without reverting the "git stash" patch, is now usable in the working tree you earlier had trouble with. > Sure, that works. However, wouldn't it make sense to just directly let > git ls-files output to the screen, then test its return value (after > adding some ls-files option to set the return value)? Not really. We may want to add "exit early if we see even a single killed file" option to the command so that we can simplify the "are we going to abort" logic, but the error codepath that is executed after that decision is made is not performance critical, and may need more flexibility than always spewing everything that will be killed, which could be thousands of crufts. So I think using two separate invocations to "ls-files --killed" is a necessity anyway. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: git stash takes excessively long when many untracked files present 2013-08-15 17:52 ` Junio C Hamano 2013-08-15 18:07 ` Josh Triplett @ 2013-08-15 19:47 ` Junio C Hamano 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 19:47 UTC (permalink / raw) To: git; +Cc: Anders Darander, Petr Baudis, Josh Triplett Junio C Hamano <gitster@pobox.com> writes: > diff --git a/dir.c b/dir.c > index 910bfcd..02939e2 100644 > --- a/dir.c > +++ b/dir.c > @@ -1183,6 +1183,15 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, > cache_name_exists(path->buf, path->len, ignore_case)) > return path_none; > > + /* > + * A directory can only contain killed files if the index > + * has a path that wants it to be a non-directory. > + */ > + if ((dir->flags & DIR_COLLECT_KILLED_ONLY) && > + (dtype == DT_DIR) && > + !cache_name_exists(path->buf, path->len, ignore_case)) > + return path_none; > + I think this is wrong. When we are looking at a directory P in the working tree, there are three cases: (1) P exists in the index. Everything inside the directory P in the working tree needs to go when P is checked out from the index. (2) P does not exist in the index, but there is P/Q in the index. We know P will stay a directory when we check out the contents of the index, but we do not know yet if there is a directory P/Q in the working tree to be killed, so we need to recurse. (3) P does not exist in the index, and there is no P/Q in the index to require P to be a directory, either. Only in this case, we know that everything inside P will not be killed without recursing. The patch will break with the second case, I think. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] Optimizing "ls-files -k" 2013-08-15 19:47 ` Junio C Hamano @ 2013-08-15 21:28 ` Junio C Hamano 2013-08-15 21:28 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 21:28 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Josh Triplett "ls-files -o" and "ls-files -k" both traverse the working tree down to find either all untracked paths or those that will be "killed" (removed from the working tree to make room) when the paths recorded in the index are checked out. It is necessary to traverse the working tree fully when enumerating all the "other" paths, but when we are only interested in "killed" paths, we can take advantage of the fact that paths that do not overlap with entries in the index can never be killed. The first one is an independent clean-up. No public API in the working tree traversal takes alternate in-core index, so there is no reason to explicitly use the_index and index_* functions from the in-core index API. The second one is rerolled from the "something like this" patch I sent earlier, but corrects the "we see a directory, it is not in the index, but a file in it is" case. And the third one adds a testcase that illustrates why the earlier "something like this" patch is not sufficient. These are designed to apply on top of v1.8.3, and needs a bit of conflict resolution for the upcoming v1.8.4 codebase; I'll queue them in 'pu' for now. Note that t3010, especially after merged to 'pu', will use many different ways to create a test file. Some redirect "date" into it, some redirect ":" into it, some "touch" it, and some just redirect with no command. date >file1 : >file2 touch file3 >file4 We should consolidate them all to just do ">file4" after making sure the contents do not matter (we kind of know it already, as "date" will output string that is not repeatable). Use of "touch" for anything other than updating the timestamp is especially bad, as it is misleading. Junio C Hamano (3): dir.c: use the cache_* macro to access the current index ls-files -k: a directory only can be killed if the index has a non-directory t3010: update to demonstrate "ls-files -k" optimization pitfalls builtin/ls-files.c | 2 ++ dir.c | 40 +++++++++++++++++++++++++++++-------- dir.h | 3 ++- t/t3010-ls-files-killed-modified.sh | 12 ++++++++--- 4 files changed, 45 insertions(+), 12 deletions(-) -- 1.8.4-rc3-232-ga8053f8 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] dir.c: use the cache_* macro to access the current index 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano @ 2013-08-15 21:28 ` Junio C Hamano 2013-08-15 21:28 ` [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory Junio C Hamano ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 21:28 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Josh Triplett These codepaths always start from the_index and use index_* functions, but there is no reason to do so. Use the compatibility cache_* macro to access the current in-core index like everybody else. While at it, fix typo in the comment for a function to check if a path within a directory appears in the index. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- dir.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index a5926fb..2f82cd1 100644 --- a/dir.c +++ b/dir.c @@ -472,15 +472,14 @@ static void *read_skip_worktree_file_from_index(const char *path, size_t *size) unsigned long sz; enum object_type type; void *data; - struct index_state *istate = &the_index; len = strlen(path); - pos = index_name_pos(istate, path, len); + pos = cache_name_pos(path, len); if (pos < 0) return NULL; - if (!ce_skip_worktree(istate->cache[pos])) + if (!ce_skip_worktree(active_cache[pos])) return NULL; - data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz); + data = read_sha1_file(active_cache[pos]->sha1, &type, &sz); if (!data || type != OBJ_BLOB) { free(data); return NULL; @@ -924,13 +923,13 @@ enum exist_status { }; /* - * Do not use the alphabetically stored index to look up + * Do not use the alphabetically sorted index to look up * the directory name; instead, use the case insensitive * name hash. */ static enum exist_status directory_exists_in_index_icase(const char *dirname, int len) { - struct cache_entry *ce = index_name_exists(&the_index, dirname, len + 1, ignore_case); + struct cache_entry *ce = cache_name_exists(dirname, len + 1, ignore_case); unsigned char endchar; if (!ce) -- 1.8.4-rc3-232-ga8053f8 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano 2013-08-15 21:28 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano @ 2013-08-15 21:28 ` Junio C Hamano 2013-08-15 21:28 ` [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls Junio C Hamano 2013-08-15 23:30 ` [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory Junio C Hamano 3 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 21:28 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Josh Triplett "ls-files -o" and "ls-files -k" both traverse the working tree down to find either all untracked paths or those that will be "killed" (removed from the working tree to make room) when the paths recorded in the index are checked out. It is necessary to traverse the working tree fully when enumerating all the "other" paths, but when we are only interested in "killed" paths, we can take advantage of the fact that paths that do not overlap with entries in the index can never be killed. The treat_one_path() helper function, which is called during the recursive traversal, is the ideal place to implement an optimization. When we are looking at a directory P in the working tree, there are three cases: (1) P exists in the index. Everything inside the directory P in the working tree needs to go when P is checked out from the index. (2) P does not exist in the index, but there is P/Q in the index. We know P will stay a directory when we check out the contents of the index, but we do not know yet if there is a directory P/Q in the working tree to be killed, so we need to recurse. (3) P does not exist in the index, and there is no P/Q in the index to require P to be a directory, either. Only in this case, we know that everything inside P will not be killed without recursing. Note that this helper is called by treat_leading_path() that decides if we need to traverse only subdirectories of a single common leading directory, which is essential for this optimization to be correct. This caller checks each level of the leading path component from shallower directory to deeper ones, and that is what allows us to only check if the path appears in the index. If the call to treat_one_path() weren't there, given a path P/Q/R, the real traversal may start from directory P/Q/R, even when the index records P as a regular file, and we would end up having to check if any leading subpath in P/Q/R, e.g. P, appears in the index. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin/ls-files.c | 2 ++ dir.c | 29 +++++++++++++++++++++++++++-- dir.h | 3 ++- 3 files changed, 31 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 2202072..c7eb6f4 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -213,6 +213,8 @@ static void show_files(struct dir_struct *dir) /* For cached/deleted files we don't need to even do the readdir */ if (show_others || show_killed) { + if (!show_others) + dir->flags |= DIR_COLLECT_KILLED_ONLY; fill_directory(dir, pathspec); if (show_others) show_other_files(dir); diff --git a/dir.c b/dir.c index 2f82cd1..ff768f3 100644 --- a/dir.c +++ b/dir.c @@ -1173,12 +1173,37 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, int dtype, struct dirent *de) { int exclude; + int has_path_in_index = !!cache_name_exists(path->buf, path->len, ignore_case); + if (dtype == DT_UNKNOWN) dtype = get_dtype(de, path->buf, path->len); /* Always exclude indexed files */ - if (dtype != DT_DIR && - cache_name_exists(path->buf, path->len, ignore_case)) + if (dtype != DT_DIR && has_path_in_index) + return path_none; + + /* + * When we are looking at a directory P in the working tree, + * there are three cases: + * + * (1) P exists in the index. Everything inside the directory P in + * the working tree needs to go when P is checked out from the + * index. + * + * (2) P does not exist in the index, but there is P/Q in the index. + * We know P will stay a directory when we check out the contents + * of the index, but we do not know yet if there is a directory + * P/Q in the working tree to be killed, so we need to recurse. + * + * (3) P does not exist in the index, and there is no P/Q in the index + * to require P to be a directory, either. Only in this case, we + * know that everything inside P will not be killed without + * recursing. + */ + if ((dir->flags & DIR_COLLECT_KILLED_ONLY) && + (dtype == DT_DIR) && + !has_path_in_index && + (directory_exists_in_index(path->buf, path->len) == index_nonexistent)) return path_none; exclude = is_excluded(dir, path->buf, &dtype); diff --git a/dir.h b/dir.h index 3d6b80c..4677b86 100644 --- a/dir.h +++ b/dir.h @@ -80,7 +80,8 @@ struct dir_struct { DIR_HIDE_EMPTY_DIRECTORIES = 1<<2, DIR_NO_GITLINKS = 1<<3, DIR_COLLECT_IGNORED = 1<<4, - DIR_SHOW_IGNORED_TOO = 1<<5 + DIR_SHOW_IGNORED_TOO = 1<<5, + DIR_COLLECT_KILLED_ONLY = 1<<6 } flags; struct dir_entry **entries; struct dir_entry **ignored; -- 1.8.4-rc3-232-ga8053f8 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano 2013-08-15 21:28 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano 2013-08-15 21:28 ` [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory Junio C Hamano @ 2013-08-15 21:28 ` Junio C Hamano 2013-08-15 23:30 ` [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory Junio C Hamano 3 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 21:28 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Josh Triplett An earlier draft of the previous step used cache_name_exists() to check the directory we were looking at, which missed the second case described in its log message. Demonstrate why it is not sufficient. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- t/t3010-ls-files-killed-modified.sh | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/t/t3010-ls-files-killed-modified.sh b/t/t3010-ls-files-killed-modified.sh index 95671c2..6ea7ca8 100755 --- a/t/t3010-ls-files-killed-modified.sh +++ b/t/t3010-ls-files-killed-modified.sh @@ -11,6 +11,7 @@ This test prepares the following in the cache: path1 - a symlink path2/file2 - a file in a directory path3/file3 - a file in a directory + pathx/ju - a file in a directory and the following on the filesystem: @@ -21,6 +22,7 @@ and the following on the filesystem: path4 - a file path5 - a symlink path6/file6 - a file in a directory + pathx/ju/nk - a file in a directory to be killed git ls-files -k should report that existing filesystem objects except path4, path5 and path6/file6 to be killed. @@ -44,16 +46,17 @@ then else date > path1 fi -mkdir path2 path3 +mkdir path2 path3 pathx date >path2/file2 date >path3/file3 +>pathx/ju : >path7 date >path8 : >path9 date >path10 test_expect_success \ 'git update-index --add to add various paths.' \ - "git update-index --add -- path0 path1 path?/file? path7 path8 path9 path10" + "git update-index --add -- path0 path1 path?/file? pathx/ju path7 path8 path9 path10" rm -fr path? ;# leave path10 alone date >path2 @@ -65,7 +68,7 @@ else date > path3 date > path5 fi -mkdir path0 path1 path6 +mkdir -p path0 path1 path6 pathx/ju date >path0/file0 date >path1/file1 date >path6/file6 @@ -73,6 +76,7 @@ date >path7 : >path8 : >path9 touch path10 +>pathx/ju/nk test_expect_success \ 'git ls-files -k to show killed files.' \ @@ -82,6 +86,7 @@ path0/file0 path1/file1 path2 path3 +pathx/ju/nk EOF test_expect_success \ @@ -98,6 +103,7 @@ path2/file2 path3/file3 path7 path8 +pathx/ju EOF test_expect_success \ -- 1.8.4-rc3-232-ga8053f8 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano ` (2 preceding siblings ...) 2013-08-15 21:28 ` [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls Junio C Hamano @ 2013-08-15 23:30 ` Junio C Hamano 3 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2013-08-15 23:30 UTC (permalink / raw) To: git; +Cc: Petr Baudis, Josh Triplett From: Petr Baudis <pasky@ucw.cz> "stash save" is about saving the local change to the working tree, but also about restoring the state of the last commit to the working tree. When a local change is to turn a non-directory to a directory, in order to restore the non-directory, everything in the directory needs to be removed. Which is fine when running "git stash save --include-untracked", but without that option, untracked, newly created files in the directory will have to be discarded, if the state you are restoring to has a non-directory at the same path as the directory. Introduce a safety valve to fail the operation in such case, using the "ls-files --killed" which was designed for this exact purpose. The "stash save" is stopped when untracked files need to be discarded because their leading path ceased to be a directory, and the user is required to pass --force to really have the data removed. Signed-off-by: Petr Baudis <pasky@ucw.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * And this is the reverted patch ported on top of the "ls-files -k" miniseries I sent earlier. The updates to the test in t3903 compared to the original illustrates that the check implemented in the original did not protect once a path that was turned into a directory from a file gets added to the index, which this round also fixes by running "ls-files -k" against the state in the HEAD. Documentation/git-stash.txt | 11 +++++++++-- git-stash.sh | 20 ++++++++++++++++++++ t/t3903-stash.sh | 22 ++++++++++++++++++++++ 3 files changed, 51 insertions(+), 2 deletions(-) diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt index 711ffe1..61fadc5 100644 --- a/Documentation/git-stash.txt +++ b/Documentation/git-stash.txt @@ -14,7 +14,7 @@ SYNOPSIS 'git stash' ( pop | apply ) [--index] [-q|--quiet] [<stash>] 'git stash' branch <branchname> [<stash>] 'git stash' [save [--patch] [-k|--[no-]keep-index] [-q|--quiet] - [-u|--include-untracked] [-a|--all] [<message>]] + [-u|--include-untracked] [-a|--all] [-f|--force] [<message>]] 'git stash' clear 'git stash' create @@ -43,7 +43,7 @@ is also possible). OPTIONS ------- -save [-p|--patch] [--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [<message>]:: +save [-p|--patch] [--[no-]keep-index] [-u|--include-untracked] [-a|--all] [-q|--quiet] [-f|--force] [<message>]:: Save your local modifications to a new 'stash', and run `git reset --hard` to revert them. The <message> part is optional and gives @@ -70,6 +70,13 @@ linkgit:git-add[1] to learn how to operate the `--patch` mode. + The `--patch` option implies `--keep-index`. You can use `--no-keep-index` to override this. ++ +In some cases, saving a stash could mean irretrievably removing some +data - if a directory with untracked files replaces a tracked file of +the same name, the new untracked files are not saved (except in case +of `--include-untracked`) but the original tracked file shall be restored. +By default, `stash save` will abort in such a case; `--force` will allow +it to remove the untracked files. list [<options>]:: diff --git a/git-stash.sh b/git-stash.sh index bbefdf6..2d539f3 100755 --- a/git-stash.sh +++ b/git-stash.sh @@ -156,10 +156,19 @@ create_stash () { die "$(gettext "Cannot record working tree state")" } +# This helper MUST be run inside a subshell. +list_killed_files () { + GIT_INDEX_FILE=$TMP-ls-files-k && + export GIT_INDEX_FILE && + git read-tree HEAD && + git ls-files --killed +} + save_stash () { keep_index= patch_mode= untracked= + force= while test $# != 0 do case "$1" in @@ -180,6 +189,9 @@ save_stash () { -u|--include-untracked) untracked=untracked ;; + -f|--force) + force=t + ;; -a|--all) untracked=all ;; @@ -223,6 +235,14 @@ save_stash () { say "$(gettext "No local changes to save")" exit 0 fi + if test -z "$untracked$force" && + test -n "$(list_killed_files | head -n 1)" + then + say "$(gettext "The following untracked files would NOT be saved but need to be removed by stash save:")" + test -n "$GIT_QUIET" || (list_killed_files | sed 's/^/\t/') + say "$(gettext "Aborting. Consider using either the --force or --include-untracked option.")" >&2 + exit 1 + fi test -f "$GIT_DIR/logs/$ref_stash" || clear_stash || die "$(gettext "Cannot initialize stash")" diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh index 5dfbda7..08ce23b 100755 --- a/t/t3903-stash.sh +++ b/t/t3903-stash.sh @@ -637,4 +637,26 @@ test_expect_success 'stash where working directory contains "HEAD" file' ' test_cmp output expect ' +test_expect_success 'stash a change to turn a non-directory to a directory' ' + git reset --hard && + >testfile && + git add testfile && + git commit -m "add testfile as a regular file" && + rm testfile && + mkdir testfile && + >testfile/file && + test_must_fail git stash save "recover regular file" && + test -f testfile/file && + + git add testfile/file && + test_must_fail git stash save "recover regular file after adding" && + test -f testfile/file +' + +test_expect_success 'stash a change to turn a non-directory to a directory (forced)' ' + git stash save --force "recover regular file (forced)" && + ! test -f testfile/file && + test -f testfile +' + test_done -- 1.8.4-rc3-236-g903ae4b ^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-08-15 23:30 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-08-10 21:44 git stash takes excessively long when many untracked files present Josh Triplett 2013-08-13 10:11 ` Anders Darander 2013-08-13 17:07 ` Junio C Hamano 2013-08-13 17:36 ` Anders Darander 2013-08-13 17:52 ` Junio C Hamano 2013-08-13 21:47 ` Junio C Hamano 2013-08-15 17:52 ` Junio C Hamano 2013-08-15 18:07 ` Josh Triplett 2013-08-15 18:58 ` Junio C Hamano 2013-08-15 19:47 ` Junio C Hamano 2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano 2013-08-15 21:28 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano 2013-08-15 21:28 ` [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory Junio C Hamano 2013-08-15 21:28 ` [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls Junio C Hamano 2013-08-15 23:30 ` [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).