From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Petr Baudis <pasky@ucw.cz>, Josh Triplett <josh@joshtriplett.org>
Subject: [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory
Date: Thu, 15 Aug 2013 14:28:09 -0700 [thread overview]
Message-ID: <1376602090-19142-3-git-send-email-gitster@pobox.com> (raw)
In-Reply-To: <1376602090-19142-1-git-send-email-gitster@pobox.com>
"ls-files -o" and "ls-files -k" both traverse the working tree down
to find either all untracked paths or those that will be "killed"
(removed from the working tree to make room) when the paths recorded
in the index are checked out. It is necessary to traverse the
working tree fully when enumerating all the "other" paths, but when
we are only interested in "killed" paths, we can take advantage of
the fact that paths that do not overlap with entries in the index
can never be killed.
The treat_one_path() helper function, which is called during the
recursive traversal, is the ideal place to implement an
optimization.
When we are looking at a directory P in the working tree, there are
three cases:
(1) P exists in the index. Everything inside the directory P in
the working tree needs to go when P is checked out from the
index.
(2) P does not exist in the index, but there is P/Q in the index.
We know P will stay a directory when we check out the contents
of the index, but we do not know yet if there is a directory
P/Q in the working tree to be killed, so we need to recurse.
(3) P does not exist in the index, and there is no P/Q in the index
to require P to be a directory, either. Only in this case, we
know that everything inside P will not be killed without
recursing.
Note that this helper is called by treat_leading_path() that decides
if we need to traverse only subdirectories of a single common
leading directory, which is essential for this optimization to be
correct. This caller checks each level of the leading path
component from shallower directory to deeper ones, and that is what
allows us to only check if the path appears in the index. If the
call to treat_one_path() weren't there, given a path P/Q/R, the real
traversal may start from directory P/Q/R, even when the index
records P as a regular file, and we would end up having to check if
any leading subpath in P/Q/R, e.g. P, appears in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/ls-files.c | 2 ++
dir.c | 29 +++++++++++++++++++++++++++--
dir.h | 3 ++-
3 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 2202072..c7eb6f4 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -213,6 +213,8 @@ static void show_files(struct dir_struct *dir)
/* For cached/deleted files we don't need to even do the readdir */
if (show_others || show_killed) {
+ if (!show_others)
+ dir->flags |= DIR_COLLECT_KILLED_ONLY;
fill_directory(dir, pathspec);
if (show_others)
show_other_files(dir);
diff --git a/dir.c b/dir.c
index 2f82cd1..ff768f3 100644
--- a/dir.c
+++ b/dir.c
@@ -1173,12 +1173,37 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
int dtype, struct dirent *de)
{
int exclude;
+ int has_path_in_index = !!cache_name_exists(path->buf, path->len, ignore_case);
+
if (dtype == DT_UNKNOWN)
dtype = get_dtype(de, path->buf, path->len);
/* Always exclude indexed files */
- if (dtype != DT_DIR &&
- cache_name_exists(path->buf, path->len, ignore_case))
+ if (dtype != DT_DIR && has_path_in_index)
+ return path_none;
+
+ /*
+ * When we are looking at a directory P in the working tree,
+ * there are three cases:
+ *
+ * (1) P exists in the index. Everything inside the directory P in
+ * the working tree needs to go when P is checked out from the
+ * index.
+ *
+ * (2) P does not exist in the index, but there is P/Q in the index.
+ * We know P will stay a directory when we check out the contents
+ * of the index, but we do not know yet if there is a directory
+ * P/Q in the working tree to be killed, so we need to recurse.
+ *
+ * (3) P does not exist in the index, and there is no P/Q in the index
+ * to require P to be a directory, either. Only in this case, we
+ * know that everything inside P will not be killed without
+ * recursing.
+ */
+ if ((dir->flags & DIR_COLLECT_KILLED_ONLY) &&
+ (dtype == DT_DIR) &&
+ !has_path_in_index &&
+ (directory_exists_in_index(path->buf, path->len) == index_nonexistent))
return path_none;
exclude = is_excluded(dir, path->buf, &dtype);
diff --git a/dir.h b/dir.h
index 3d6b80c..4677b86 100644
--- a/dir.h
+++ b/dir.h
@@ -80,7 +80,8 @@ struct dir_struct {
DIR_HIDE_EMPTY_DIRECTORIES = 1<<2,
DIR_NO_GITLINKS = 1<<3,
DIR_COLLECT_IGNORED = 1<<4,
- DIR_SHOW_IGNORED_TOO = 1<<5
+ DIR_SHOW_IGNORED_TOO = 1<<5,
+ DIR_COLLECT_KILLED_ONLY = 1<<6
} flags;
struct dir_entry **entries;
struct dir_entry **ignored;
--
1.8.4-rc3-232-ga8053f8
next prev parent reply other threads:[~2013-08-15 21:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-10 21:44 git stash takes excessively long when many untracked files present Josh Triplett
2013-08-13 10:11 ` Anders Darander
2013-08-13 17:07 ` Junio C Hamano
2013-08-13 17:36 ` Anders Darander
2013-08-13 17:52 ` Junio C Hamano
2013-08-13 21:47 ` Junio C Hamano
2013-08-15 17:52 ` Junio C Hamano
2013-08-15 18:07 ` Josh Triplett
2013-08-15 18:58 ` Junio C Hamano
2013-08-15 19:47 ` Junio C Hamano
2013-08-15 21:28 ` [PATCH 0/3] Optimizing "ls-files -k" Junio C Hamano
2013-08-15 21:28 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano
2013-08-15 21:28 ` Junio C Hamano [this message]
2013-08-15 21:28 ` [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls Junio C Hamano
2013-08-15 23:30 ` [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1376602090-19142-3-git-send-email-gitster@pobox.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=josh@joshtriplett.org \
--cc=pasky@ucw.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).