From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org, Karsten Blees <karsten.blees@gmail.com>,
kusmabite@gmail.com, Ramkumar Ramachandra <artagnon@gmail.com>,
Robert Zeh <robert.allan.zeh@gmail.com>,
finnag@pvv.org
Subject: Re: [PATCH] read_directory: avoid invoking exclude machinery on tracked files
Date: Fri, 15 Feb 2013 08:52:03 -0800 [thread overview]
Message-ID: <7vd2w1wmdo.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <1360937848-4426-1-git-send-email-pclouds@gmail.com> ("Nguyễn Thái Ngọc Duy"'s message of "Fri, 15 Feb 2013 21:17:28 +0700")
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> read_directory() (and its friendly wrapper fill_directory) collects
> untracked/ignored files by traversing through the whole worktree (*),
> feeding every entry to treat_one_path(), where each entry is checked
> against .gitignore patterns.
>
> One may see that tracked files can't be excluded and we do not need to
> run them through exclude machinery. On repos where there are many
> .gitignore patterns and/or a lot of tracked files, this unnecessary
> processing can become expensive.
>
> This patch avoids it mostly for normal cases. Directories are still
> processed as before. DIR_SHOW_IGNORED and DIR_COLLECT_IGNORED are not
> normally used unless some options are given (e.g. "checkout
> --overwrite-ignore", "add -f"...) so people still need to pay penalty
> in some cases, just not as often as before.
>
> git status | webkit linux-2.6 libreoffice-core gentoo-x86
> -------------+----------------------------------------------
> before | 1.159s 0.226s 0.415s 0.597s
> after | 0.778s 0.176s 0.266s 0.556s
> nr. patterns | 89 376 19 0
> nr. tracked | 182k 40k 63k 101k
>
> (*) Not completely true. read_directory may skip recursing into a
> directory if it's entirely excluded and DIR_SHOW_OTHER_DIRECTORIES
> is not set.
>
> Tracked-down-by: Karsten Blees <karsten.blees@gmail.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> For reference:
> http://thread.gmane.org/gmane.comp.version-control.git/215820/focus=216195
>
> dir.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 57394e4..bdff256 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1244,7 +1244,19 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
> const struct path_simplify *simplify,
> int dtype, struct dirent *de)
> {
> - int exclude = is_excluded(dir, path->buf, &dtype);
> + int exclude;
> +
> + if (dtype == DT_UNKNOWN)
> + dtype = get_dtype(de, path->buf, path->len);
> +
> + if (!(dir->flags & DIR_SHOW_IGNORED) &&
> + !(dir->flags & DIR_COLLECT_IGNORED) &&
> + dtype != DT_DIR &&
> + cache_name_exists(path->buf, path->len, ignore_case))
> + return path_ignored;
> +
> + exclude = is_excluded(dir, path->buf, &dtype);
> +
> if (exclude && (dir->flags & DIR_COLLECT_IGNORED)
> && exclude_matches_pathspec(path->buf, path->len, simplify))
> dir_add_ignored(dir, path->buf, path->len);
Interesting.
In the current code, we always check if a path is excluded, and when
dealing with DT_REG/DT_LNK, we call treat_file():
* When such a path is excluded, treat_file() returns true when we
are not showing ignored directories. This causes treat_one_path()
to return path_ignored, so for excluded DT_REG/DT_LNK paths when
no DIR_*_IGNORED is in effect, this change is a correct
optimization.
* When such a path is not excluded, on the ther hand, and when we
are not showing ignored directories, treat_file() just returns
the value of exclude_file, which is initialized to false and is
not changed in the function. This causes treat_one_path() to
return path_handled. However, the new code returns path_ignored
in this case.
What guarantees that this change is regression free? I do not seem
to be able to find anything that checks if the path is already known
to the index in the original code for the case you special cased
(i.e. DIR_*_IGNORED is not set and dtype is not DT_DIR). Do all the
callers that reach this function in their callgraph, when they get
path_ignored for a path in the index, behave as if the difference
between path_ignored and path_handled does not matter?
> @@ -1256,9 +1268,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
> if (exclude && !(dir->flags & DIR_SHOW_IGNORED))
> return path_ignored;
>
> - if (dtype == DT_UNKNOWN)
> - dtype = get_dtype(de, path->buf, path->len);
> -
> switch (dtype) {
> default:
> return path_ignored;
next prev parent reply other threads:[~2013-02-15 16:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-15 14:17 [PATCH] read_directory: avoid invoking exclude machinery on tracked files Nguyễn Thái Ngọc Duy
2013-02-15 16:52 ` Junio C Hamano [this message]
2013-02-15 18:30 ` Duy Nguyen
2013-02-15 19:32 ` Junio C Hamano
2013-02-16 3:31 ` Duy Nguyen
2013-02-18 16:42 ` Karsten Blees
2013-02-16 7:17 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-02-16 18:11 ` Pete Wyckoff
2013-02-17 4:39 ` Duy Nguyen
2013-02-17 15:49 ` Pete Wyckoff
2013-02-17 23:18 ` Junio C Hamano
2013-02-25 22:01 ` [PATCH/RFC] dir.c: Make git-status --ignored even more consistent Karsten Blees
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vd2w1wmdo.fsf@alter.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=artagnon@gmail.com \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=karsten.blees@gmail.com \
--cc=kusmabite@gmail.com \
--cc=pclouds@gmail.com \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).