From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC] Speed up "git status" by caching untracked file info
Date: Thu, 17 Apr 2014 12:40:11 -0700
Message-ID: <xmqqy4z3d9t0.fsf@gitster.dls.corp.google.com>
In-Reply-To: <1397713918-22829-1-git-send-email-pclouds@gmail.com> ("Nguyễn Thái Ngọc Duy"'s message of "Thu, 17 Apr 2014 12:51:58 +0700")
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>               first run   second (cached) run
> gentoo-x86    500 ms      71.6 ms
> wine          140 ms      9.72 ms
> webkit        125 ms      6.88 ms
> linux-2.6     106 ms      16.2 ms
>
> Basically untracked time is cut to one tenth in the best case
> scenario. The final numbers would be a bit higher because I haven't
> stored or read the cache from index yet. Real commit message follows..
As you allude to later with "if you recompile a single file, the
whole hierarchy in that directory is lost", two back-to-back runs of
"git status" are not very interesting.
> - The list of files and directories of the directory in question
> - The $GIT_DIR/index
> - The content of $GIT_DIR/info/exclude
> - The content of core.excludesfile
> - The content (or the lack) of .gitignore of all parent directories
> from $GIT_WORK_TREE
>
> If we can cheaply validate all those inputs for a certain directory,
> we are sure that the current code will always produce the same
> results, so we can cache and reuse those results.
>
> This is not a silver bullet approach. When you compile a C file, for
> example, the old .o file is removed and a new one with the same name
> created, effectively invalidating the containing directory's
> cache. But at least with a large enough work tree, there could be many
> directories you never touch. The cache could help there.
>
> The first input can be checked using directory mtime. In many
> filesystems, directory mtime is updated when direct files/dirs are
> added or removed (*).
An important thing is that creation of new cruft or deletion of
existing cruft can be detected without any false negative with the
mechanism, and mtime on directory would be a good way to check it.
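A minimal sketch of that check, written in Python for brevity rather
than in git's C. The names snapshot(), is_valid() and list_entries()
are hypothetical and are not taken from the patch. It assumes a
filesystem that updates a directory's mtime when a direct entry is
added or removed, and it ignores the case where a change lands within
the timestamp granularity of the cached stat.

```python
import os

def snapshot(path):
    """Record the directory's mtime together with its entry names."""
    st = os.stat(path)
    return {"mtime_ns": st.st_mtime_ns, "entries": sorted(os.listdir(path))}

def is_valid(path, cached):
    """True if the directory's mtime still matches the cached value."""
    return os.stat(path).st_mtime_ns == cached["mtime_ns"]

def list_entries(path, cache):
    """Return (entries, hit); hit is True when the cached list was reused."""
    cached = cache.get(path)
    if cached is not None and is_valid(path, cached):
        return cached["entries"], True
    # Cache miss or stale mtime: re-read the directory and refresh the cache.
    cache[path] = snapshot(path)
    return cache[path]["entries"], False
```

Only the first input (the directory listing) is validated here; the
index and the exclude files would need their own checks.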
> The second one can be hooked from read-cache.c. Whenever a file (or a
> submodule) is added or removed from a directory, we invalidate that
> directory. This will be done in a later patch.
I would imagine that it would be done at the same places as we
invalidate cache-trees, with the same "invalidation percolates up"
logic.
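To illustrate what "invalidation percolates up" could look like, here
is a small Python sketch. It is not git's cache-tree code: DirCache,
mark_valid() and invalidate_path() are hypothetical names, and the
per-directory state is reduced to a single boolean. When a path is
added to or removed from the index, the containing directory and each
of its ancestors up to the root is marked invalid.

```python
import posixpath

class DirCache:
    """Per-directory validity flags; "" stands for the top of the work tree."""

    def __init__(self):
        self.valid = {}

    def mark_valid(self, directory):
        self.valid[directory] = True

    def invalidate_path(self, path):
        """Invalidate the directory containing `path` and every ancestor."""
        directory = posixpath.dirname(path)
        while True:
            self.valid[directory] = False
            if directory == "":
                break
            directory = posixpath.dirname(directory)
```

Sibling directories that are not on the path from the root to the
changed entry keep their cached state.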
> On subsequent runs, read_directory_recursive() reads stat info of the
> directory in question and verifies if files/dirs have been added or
> removed.
Hmph. If you have a two-level hierarchy D1/D2 and you change the
list of cruft in D2 but not in D1, the mtime of D1/D2 changes but
not the mtime of D1, as you observed below.
> With the help of prep_exclude() to verify .gitignore chain,
> it may decide "all is well" and enable the fast path in
> treat_path(). read_directory_recursive() is still called for
> subdirectories even in fast path, because a directory mtime does not
> cover all subdirs recursively.
I wonder if you can avoid recursing into D1 when no cached mtime
(and .gitignore) information has changed in any subdirectory of it
(e.g. both D1 and D1/D2 match the cache).
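One way to read this idea, as a rough Python sketch under stated
assumptions: Node, subtree_unchanged(), cached_files() and
list_files() are hypothetical names, the .gitignore checks are left
out, and the layout is not the one the patch uses. The cached tree
remembers each directory's mtime. If every directory in the subtree
still matches, the cached file list is returned without a single
readdir; the cost that remains is one stat per cached directory.

```python
import os

class Node:
    """Cached state for one directory, built by scanning it once."""

    def __init__(self, path):
        self.path = path
        self.mtime_ns = os.stat(path).st_mtime_ns
        self.files = []
        self.subdirs = []
        for entry in sorted(os.scandir(path), key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                self.subdirs.append(Node(entry.path))
            else:
                self.files.append(entry.path)

def subtree_unchanged(node):
    """True if this directory and all cached subdirectories keep their mtime."""
    try:
        if os.stat(node.path).st_mtime_ns != node.mtime_ns:
            return False
    except OSError:
        return False  # directory vanished or is no longer accessible
    return all(subtree_unchanged(child) for child in node.subdirs)

def cached_files(node):
    """Flatten the cached file list of a subtree."""
    out = list(node.files)
    for child in node.subdirs:
        out.extend(cached_files(child))
    return out

def list_files(node):
    """Return (files, node, reused); rescan only if something changed."""
    if subtree_unchanged(node):
        return cached_files(node), node, True
    fresh = Node(node.path)  # simplest fallback: rescan the whole subtree
    return cached_files(fresh), fresh, False
```

A finer-grained version would rescan only the directories whose mtime
differs instead of rebuilding the whole subtree.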