From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 00/20] Untracked cache to speed up "git status"
Date: Wed,  7 May 2014 21:51:40 +0700
Message-ID: <1399474320-6840-1-git-send-email-pclouds@gmail.com>

First of all, thanks for pointing to many more big repos. I'll look at
them tomorrow. End-of-day report (or ranting :D) time.

The series now looks good enough for public eyes. I haven't run the
test suite with untracked cache on by default, so my confidence level
is not so high, although I suspect the racy timestamp issue will
practically disable the cache anyway.

The idea is as before: exploit directory mtime to cache untracked
files. MSDN tells me NTFS on Windows exposes the "good" dir mtime
behavior, which means this series could speed up Git on Windows (I
think Karsten's fscache only deals with slow lstat, untracked files..).
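
To illustrate the idea only, here is a minimal Python sketch, not the series' C implementation. It assumes that creating or deleting an entry in a directory updates that directory's mtime (the "good" behavior mentioned above). The class name `DirListingCache` and its fields are hypothetical, and the sketch ignores .gitignore handling and racy timestamps, which the series treats separately.

```python
import os


class DirListingCache:
    """Hypothetical cache of directory listings, validated by directory mtime."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, sorted list of entry names)
        self.hits = 0
        self.misses = 0

    def list_dir(self, path):
        # One stat call on the directory decides whether the cache is usable.
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime_ns:
            self.hits += 1
            return cached[1]
        # mtime changed (or first visit): rescan and remember the result.
        self.misses += 1
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime_ns, names)
        return names
```

On a cache hit the cost is a single stat of the directory instead of a full scan. The directory is stat'ed before it is listed, so a change that lands between the two calls only causes an extra rescan next time, not a stale result.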

It would be nice if Windows people could try this and confirm. This
could be a good point for untracked cache vs watchman (no Windows
support, last time I checked). Usage is very simple: run "git
update-index --untracked-cache" and you're ready. Use
--no-untracked-cache to revert.

The performance numbers on webkit look good if we focus on
read_directory time only. Normally it takes 890ms. The first run with
untracked cache goes up to 922ms (filling up the cache, not counting
index write time). The next run goes down to 184ms (best case), a
gain of about 80%. lstat on directories costs only about 20ms out of
that 184ms, so I still need to see if I can lower that number
further.
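
For reference, the percentages follow directly from the read_directory timings quoted above. The short Python sketch below only redoes that arithmetic; the helper name `gain_percent` is made up for illustration.

```python
def gain_percent(before_ms, after_ms):
    """Relative time saved going from before_ms to after_ms, in percent."""
    return 100.0 * (before_ms - after_ms) / before_ms


baseline_ms = 890    # read_directory without untracked cache
first_run_ms = 922   # first run, filling up the cache
cached_ms = 184      # next run, best case
dir_lstat_ms = 20    # lstat on directories within the cached run

best_case_gain = gain_percent(baseline_ms, cached_ms)          # about 79.3
first_run_overhead = -gain_percent(baseline_ms, first_run_ms)  # about 3.6
lstat_share = 100.0 * dir_lstat_ms / cached_ms                 # about 10.9
```

So the best case saves roughly 79% of read_directory time, the cache-filling run costs under 4% extra, and directory lstat accounts for about 11% of the cached run.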

"git status" performance gain is less impressive, of course: only about
38%, with refresh time now becoming the top offender. With
core.preloadindex on, the gain increases to 50%. There's still room
for improvement, maybe up to 65% by reducing read time, I think.

But again, we may not stay in the best case forever. The more dirs are
damaged, the slower it gets. At the far end of the spectrum, when all
dirs are damaged, we gain nothing but overhead. This is actually where
watchman shines, although projects that do that may need some improvements.

Another bad point for untracked cache is that the extension data is
so specific to the core git algorithm that it probably cannot be reused
by other implementations. Again, watchman shines here.

Last note: this series conflicts with split-index due to the
write_cache API change, so it is not a candidate for 'pu' yet. The
series can also be fetched from

https://github.com/pclouds/git/commits/untracked-cache

except the last few timing/experimental patches.

Nguyễn Thái Ngọc Duy (20):
  dir.c: coding style fix
  dir.h: move struct exclude declaration to top level
  prep_exclude: remove the artificial PATH_MAX limit
  dir.c: optionally compute sha-1 of a .gitignore file
  untracked cache: record .gitignore information and dir hierarchy
  untracked cache: initial untracked cache validation
  untracked cache: invalidate dirs recursively if .gitignore changes
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: don't open non-existent .gitignore
  untracked cache: save to an index extension
  untracked cache: load from UNTR index extension
  untracked cache: invalidate at index addition or removal
  untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED
  read-cache.c: split racy stat test to a separate function
  untracked cache: avoid racy timestamps
  status: support untracked cache
  update-index: manually enable or disable untracked cache
  update-index: test the system before enabling untracked cache
  t7063: tests for untracked cache

 .gitignore                                 |   1 +
 Makefile                                   |   1 +
 builtin/commit.c                           |   8 +
 builtin/update-index.c                     | 161 ++++++
 cache.h                                    |   5 +
 dir.c                                      | 853 +++++++++++++++++++++++++++--
 dir.h                                      | 120 +++-
 read-cache.c                               |  51 +-
 t/t7063-status-untracked-cache.sh (new +x) | 352 ++++++++++++
 test-dump-untracked-cache.c (new)          |  61 +++
 unpack-trees.c                             |   7 +-
 wt-status.c                                |   6 +
 12 files changed, 1537 insertions(+), 89 deletions(-)
 create mode 100755 t/t7063-status-untracked-cache.sh
 create mode 100644 test-dump-untracked-cache.c

-- 
1.9.1.346.ga2b5940
