git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"David Turner" <David.Turner@twosigma.com>,
	"Jeff King" <peff@peff.net>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ben Peart" <benpeart@microsoft.com>
Subject: Re: [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor
Date: Wed, 7 Jun 2017 15:51:26 -0400	[thread overview]
Message-ID: <0cfd57d3-4092-af06-0d6c-cb3ceecc4633@gmail.com> (raw)
In-Reply-To: <CACBZZX48t3Jcy=eiga1f_ATSZZvy9_LG9wEe7avD2NCq2bsmJA@mail.gmail.com>



On 6/2/2017 7:06 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> I don't have time to update the perf test now or dig into it, but most
> of what you're describing in this mail doesn't at all match with the
> ad-hoc tests I ran in
> https://public-inbox.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vXd8nURyWQ@mail.gmail.com/
> 
> There (at the very end of the E-Mail) I'm running watchman in a tight
> loop while I flush the entire fs cache, its runtime is never longer
> than 600ms, with 3ms being the norm.

I added a perf trace around the entire query-fsmonitor hook proc (patch 
below) to measure the total actual impact of running the hook script + 
querying watchman + parsing the output with perl + passing the result 
back to git. On my machine, the total cost of the hook runs between 130 
ms and 180 ms when there are zero changes to report (ie best case).

With short status times, the overhead of watchman simply outweighs any 
gains in performance - especially when you have a warm file system cache 
as that cancels out the biggest win of avoiding the IO associated with 
scanning the working directory.


diff --git a/fsmonitor.c b/fsmonitor.c
index 763a8a3a3f..cb47f31863 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -210,9 +210,11 @@ void refresh_by_fsmonitor(struct index_state *istate)
          * If we have a last update time, call query-monitor for the set of
          * changes since that time.
          */
-       if (istate->fsmonitor_last_update)
+       if (istate->fsmonitor_last_update) {
                 query_success = !query_fsmonitor(HOOK_INTERFACE_VERSION,
                         istate->fsmonitor_last_update, &query_result);
+               trace_performance_since(last_update, "query-fsmonitor");
+       }

         if (query_success) {
                 /* Mark all entries returned by the monitor as dirty */



> 
> I.e. flushing the cache doesn't slow things down much at all compared
> to how long a "git status" takes from cold cache. Something else must
> be going on, and the smoking gun is the gprof output I posted in the
> follow-up E-Mail:
> https://public-inbox.org/git/CACBZZX4eZ3G8LQ8O+_BkbkJ-ZXTOkUi9cW=QKYjfHKtmA3pgrA@mail.gmail.com/
> 
> There with the fsmonitor we end up calling blk_SHA1_Block ~100K times
> during "status", but IIRC (I don't have the output in front of me,
> this is from memory) something like twenty times without the
> fsmonitor.
> 
> It can't be a coincidence that with the fscache:
> 
> $ pwd; git ls-files | wc -l
> /home/avar/g/linux
> 59844
> 
> And you can see that in the fsmonitor "git status" we make exactly
> that many calls to cache_entry_from_ondisk(), but those calls don't
> show up at all in the non-fscache codepath.
> 

I don't see how the gprof numbers for the non-fsmonitor case can be 
correct.  It appears they don't contain any calls related to loading the 
index while the fsmonitor gprof numbers do.  Here is a typical call stack:

git.exe!cache_entry_from_ondisk()
git.exe!create_from_disk()
git.exe!do_read_index()
git.exe!read_index_from()
git.exe!read_index()

During read_index(), cache_entry_from_ondisk() gets called for every 
item in the index (which explains the 59K calls).  How can the 
non-fsmonitor codepath not be loading the index?

> So, again, I haven't dug and really must step away from the computer
> now, but this really looks like the fscache saves us the recursive
> readdir() / lstat() etc, but in return we somehow fall though to a
> codepath where we re-read the entire on-disk state back into the
> index, which we don't do in the non-fscache codepath.
> 

I've run multiple profiles and compared them with fsmonitor on and off 
and have been unable to find any performance regression caused by 
fsmonitor (other than flagging the index as dirty at times when it isn't 
required which I have fixed for the next patch series).

I have done many performance runs and when I subtract the _actual_ time 
spent in the hook from the overall command time, it comes in at slightly 
less time than when status is run with fsmonitor off.  This also leads 
me to believe there is no regression with fsmonitor on.

All this leads me back to my original conclusion: the reason status is 
slower in these specific cases is because the overhead of calling the 
hook exceeds the savings gained. If your status calls are taking less 
than a second, it just doesn't make sense to add the complexity and 
overhead of calling a file system watcher.

I'm working on an updated perf test that will demonstrate the best case 
scenario (warm watchman, cold file system cache) in addition to the 
worst case (cold watchman, warm file system cache).  The reality is that 
in normal use cases, perf will be between the two. I'll add that to the 
next iteration of the patch series.

  reply	other threads:[~2017-06-07 19:51 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-01 15:50 [PATCH v4 0/6] Fast git status via a file system watcher Ben Peart
2017-06-01 15:51 ` [PATCH v4 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-06-01 15:51 ` [PATCH v4 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-06-01 15:51 ` [PATCH v4 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-06-01 15:51 ` [PATCH v4 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-06-01 15:51 ` [PATCH v4 5/6] fsmonitor: add documentation for the " Ben Peart
2017-06-01 15:51 ` [PATCH v4 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-06-07 21:38   ` Ævar Arnfjörð Bjarmason
2017-06-01 19:57 ` [PATCH v4 0/6] Fast git status via a file system watcher Ævar Arnfjörð Bjarmason
2017-06-01 21:06   ` Ben Peart
2017-06-01 21:12     ` Ævar Arnfjörð Bjarmason
2017-06-01 21:13     ` Stefan Beller
2017-06-01 21:26       ` Jeff King
2017-06-01 20:51 ` Ævar Arnfjörð Bjarmason
2017-06-01 21:13   ` Ævar Arnfjörð Bjarmason
2017-06-02  0:40     ` Ben Peart
2017-06-02 10:28       ` [WIP/PATCH 7/6] perf: add a performance test for core.fsmonitor Ævar Arnfjörð Bjarmason
2017-06-02 21:44         ` David Turner
2017-06-03 18:08           ` Ævar Arnfjörð Bjarmason
2017-06-05 14:27           ` Ben Peart
2017-06-02 22:05         ` Ben Peart
2017-06-02 23:06           ` Ævar Arnfjörð Bjarmason
2017-06-07 19:51             ` Ben Peart [this message]
2017-06-07 21:46               ` Ævar Arnfjörð Bjarmason
2017-06-08  1:57                 ` Ben Peart
2017-06-04  1:59         ` Junio C Hamano
2017-06-04  7:46           ` Ævar Arnfjörð Bjarmason
2017-06-04  8:21             ` Jeff King
2017-06-02  1:56 ` [PATCH v4 0/6] Fast git status via a file system watcher Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0cfd57d3-4092-af06-0d6c-cb3ceecc4633@gmail.com \
    --to=peartben@gmail.com \
    --cc=David.Turner@twosigma.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).