git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: Duy Nguyen <pclouds@gmail.com>, git@vger.kernel.org
Cc: dturner@twopensource.com
Subject: Re: [RFC] On watchman support
Date: Thu, 13 Nov 2014 06:05:52 +0100	[thread overview]
Message-ID: <54643C30.6010204@web.de> (raw)
In-Reply-To: <20141111124901.GA6011@lanh>

On 2014-11-11 13.49, Duy Nguyen wrote:
> I've come to the last piece to speed up "git status", watchman
> support. And I realized it's not as good as I thought.
> 
> Watchman could be used for two things: to avoid refreshing the index,
> and to avoid searching for ignored files. The first one can be done
> (with the patch below as demonstration). And it should keep refresh
> cost to near zero in the best case, the cost is proportional to the
> number of modified files.
> 
> For avoiding searching for ignored files. My intention was to build on
> top of untracked cache. If watchman can tell me what files are added
> or deleted since last observed time, then I can invalidate just
> directories that contain them, or even better, calculate ignore status
> for those files only.
> 
> This is important because in reality compilers and editors tend to
> update files by creating a new version then rename them, updating
> directory mtime and invalidating untracked cache as a consequence. As
> you edit more files (or your rebuild touches more dirs), untracked
> cache performance drops (until the next "git status"). The numbers I
> posted so far are the best case.
> 
> The problem with watchman is it cannot tell me "new" files since the
> last observed time (let's say 'T'). If a file exists at 'T', gets
> deleted then recreated, then watchman tells me it's a new file. I want
> to separate those from ones that do not exist before 'T'.
> 
> David's watchman approach does not have this problem because he keeps
> track of all entries under $GIT_WORK_TREE and knows which files are
> truely new. But I don't really want to keep the whole file list around,
> especially when watchman already manages the same list.
> 
> So we got a few options:
> 
> 1) Convince watchman devs to add something to make it work
> 
> 2) Fork watchman
> 
> 3) Make another daemon to keep file list around, or put it in a shared
>    memory.
> 
> 4) Move David's watchman series forward (and maybe make use of shared
>    mem for fs_cache).
> 
> 5) Go with something similar to the patch below and accept untracked
>    cache performance degrades from time to time
> 
> 6) ??
> 
> I'm working on 1). 2) is just bad taste, listed for completeness
> only. If we go with 3) and watchman starts to support Windows (seems
> to be in their plan), we'll need to rework some how. And I really
> don't like 3)
> 
> If 1-3 does not work out, we're left without 4) and 5). We could
> support both, but proobably not worth the code complexity and should
> just go with one.
> 
> And if we go with 4) we should probably think of dropping untracked
> cache if watchman will support Windows in the end. 4) also has another
> advantage over untracked cache, that it could speed up listing ignored
> files as well as untracked files.
> 
> Comments?
> 
[remove the patch]
From a Git user perspective it could be good to have something like this:

a) git status -u
b) git status -uno
c) git status -umtime
d) git status -uwatchman

We know that a) and b) already exist.
c) Can be convenient to have, in order to do benchmarking and testing.
  When the UNTR extension is not found, Git can give an error,
  saying something like this:
  No mtime information found, use "git update-index --untracked-cache"
d) does not yet exist

Of course we may want to configure the default for "git status" in a default variable,
like status.findUntrackedFiles, which can be empty "", "mtime" or "watchman",
and we may add other backends later.

A short test showed that watchman compiles under Mac OS.
The patch did not compile out of the box (both Git and watchman declare
there own version of usage(), some C99 complaints from the compiler in watchman,
nothing that can not be fixed easily)


I will test the mtime patch under networked file systems the next weeks.


The short version:
Go with c), d) then 5) until we have something better :-) 

  reply	other threads:[~2014-11-13  5:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-11 12:49 [RFC] On watchman support Duy Nguyen
2014-11-13  5:05 ` Torsten Bögershausen [this message]
2014-11-13 12:22   ` Duy Nguyen
2014-11-15  7:24     ` Torsten Bögershausen
2014-11-18  0:25 ` David Turner
2014-11-18 10:48   ` Duy Nguyen
2014-11-18 18:12     ` David Turner
2014-11-18 20:55       ` Junio C Hamano
2014-11-18 21:12         ` David Turner
2014-11-18 21:26           ` Junio C Hamano
2014-11-19  1:46             ` Jeff King
2014-11-28 11:13       ` Duy Nguyen
2014-12-01 20:45         ` David Turner
2014-11-19 15:26   ` Paolo Ciarrocchi
2014-11-19 16:43     ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54643C30.6010204@web.de \
    --to=tboegi@web.de \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).