From: David Turner <dturner@twopensource.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Watchman support for git
Date: Thu, 15 May 2014 15:42:51 -0400 [thread overview]
Message-ID: <1400182971.14179.49.camel@stross> (raw)
In-Reply-To: <CACsJy8C49EDwjtv_L2pTRy3XbPptp7+nNzT=Jp4BaH_TOZtvnQ@mail.gmail.com>
On Wed, 2014-05-14 at 17:36 +0700, Duy Nguyen wrote:
> >> With that in
> >> mind, I think you don't need to keep a fs cache on disk at all. All
> >> you need to store (in the index) is the clock value from watchman.
> >> After we parse the index, we perform a "since" query to get path names
> >> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
> >> on entries that are _not_ in the query result. The remaining items
> >> will be lstat'd by git (see [1] and read-cache.c changes on the next
> >> few patches). Assuming the number of updated files is reasonably
> >> small, we won't be punished by lookup time.
> >
> > I considered this, but rejected it for a few reasons:
> > 1. We still need to know about the untracked files. I guess we could
> > do an index extension for just that, like your untracked cache.
>
> Yes. But consider that the number of untracked files is usually small
> (otherwise 'git status' would look very messy). And your fscache would
> need to store excluded file list too, which could be a lot bigger (one
> pair of .[ch] -> one .o). _But_ yours would make 'git status
> --ignored' work. I don't consider that a major use case for
> optimization, but people may have different opinions.
I don't think --ignored is a major use case. But I do think it's nice
to get as much mileage as possible out of a change like this.
> > 2. That doesn't eliminate opendir/readdir, I think. Those are a major
> > cost. I did not thoroughly review your patches from January/February,
> > but I didn't notice anything in there about this part of dir.c.
> > Also the cost of open(nonexistent .gitignore) to do ignore processing.
>
> Assuming untracked cache is in use, opendir/readdir is replaced with
> lstat. And cheap lstat can be solved with watchman without storing
> anything extra. I solve open(.gitignore) too by checking its stat data
> with the one in index. If matches, I reuse the version in tree. This
> does not necessarily make it cheaper, but it increases locality so it
> might be. _Modified_ .gitignore files have to go through
> open(.gitignore), but people don't update .gitignore often.
Interesting -- if all that actually works, then it's probably the right
approach.
> > 3. Changing a file in the index (e.g. git add) requires clearing
> > the CE_VALID bit; this means that third-party libraries (e.g. jgit)
> > that change the git repo need to understand this extension for correct
> > operation. Maybe that's just the nature of extensions, but it's not
> > something that my present patch set requires.
>
> I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
> Only after loading and validating against watchman that I turn
> CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.
>
> I assume you won't change your mind about this. Which is fine to me. I
> will still try out my approach with your libwatchman though. Just
> curious about its performance and complexity, compared to your
> approach.
I am open-minded here. This code is really the first time I have looked
at git's internals, and I accept that your way might be better. If
you're going to try the watchman version of your approach, then we do a
direct comparison. Let me know if there is something I can do to help
out on that.
> A bit off topic, but msys fork has another "fscache" in compat/win32.
> If you could come up with a different name, maybe it'll be less
> confusing for them after merging. But this is not a big deal, as this
> fscache won't work on windows anyway.
Does wtcache sounds like a better name?
next prev parent reply other threads:[~2014-05-15 19:43 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-02 23:14 Watchman support for git dturner
2014-05-02 23:14 ` [PATCH 1/3] After chdir to run grep, return to old directory dturner
2014-05-06 22:24 ` Junio C Hamano
2014-05-07 0:06 ` David Turner
2014-05-07 3:00 ` Jeff King
2014-05-07 3:33 ` David Turner
2014-05-07 17:42 ` Junio C Hamano
2014-05-07 20:57 ` David Turner
2014-05-02 23:14 ` [PATCH 3/3] Watchman support dturner
2014-05-02 23:20 ` Watchman support for git Felipe Contreras
2014-05-03 2:24 ` David Turner
2014-05-03 3:40 ` Felipe Contreras
2014-05-05 18:08 ` David Turner
2014-05-05 18:14 ` Felipe Contreras
2014-05-08 19:17 ` Sebastian Schuberth
2014-05-09 7:08 ` David Lang
2014-05-09 17:17 ` David Turner
2014-05-09 18:08 ` David Lang
2014-05-09 18:17 ` David Turner
2014-05-09 18:27 ` David Lang
2014-05-09 18:47 ` David Turner
2014-05-03 0:52 ` Duy Nguyen
2014-05-03 4:39 ` David Turner
2014-05-03 8:49 ` Duy Nguyen
2014-05-03 20:49 ` David Turner
2014-05-04 0:15 ` Duy Nguyen
2014-05-06 3:13 ` David Turner
2014-05-06 0:26 ` Duy Nguyen
2014-05-06 0:30 ` Duy Nguyen
2014-05-10 5:26 ` Duy Nguyen
2014-05-10 18:38 ` David Turner
2014-05-11 0:21 ` Duy Nguyen
2014-05-11 22:56 ` David Turner
2014-05-12 10:45 ` Duy Nguyen
2014-05-13 22:38 ` David Turner
2014-05-13 22:54 ` Duy Nguyen
2014-05-13 23:19 ` David Turner
2014-05-10 8:16 ` Duy Nguyen
2014-05-13 23:44 ` David Turner
2014-05-14 10:36 ` Duy Nguyen
2014-05-14 10:52 ` Duy Nguyen
2014-05-15 19:42 ` David Turner [this message]
2014-05-19 10:10 ` Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1400182971.14179.49.camel@stross \
--to=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.