Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Joey Hess <id@joeyh.name>, git@vger.kernel.org
Subject: Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)
Date: Thu, 14 Jul 2016 11:17:44 -0700	[thread overview]
Message-ID: <xmqqtwfsqe2f.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <af41e13f-0320-2e55-a6ac-3fdb46f0bb35@web.de> ("Torsten Bögershausen"'s message of "Thu, 14 Jul 2016 04:09:02 +0200")

Torsten Bögershausen <tboegi@web.de> writes:

>> There's less redundant work going on than at first seems, because
>> .gitattribute files are only read once and cached. Verified by strace.
>>
> OK, I think I missed that work (not enough time for Git at the moment)
> Junio, please help me out, do we have a cache here now?

The call convert_attrs() makes to the attribute subsystem is:

    git_check_attr(path, NUM_CONV_ATTRS, ccheck)

Conceptually, this call has to do the following, and for the very
first call that is actually what it does:

 1. Read all the relevant attrubutes files.  If you are asking for
    "a/x/1", we need to read $GIT_DIR/info/attributes,
    ".gitattributes", "a/.gitattributes" and "a/x/.gitattributes".

 2. Find matching patterns that cover "a/x/1", and pick up the
    attribute definition from the above.

If you have asked for "a/x/1", it is very likely that you would next
ask for "a/x/2" (think: "git checkout a/x"), and we can realize that
exactly the same set of attributes files apply to that path.  So an
obvious optimization is to cache the result of the first step.

In addition, it is also likely that you would later ask for "a/y/3"
before asking for "b/z/4" (think: "git add .").  A part of the step
1. that was done when you asked for "a/x/1" and then was reused when
you asked for "a/x/2" can further be reused for this request, by
discarding only what was read from "a/x/.gitattributes" and reading
only from "a/y/.gitattributes".

The above two optimizations are done in prepare_attr_stack().

Unfortunately this is one of the three reasons why the attribute
subsystem is not thread-ready.  I.e. there is only one global cache,
so if you spawn two threads to speed up "git add ." by handing paths
[a-m]* to one and [n-z]* to the other, they would thrash the cache
and making it ineffective (even if we protect the cache with mutex,
which obviously has not been done).

I earlier started looking into this, but the effort stalled.  But
for a single-thread use, the attributes read from the filesystem are
cached, and the cache is designed to perform well as long as you
make requests in-order.

To make the attribute look-up thread-ready, the attribute cache
needs to become per-thread.

Orthogonal of the threading issue, there is another posssible
optimization that is not there yet.  The cache can be tied to what
is in ccheck[] to further reduce the size of the cache, making step
2. a lot cheaper.  Currently in step 1. we read and keep everything,
but if we tie the cache to the contents of ccheck[], we can read and
ignore entries we read in step 1. that does not talk about the
attributes ccheck[] is interested in.  My plan is to either

 (1) make the cache per-thread, limit the reading done in 1. to
     ccheck[], but invalidate the cache when a different set of
     attributes are asked; or

 (2) make the cache per <thread, ccheck[]>.

     prev parent reply	other threads:[~2016-07-14 18:17 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17  3:20 What's cooking in git.git (Jun 2016, #05; Thu, 16) Junio C Hamano
2016-06-17 13:25 ` Pranit Bauva
2016-06-17 17:55 ` Vasco Almeida
2016-06-17 22:05 ` Lars Schneider
2016-06-17 22:17   ` Junio C Hamano
2016-06-18 17:09   ` Michael Haggerty
2016-06-19  7:59   ` Michael Haggerty
2016-06-19 15:04     ` Lars Schneider
2016-06-19 16:11       ` Lars Schneider
2016-06-19 18:13         ` Junio C Hamano
2016-06-19 18:49           ` Lars Schneider
2016-06-19 18:53             ` Lars Schneider
2016-06-19 18:09     ` Junio C Hamano
2016-06-19 23:51       ` Junio C Hamano
2016-06-20  7:57         ` Lars Schneider
2016-06-23  7:32           ` Michael Haggerty
2016-06-27  7:09             ` Lars Schneider
2016-06-27 16:29             ` Junio C Hamano
2016-06-28  9:23             ` Michael Haggerty
2016-06-28 17:44               ` Junio C Hamano
2016-06-18  4:18 ` Michael Haggerty
2016-06-18 18:20   ` Junio C Hamano
2016-06-19  8:15     ` Michael Haggerty
2016-06-19 18:07       ` Junio C Hamano
2016-06-20  6:06 ` Torsten Bögershausen
2016-06-20 20:06   ` Junio C Hamano
2016-06-22 21:09   ` Joey Hess
2016-06-23 13:13     ` Torsten Bögershausen
2016-07-12 22:20       ` Joey Hess
2016-07-14  2:09         ` Torsten Bögershausen
2016-07-14 18:17           ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtwfsqe2f.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=id@joeyh.name \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.