From: Junio C Hamano <gitster@pobox.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: "Jeff King" <peff@peff.net>, "Brodie Rao" <brodie@sf.io>,
git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
Date: Thu, 09 Jan 2014 10:25:44 -0800 [thread overview]
Message-ID: <xmqq61ptau6v.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <52CD7835.2020708@alum.mit.edu> (Michael Haggerty's message of "Wed, 08 Jan 2014 17:09:25 +0100")
Michael Haggerty <mhagger@alum.mit.edu> writes:
> It's not only racy WRT other processes. If the current git process
> would create a new reference, it wouldn't be reflected in the cache.
>
> It's true that the main ref_cache doesn't invalidate itself
> automatically either when a new reference is created, so it's not really
> a fair complaint. However, as we add places where the cache is
> invalidated, it is easy to overlook this cache that is stuck in static
> variables within a function definition and it is impossible to
> invalidate it. Might it not be better to attach the cache to the
> ref_cache structure instead, and couple its lifetime to that object?
>
> Alternatively, the cache could be created and managed on the caller
> side, since the caller would know when the cache would have to be
> invalidated. Also, different callers are likely to have different
> performance characteristics. It is unlikely that the time to initialize
> the cache will be amortized in most cases; in fact, "rev-list --stdin"
> might be the *only* plausible use case.
True.
> Regarding the overall strategy: you gather all refnames that could be
> confused with an SHA-1 into a sha1_array, then later look up SHA-1s in
> the array to see if they are ambiguous. This is a very special-case
> optimization for SHA-1s.
>
> I wonder whether another approach would gain almost the same amount of
> performance but be more general. We could change dwim_ref() (or a
> version of it?) to read its data out of a ref_cache instead of going to
> disk every time. Then, at the cost of populating the relevant parts of
> the ref_cache once, we would have fast dwim_ref() calls for all strings.
If opendir-readdir to grab only the names (but not values) of many
refs is a lot faster than stat-open-read a handful of dwim-ref
locations for a given name, that optimization might be worthwhile,
but I think that requires an update to read_loose_refs() not to
read_ref_full() and the users of refs API to instead lazily resolve
the refs, no?
If I ask for five names (say 'maint', 'master', 'next', 'pu',
'jch'), the current code will do 5 dwim_ref()s, each of which will
consult 6 locations with resolve_ref_unsafe(), totalling 30 calls to
resolve_ref_unsafe(), each of which in turn is essentially an open
followed by either an return on ENOENT or a read. So 30 opens and 5
reads in total.
With your lazy ref_cache scheme, instead we would enumerate all the
loose ones in the same 6 directories (e.g. refs/tags/, refs/heads),
so 6 opendir()s with as many readdir()s as I have loose refs, plus
we open-read them in read_loose_refs() called from get_ref_dir()
with the current ref_cache code. For me, "find .git/refs/heads"
gives 500+ lines of output, which suggests that using the ref_cache
mechanism for dwim_ref() may not be a huge win, unless it is updated
to be extremely lazy, and readdir()s turns out to be extremely less
heavier than open-read. Also it is unlikely that the cost to
initialize the cache is amortized to be a net win unless we are
dealing with tons of dwim_ref()s.
next prev parent reply other threads:[~2014-01-09 18:25 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 3:32 [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Brodie Rao
2014-01-07 3:35 ` Brodie Rao
2014-01-07 17:13 ` Jeff King
2014-01-07 17:51 ` Junio C Hamano
2014-01-07 17:52 ` Jeff King
2014-01-07 19:38 ` Junio C Hamano
2014-01-07 19:58 ` Jeff King
2014-01-07 20:31 ` Junio C Hamano
2014-01-07 22:08 ` Jeff King
2014-01-07 22:10 ` [PATCH 1/4] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 2/4] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 3/4] cat-file: restore ambiguity warning flag " Jeff King
2014-01-07 22:11 ` [PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin Jeff King
2014-01-07 23:56 ` [PATCH v2] speeding up 40-hex ambiguity check Jeff King
2014-01-07 23:57 ` [PATCH v2 1/5] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 23:57 ` [PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 23:58 ` [PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion Jeff King
2014-01-08 3:47 ` [PATCH v3 " Jeff King
2014-01-08 10:23 ` Jeff King
2014-01-08 11:29 ` Michael Haggerty
2014-01-09 21:49 ` Jeff King
2014-01-10 8:59 ` Michael Haggerty
2014-01-10 9:15 ` Jeff King
2014-01-09 17:51 ` Junio C Hamano
2014-01-09 21:55 ` Jeff King
2014-01-07 23:59 ` [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test Jeff King
2014-01-08 16:09 ` Michael Haggerty
2014-01-09 18:25 ` Junio C Hamano [this message]
2014-01-10 9:41 ` Jeff King
2014-01-14 9:50 ` Jeff King
2014-01-14 11:34 ` Michael Haggerty
2014-01-08 0:00 ` [PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag Jeff King
2014-01-08 16:34 ` Michael Haggerty
2014-01-07 6:45 ` [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Duy Nguyen
2014-01-07 17:24 ` Junio C Hamano
2014-01-07 19:23 ` Brodie Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq61ptau6v.fsf@gitster.dls.corp.google.com \
--to=gitster@pobox.com \
--cc=brodie@sf.io \
--cc=git@vger.kernel.org \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.