From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>, "Brodie Rao" <brodie@sf.io>,
git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
Date: Tue, 14 Jan 2014 12:34:37 +0100
Message-ID: <52D520CD.7070902@alum.mit.edu>
In-Reply-To: <20140114095002.GA32258@sigill.intra.peff.net>
On 01/14/2014 10:50 AM, Jeff King wrote:
> On Fri, Jan 10, 2014 at 04:41:20AM -0500, Jeff King wrote:
>
>> That being said, we could further optimize this by not opening the files
>> at all (and make that the responsibility of do_one_ref, which we are
>> avoiding here). I am slightly worried about the open() cost of my
>> solution. It's amortized away in a big call, but it is probably
>> noticeable for something like `git rev-parse <40-hex>`.
>
> I took a look at this. It gets a bit hairy. My strategy is to add a flag
> to ask read_loose_refs to create REF_INCOMPLETE values. We currently use
> this flag for loose REF_DIRs to mean "we haven't opendir()'d the
> subdirectory yet". This would extend it to the non-REF_DIR case to mean
> "we haven't opened the loose ref file yet". We'd check REF_INCOMPLETE
> before handing the ref_entry to a callback, and complete it if
> necessary.
>
> It gets ugly, though, because we need to pass that flag through quite a
> bit of callstack. get_ref_dir() needs to know it, which means all of
> find_containing_dir, etc need it, meaning it pollutes all of the
> packed-refs code paths too.
>
> I have a half-done patch in this direction if that doesn't sound too
> nasty.
A long time ago I wrote a patch series to allow incomplete reading of
references, but my version *always* read them lazily, so it was much
simpler (no need to pass a new option down the call stack). It didn't
seem to speed things up in general, so I never submitted it.
Reading lazily only from particular callers is more complicated, and I
can see how it would get messy.
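To make the always-lazy variant concrete, here is a minimal C sketch. It
is not code from that old patch series; the names lazy_entry,
ENTRY_INCOMPLETE, lazy_entry_value and demo_load are hypothetical, and
the flag only mimics the role that REF_INCOMPLETE plays in the
discussion above. The entry is created without its value, and the value
is loaded the first time somebody asks for it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the entry's value has not been read yet. */
#define ENTRY_INCOMPLETE 0x01u

struct lazy_entry {
	const char *name;
	unsigned flags;
	int value;	/* stand-in for the SHA-1 a real ref would hold */
};

/* Loader callback: fills *value for name, returns 0 or -1 on error. */
typedef int (*load_fn)(const char *name, int *value);

/*
 * Return the entry's value through *value, loading it on first use.
 * Because every access goes through this function, no caller has to
 * pass a "lazy" option down the call stack.
 */
int lazy_entry_value(struct lazy_entry *e, load_fn load, int *value)
{
	assert(e && load && value);
	if (e->flags & ENTRY_INCOMPLETE) {
		if (load(e->name, &e->value) < 0)
			return -1;
		e->flags &= ~ENTRY_INCOMPLETE;
	}
	*value = e->value;
	return 0;
}

/* Stand-in loader that counts how often it is called. */
int demo_load_calls;

int demo_load(const char *name, int *value)
{
	(void)name;
	demo_load_calls++;
	*value = 42;
	return 0;
}
```

Calling lazy_entry_value() twice on the same entry invokes the loader
only once; the second call is served from the completed entry.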
Given the race avoidance needed between packed/loose references, lazy
reading would mean that after each reference is read, the packed-refs
file would need to be stat()ted again to make sure that it hasn't been
changed since the last check. I know this isn't an issue for your use
case, because you plan *never* to read the file contents. But it does
increase the price of lazy reference reading to most callers.
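The extra cost is one stat() per lazily read reference. A rough sketch
of such a freshness check, assuming POSIX stat() and using hypothetical
names (stat_snapshot, snapshot_refresh, snapshot_is_current) that are
not taken from the refs code:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical record of the stat data last seen for a file. */
struct stat_snapshot {
	int valid;	/* 0 until a refresh has succeeded */
	dev_t dev;
	ino_t ino;
	off_t size;
	time_t mtime;
};

/* Remember the file's current stat data; 0 on success, -1 on error. */
int snapshot_refresh(struct stat_snapshot *snap, const char *path)
{
	struct stat st;

	assert(snap && path);
	if (stat(path, &st) < 0) {
		snap->valid = 0;
		return -1;
	}
	snap->valid = 1;
	snap->dev = st.st_dev;
	snap->ino = st.st_ino;
	snap->size = st.st_size;
	snap->mtime = st.st_mtime;
	return 0;
}

/*
 * Return 1 if the file still matches the snapshot, 0 if it may have
 * changed (or cannot be examined) and must be re-read.
 */
int snapshot_is_current(const struct stat_snapshot *snap, const char *path)
{
	struct stat st;

	assert(snap && path);
	if (!snap->valid || stat(path, &st) < 0)
		return 0;
	return st.st_dev == snap->dev &&
	       st.st_ino == snap->ino &&
	       st.st_size == snap->size &&
	       st.st_mtime == snap->mtime;
}
```

A lazy reader would call snapshot_is_current() on the packed-refs file
after each loose reference it reads, and fall back to re-reading the
file when the check fails. Comparing only these four fields is a
simplification; a rewrite that keeps size and mtime identical would go
unnoticed.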
On the other hand, if we ever go in the direction of routing *all*
reference lookups--including lookups of single references--through the
cache, then lazy reading of references probably becomes essential to
avoid populating more of the cache than necessary.
>>> This doesn't correctly handle the rule
>>>
>>> "refs/remotes/%.*s/HEAD"
>> [...]
>
>> I'll see how painful it is to make it work.
>
> It's actually reasonably painful. I thought at first we could get away
> with more cleverly parsing the rule, find the prefix (up to the
> placeholder), and then look for the suffix ("/HEAD") inside there. But
> it can never work with the current do_for_each_* code. That code only
> triggers a callback when we see a concrete ref. It _never_ lets the
> callbacks see an intermediate directory.
>
> So a NO_RECURSE flag is not sufficient to handle this case. I'd need to
> teach do_for_each_ref to recurse based on pathspecs, or a custom
> callback function. And that is getting quite complicated.
Another possibility would be to have an "int recurse" parameter rather
than "bool recurse", telling how many levels to recurse. Then one could
do a
    do_for_each_ref(..., "refs/remotes", ..., recurse=2)
to get all of the refs/remotes/*/HEAD references. Though since all of
the heads for a remote are also siblings of "refs/remotes/foo/HEAD", it
could still involve a lot of superfluous file reading. And the integer
wouldn't fit conveniently in the flags parameter.
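A toy version of the depth-limited walk, over an in-memory tree rather
than the real ref cache. The entry struct, walk_limited and count_heads
are hypothetical names made up for this sketch; only the idea of an
integer "recurse" parameter comes from the paragraph above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node: either a directory or a concrete ref. */
struct entry {
	const char *name;
	int is_dir;
	const struct entry *children;	/* only used when is_dir is set */
	size_t nr;
};

typedef void (*entry_fn)(const struct entry *e, void *cb_data);

/*
 * Call fn for every concrete ref at most "recurse" levels below dir.
 * As in the existing iteration code, directories themselves are never
 * passed to the callback.
 */
void walk_limited(const struct entry *dir, int recurse,
		  entry_fn fn, void *cb_data)
{
	size_t i;

	assert(dir && fn);
	if (recurse <= 0)
		return;
	for (i = 0; i < dir->nr; i++) {
		const struct entry *e = &dir->children[i];

		if (e->is_dir)
			walk_limited(e, recurse - 1, fn, cb_data);
		else
			fn(e, cb_data);
	}
}

struct counts {
	int heads;	/* entries named "HEAD" */
	int total;	/* every entry the callback saw */
};

/* Callback: the caller still has to filter out the siblings of HEAD. */
void count_heads(const struct entry *e, void *cb_data)
{
	struct counts *c = cb_data;

	c->total++;
	if (!strcmp(e->name, "HEAD"))
		c->heads++;
}
```

With recurse=2 starting at "refs/remotes", the callback sees
refs/remotes/origin/HEAD but also refs/remotes/origin/master, which is
the superfluous work mentioned above; anything one level deeper is
skipped.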
> I think it might be simpler to just do my own custom traversal. What I
> need is much simpler than what do_for_each_entry provides. I don't need
> recursion, and I don't actually need to look at the loose and packed
> refs together. It's OK for me to do them one at a time because I don't
> care about the actual value; I just want to know about which refs exist.
Yes. Still, the code is really piling up for this one warning for the
contrived eventuality that somebody wants to pass SHA-1s and branch
names together in a single cat-file invocation *and* wants to pass lots
of inputs at once and so is worried about performance *and* has
reference names that look like SHA-1s. Otherwise we could just leave
the warning disabled in this case, as now. Or we could add a new
"--hashes-only" option that tells cat-file to treat all of its
arguments/inputs as SHA-1s; such an option would permit an even faster
code path for bulk callers.
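For illustration, the fast path under such an option could be as small
as a strict 40-hex parser that never consults the refs at all. This is
only a sketch: "--hashes-only" is the option proposed above, not an
existing one, and parse_hex_sha1 and hexval_sketch are hypothetical
helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>

#define HEX_LEN 40	/* hex digits in a SHA-1 */
#define RAW_LEN 20	/* bytes in a SHA-1 */

/* Value of one hex digit, or -1 if c is not a hex digit. */
int hexval_sketch(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse exactly 40 hex digits into 20 raw bytes.  Returns 0 on
 * success, -1 if the input is anything else (a branch name, a short
 * SHA-1, trailing garbage).  No ref lookup takes place.
 */
int parse_hex_sha1(const char *hex, unsigned char out[RAW_LEN])
{
	size_t i;

	assert(hex && out);
	for (i = 0; i < RAW_LEN; i++) {
		int hi, lo;

		/* A NUL is not a hex digit, so we never read past the end. */
		hi = hexval_sketch((unsigned char)hex[2 * i]);
		if (hi < 0)
			return -1;
		lo = hexval_sketch((unsigned char)hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return hex[HEX_LEN] == '\0' ? 0 : -1;
}
```

A bulk caller would run each input line through parse_hex_sha1() and
reject anything that fails, so neither the ambiguity warning nor any
reference reading is involved.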
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/