From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>, "Brodie Rao" <brodie@sf.io>,
git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH v3 3/5] refs: teach for_each_ref a flag to avoid recursion
Date: Fri, 10 Jan 2014 09:59:25 +0100 [thread overview]
Message-ID: <52CFB66D.5070800@alum.mit.edu> (raw)
In-Reply-To: <20140109214926.GA32069@sigill.intra.peff.net>
On 01/09/2014 10:49 PM, Jeff King wrote:
> On Wed, Jan 08, 2014 at 12:29:51PM +0100, Michael Haggerty wrote:
>
>>> Here's a fixed version of patch 3/5.
>>
>> v2 4/5 doesn't apply cleanly on top of v3 3/5. So I'm basing my review
>> on the branch you have at GitHub peff/git "jk/cat-file-warn-ambiguous";
>> I hope it is the same.
>
> Hrmph. I didn't have to do any conflict resolution during the rebase, so
> I would think it would apply at least with "am -3".
That may be; I didn't try with "-3".
> [...]
>>> static int do_for_each_entry(struct ref_cache *refs, const char *base,
>>> - each_ref_entry_fn fn, void *cb_data)
>>> + each_ref_entry_fn fn, void *cb_data,
>>> + int flags)
>>> {
>>> struct packed_ref_cache *packed_ref_cache;
>>> struct ref_dir *loose_dir;
>>
>> A few lines after this, do_for_each_entry() calls
>> prime_ref_dir(loose_dir) to ensure that all of the loose references that
>> will be iterated over are read before the packed-refs file is checked.
>> It seems to me that prime_ref_dir() should also get a flags parameter to
>> prevent it reading more loose references than necessary, something like
>> this:
>
> Hmm. I hadn't considered that, but yeah, it definitely nullifies part of
> the purpose of the optimization.
>
> However, is it safe to prime only part of the loose ref namespace? The
> point of that priming is to avoid the race fixed in 98eeb09, which
> depends on us caching the loose refs before the packed refs. But when we
> read packed-refs, we will be reading and storing _all_ of it, even if we
> do not touch it in this traversal. So it does not affect the race for
> this traversal, but have we setup a cache situation where a subsequent
> for_each_ref in the same process would be subject to the race?
prime_ref_dir() is called by do_for_each_entry(), which all the
iteration functions pass through. It is always called before the
iteration starts, and it primes only the subtree of the refs hierarchy
that is being iterated over. For example, if iterating over
"refs/heads" then it only primes references with that prefix.
This is OK, because if later somebody iterates over a broader part of
the refs hierarchy (say, "refs"), then priming is done again, including
re-checking the packed refs. If the packed-refs file was changed
between the iterations, then the first iteration (if it is still
running) continues using the old packed-refs cache while the second
iteration uses the new packed-refs cache. (So the first iteration will
have a stale, but self-consistent, view of the references.)
If do_for_each_entry() gets the DO_FOR_EACH_NO_RECURSE option, then it
knows that it will only traverse one level of the refs hierarchy. So if
it passes the option to prime_ref_dir(), then the same level will be
primed. If somebody later iterates over the same part of the hierarchy
without DO_FOR_EACH_NO_RECURSE, they will re-prime without
DO_FOR_EACH_NO_RECURSE then, if the packed-refs file has been changed,
load a new version of it. So they will also get a self-consistent view
of the references and I think everything will be OK.
> I'm starting to wonder if this optimization is worth it.
It's true that this is quite a special-case optimization.
I think reference handling will have to move in the direction of
transactions, to remove one or two known race conditions. That is why I
described the alternative of having the DWIM function do its lookups in
a ref cache. It would move in the direction of consciously taking a
snapshot of the ref tree and using it for a whole "transaction", which I
think is a style that we will want to use in more places. It's just
hard to judge whether this alternative would actually solve the
performance problem that you were originally trying to address--a point
that Junio discussed elsewhere in this thread.
(This is something I would be willing to work on if you feel like I am
pushing you to enlarge the scope of your work beyond what you are
interested in.)
>>> [...]
>>> @@ -1718,7 +1732,7 @@ static int do_for_each_ref(struct ref_cache *refs, const char *base,
>>> data.fn = fn;
>>> data.cb_data = cb_data;
>>>
>>> - return do_for_each_entry(refs, base, do_one_ref, &data);
>>> + return do_for_each_entry(refs, base, do_one_ref, &data, flags);
>>> }
>>
>> This change makes the DO_FOR_EACH_NO_RECURSE option usable with
>> do_for_each_ref() (even though it is never in fact used). It should
>> either be mentioned in the docstring or (if there is a reason not to
>> allow it) explicitly prohibited.
>
> Hrm, yeah. I guess there are no callers, and there is no plan for any.
> So we could just pass "0" here, and then "flags" passed to
> do_for_each_ref really is _just_ for the callback data that goes to
> do_one_ref. That clears up the weird "combined namespace" stuff I
> mentioned in the commit message, and is a bit cleaner. I'll take it in
> that direction.
It would also be possible to swing in the other direction. I don't
remember a particular reason why I left the DO_FOR_EACH_INCLUDE_BROKEN
handling at the do_for_each_ref() level rather than handling it at the
do_for_each_entry() level. But now that you are passing the flags
parameter all the way down the call stack, it wouldn't cost anything to
support both of the DO_FOR_EACH flags everywhere and just document it
that way.
> [...]
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2014-01-10 8:59 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 3:32 [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Brodie Rao
2014-01-07 3:35 ` Brodie Rao
2014-01-07 17:13 ` Jeff King
2014-01-07 17:51 ` Junio C Hamano
2014-01-07 17:52 ` Jeff King
2014-01-07 19:38 ` Junio C Hamano
2014-01-07 19:58 ` Jeff King
2014-01-07 20:31 ` Junio C Hamano
2014-01-07 22:08 ` Jeff King
2014-01-07 22:10 ` [PATCH 1/4] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 2/4] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 3/4] cat-file: restore ambiguity warning flag " Jeff King
2014-01-07 22:11 ` [PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin Jeff King
2014-01-07 23:56 ` [PATCH v2] speeding up 40-hex ambiguity check Jeff King
2014-01-07 23:57 ` [PATCH v2 1/5] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 23:57 ` [PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 23:58 ` [PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion Jeff King
2014-01-08 3:47 ` [PATCH v3 " Jeff King
2014-01-08 10:23 ` Jeff King
2014-01-08 11:29 ` Michael Haggerty
2014-01-09 21:49 ` Jeff King
2014-01-10 8:59 ` Michael Haggerty [this message]
2014-01-10 9:15 ` Jeff King
2014-01-09 17:51 ` Junio C Hamano
2014-01-09 21:55 ` Jeff King
2014-01-07 23:59 ` [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test Jeff King
2014-01-08 16:09 ` Michael Haggerty
2014-01-09 18:25 ` Junio C Hamano
2014-01-10 9:41 ` Jeff King
2014-01-14 9:50 ` Jeff King
2014-01-14 11:34 ` Michael Haggerty
2014-01-08 0:00 ` [PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag Jeff King
2014-01-08 16:34 ` Michael Haggerty
2014-01-07 6:45 ` [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Duy Nguyen
2014-01-07 17:24 ` Junio C Hamano
2014-01-07 19:23 ` Brodie Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52CFB66D.5070800@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=brodie@sf.io \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).