All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	git discussion list <git@vger.kernel.org>,
	David Turner <dturner@twopensource.com>
Subject: Re: RFC: New reference iteration paradigm
Date: Sat, 26 May 2018 19:25:32 +0200	[thread overview]
Message-ID: <86h8mu1g8j.fsf@gmail.com> (raw)
In-Reply-To: <20160331193150.GC5013@sigill.intra.peff.net> (Jeff King's message of "Thu, 31 Mar 2016 15:31:51 -0400")

Jeff King <peff@peff.net> writes:
> On Thu, Mar 31, 2016 at 11:01:44AM -0700, Junio C Hamano wrote:
>> Michael Haggerty <mhagger@alum.mit.edu> writes:
>> 
>>> the backend now has to implement
>>>
>>>> struct ref_iterator *ref_iterator_begin_fn(const char *submodule,
>>>>                                            const char *prefix,
>>>>                                            unsigned int flags);

The problem with callbacks, including their lack of composability, is
often solved in high-level languages by using Promises / Futures (or
Channels / Supplies).

But I think in this case implementing a straightforward Iterator pattern
would be a better and simpler solution, especially in C.

>>> The ref_iterator itself has to implement two main methods:
>>>
>>>> int iterator_advance_fn(struct ref_iterator *ref_iterator);
>>>> void iterator_free_fn(struct ref_iterator *ref_iterator);
>>>
>>> A loop over references now looks something like
>>>
>>>> struct ref_iterator *iter = each_ref_in_iterator("refs/tags/");
>>>> while (ref_iterator_advance(iter)) {
>>>>         /* code using iter->refname, iter->oid, iter->flags */
>>>> }
>> 
>> We'd want to take advantage of the tree-like organization of the
>> refs (i.e. refs/tags/a and refs/tags/b sit next to each other and
>> they are closer to each other than they are to refs/heads/a) so that
>> a request "I want to iterate only over tags, even though I may have
>> millions of other kinds of refs" can be done with cost that is
>> proportional to how many tags you have.
>> 
>> The current implementation of for_each_tag_ref() that goes down to
>> do_for_each_entry() in files-backend.c has that propertly, and the
>> new iteration mechanism with the above design seems to keep it,
>> which is very nice.
>
> Actually, that is a slight fiction. :)
>
> We traverse only the loose ref directories we need, but we populate the
> entire packed-refs tree in one go. That's fine if you have a reasonable
> number of refs and care about the cold-cache case (where you care a lot
> about hitting directories you don't need to, but reading the entire
> packed-refs file isn't a big deal). But it's really bad if you have an
> 800MB packed-refs file, as looking up one tiny subset of the entries
> wastes a lot of RAM and CPU pulling that into our internal
> representation[1].
>
> At one point I wrote a patch to binary search the packed-refs file, find
> the first "refs/tags/" entry, and then walk linearly through there. What
> stopped me is that the current refs.c code (I guess file-backend.c these
> days) was not happy with me partially filling in the ref_dir structs in
> this "inside out" way.
[...]

Isn't this what reftable - an alternative way of storing refs in Git,
tested by being used by JGit - would solve?  See Christian Couder post
"Implementing reftable in Git"

  https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/t/#u

'Efficient lookup of an entire namespace, such as refs/tags/' is
allegedly one of the objectives of this format.

Regards,
-- 
Jakub Narębski

  parent reply	other threads:[~2018-05-26 17:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-31 16:13 RFC: New reference iteration paradigm Michael Haggerty
2016-03-31 18:01 ` Junio C Hamano
2016-03-31 19:31   ` Jeff King
2016-03-31 20:08     ` Junio C Hamano
2018-05-26 17:25     ` Jakub Narebski [this message]
2018-05-29 16:52       ` Jeff King
2016-03-31 20:15 ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86h8mu1g8j.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.