From: Thomas Rast <trast@student.ethz.ch>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, <git@vger.kernel.org>,
<mhagger@alum.mit.edu>, Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Subject: Re: [GSoC] Designing a faster index format - Progress report
Date: Tue, 29 May 2012 15:29:09 +0200
Message-ID: <87vcjfi09m.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CACsJy8D+WgEr4i2H-1oiBLY5oLurM0aNxGovbVEZDvr7OGgknw@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Sun, 27 May 2012 19:23:07 +0700")

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> On Sun, May 27, 2012 at 4:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Thomas Gummerer <t.gummerer@gmail.com> writes:
>>
>>> Ah ok, thanks for the clarification, I understand what you meant now.
>>> I think however, that it's not very beneficial to do this conversion
>>> now. git ls-files needs the whole index file anyway, so it's probably
>>> not a very good test.
>>
>> Think about "git ls-files t/" and "git ls-files -u".
>
> Or harder things like "ls-files -- 't/*.sh'"
>
>> The former obviously does *not* have to look at the whole thing, even
>> though the current code assumes the in-core data structure that has the
>> whole thing in a flat array. IIRC, you had unmerged entries tucked at the
>> end outside the main index data, so the latter is also an interesting
>> demonstration of how wonderful the new data format could be.
>
> and "ls-files -uc" can show how you combine unmerged entries back.
> There's also entry existence check deep in "ls-files -o" that you can
> show how good bsearch on trees is, though that might be going too far
> for an experiment because the call chain is really deep, way outside
> ls-files.c:
>
> show_files (builtin/ls-files.c)
> fill_directory (dir.c)
> read_directory
> read_directory_recursive
> treat_path
> treat_one_path
> treat_directory
> directory_exists_in_index
> cache_name_pos (read-cache.c)
>
> I just want to make sure that by exercising the new format with some
> real problems, we are certain we don't overlook anything in designing
> the format (or else could be fixed before finalizing it).
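The existence check at the bottom of that call chain comes down to a binary search over sorted entry names. Here is a minimal sketch of the idea, assuming a toy array of paths sorted bytewise rather than the real index structures; toy_index, toy_lower_bound and toy_directory_exists are hypothetical names for illustration, not git functions:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the index: an array of paths sorted
 * bytewise (the order strcmp gives for NUL-free names).
 */
struct toy_index {
	const char **paths;
	size_t nr;
};

/* Position of the first path that sorts >= key (a lower bound). */
static size_t toy_lower_bound(const struct toy_index *idx, const char *key)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(idx->paths[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Does any entry live under the directory?  The caller passes the
 * directory name with a trailing slash, e.g. "t/".  All paths with
 * that prefix form one contiguous run that starts at the lower
 * bound, so a single prefix comparison there answers the question.
 */
static int toy_directory_exists(const struct toy_index *idx,
				const char *dir_slash)
{
	size_t pos = toy_lower_bound(idx, dir_slash);

	return pos < idx->nr &&
	       !strncmp(idx->paths[pos], dir_slash, strlen(dir_slash));
}
```

The lookup costs O(log n) comparisons instead of a scan over the flat array, which is the property a tree-structured format would want to keep per directory.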
I envision an index API that more strictly controls access to the index.
Right now the API consists largely of read_index, write_index and the
flat the_index.cache array of entries. Eventually it will have to be a
family of calls that support the v5 format, and boil down to suitable
wrappers for the older formats. For example (just tossing up ideas):
  index_open(struct index_state *index, int fd):
    initialization, checking, leaves the "real" data fields empty

  index_load_filtered(..., const char **pathspec):
    load everything needed to satisfy queries filtered by 'pathspec'

  index_for_each_entry(..., void (*callback)(struct cache_entry *ent)):
    like the current hand-rolled looping

  index_for_each_entry_filtered(..., void (*callback)(struct cache_entry *ent), const char **pathspec):
    ditto but for a pathspec lookup

  etc.
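To make the iteration calls a bit more concrete, here is a rough sketch of how the two for_each variants might fit together. Everything in it is an assumption for illustration: the toy_ types stand in for struct cache_entry and struct index_state, the pathspec match is prefix-only, and the extra cb_data argument is not part of the list above but is one plausible way to pass state to the callback.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; the real structures carry far more than a name. */
struct toy_entry {
	const char *name;
};

struct toy_index_state {
	const struct toy_entry *cache;	/* entries sorted by name */
	size_t cache_nr;
};

/* cb_data is an assumed extra argument for caller state. */
typedef void (*toy_entry_fn)(const struct toy_entry *ent, void *cb_data);

/*
 * Prefix-only matching against a NULL-terminated pathspec array.
 * A NULL or empty pathspec matches every entry.  Globs such as
 * 't/*.sh' are deliberately not handled in this sketch.
 */
static int toy_match_pathspec(const char **pathspec, const char *name)
{
	if (!pathspec || !*pathspec)
		return 1;
	for (; *pathspec; pathspec++)
		if (!strncmp(name, *pathspec, strlen(*pathspec)))
			return 1;
	return 0;
}

/* Call fn for each entry that matches pathspec, in index order. */
static void toy_index_for_each_entry_filtered(
		const struct toy_index_state *index,
		toy_entry_fn fn, void *cb_data, const char **pathspec)
{
	size_t i;

	for (i = 0; i < index->cache_nr; i++)
		if (toy_match_pathspec(pathspec, index->cache[i].name))
			fn(&index->cache[i], cb_data);
}

/* The unfiltered variant is the filtered one with no pathspec. */
static void toy_index_for_each_entry(const struct toy_index_state *index,
				     toy_entry_fn fn, void *cb_data)
{
	toy_index_for_each_entry_filtered(index, fn, cb_data, NULL);
}

/* Example callback: count the entries visited. */
static void toy_count_entry(const struct toy_entry *ent, void *cb_data)
{
	(void)ent;
	(*(size_t *)cb_data)++;
}
```

This version still walks the whole flat array; the point of hiding the loop behind such calls is that callers would not need to change if a v5-aware implementation later skipped directories that cannot match.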
Then I will twist Duy's words to mean that you should make git-ls-files
the poster child of this new API for development and profiling purposes
:-)

Actually converting the rest of the git code base to such an API is too
big an undertaking for the summer, so please don't stray on that path.
--
Thomas Rast
trast@{inf,student}.ethz.ch