git.vger.kernel.org archive mirror
From: Thomas Rast <trast@student.ethz.ch>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, <git@vger.kernel.org>,
	<mhagger@alum.mit.edu>, Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Subject: Re: [GSoC] Designing a faster index format - Progress report
Date: Tue, 29 May 2012 15:29:09 +0200
Message-ID: <87vcjfi09m.fsf@thomas.inf.ethz.ch>
In-Reply-To: <CACsJy8D+WgEr4i2H-1oiBLY5oLurM0aNxGovbVEZDvr7OGgknw@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Sun, 27 May 2012 19:23:07 +0700")

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Sun, May 27, 2012 at 4:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Thomas Gummerer <t.gummerer@gmail.com> writes:
>>
>>> Ah ok, thanks for the clarification, I understand what you meant now.
>>> I think however, that it's not very beneficial to do this conversion
>>> now. git ls-files needs the whole index file anyway, so it's probably
>>> not a very good test.
>>
>> Think about "git ls-files t/" and "git ls-files -u".
>
> Or harder things like "ls-files -- 't/*.sh'"
>
>> The former obviously does *not* have to look at the whole thing, even
>> though the current code assumes the in-core data structure that has the
>> whole thing in a flat array.  IIRC, you had unmerged entries tucked at the
>> end outside the main index data, so the latter is also an interesting
>> demonstration of how wonderful the new data format could be.
>
> and "ls-files -uc" can show how you combine unmerged entries back.
> There's also entry existence check deep in "ls-files -o" that you can
> show how good bsearch on trees is, though that might be going too far
> for an experiment because the call chain is really deep, way outside
> ls-files.c:
>
> show_files (builtin/ls-files.c)
>  fill_directory (dir.c)
>   read_directory
>    read_directory_recursive
>     treat_path
>      treat_one_path
>       treat_directory
>        directory_exists_in_index
>         cache_name_pos (read-cache.c)
>
> I just want to make sure that by exercising the new format with some
> real problems, we are certain we don't overlook anything in designing
> the format (or else could be fixed before finalizing it).
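
To make the "bsearch" existence check above concrete, here is a rough sketch over a plain sorted array of path names.  It is not the code in dir.c or read-cache.c; the names lower_bound_name() and dir_has_entries() are made up for illustration, and the array is assumed to be sorted in strcmp() order.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the first position whose name sorts at or
 * after 'key'.  'names' is assumed to be sorted in strcmp() order.
 */
size_t lower_bound_name(const char **names, size_t nr, const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Hypothetical existence check: does any entry live under 'dir'?
 * 'dir' must end in '/', e.g. "t/".  All entries below a directory are
 * contiguous in sorted order, so looking at the lower bound is enough.
 */
int dir_has_entries(const char **names, size_t nr, const char *dir)
{
	size_t pos = lower_bound_name(names, nr, dir);

	return pos < nr && !strncmp(names[pos], dir, strlen(dir));
}
```

With a per-directory layout the same search would presumably run over a much smaller table than the full flat array, which is the point of the experiment.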

I envision an index API that more strictly controls access to the index.
Right now the API consists largely of read_index, write_index and the
flat the_index.cache array of entries.  Eventually it will have to be a
family of calls that support the v5 format, and boil down to suitable
wrappers for older ones.  For example (just tossing up ideas):

  index_open(struct index_state *index, int fd):
    initialization, checking, leaves the "real" data fields empty

  index_load_filtered(..., const char **pathspec):
    load everything needed to satisfy queries filtered by 'pathspec'

  index_for_each_entry(..., void (*callback)(struct cache_entry *ent)):
    like the current hand-rolled looping

  index_for_each_entry_filtered(..., void (*callback)(struct cache_entry *ent), const char **pathspec):
    ditto but for a pathspec lookup

etc.
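
To give the two iteration calls some shape, here is a minimal sketch in C.  Everything in it is an assumption for illustration: the structs are cut-down stand-ins for git's real cache_entry and index_state, the elided "..." is taken to be the index pointer, and the matcher treats each pathspec item as a plain leading-path prefix, which is much weaker than real pathspec matching (no globs such as 't/*.sh').

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins; the real structs carry far more state. */
struct cache_entry {
	const char *name;
};

struct index_state {
	struct cache_entry *cache;
	size_t cache_nr;
};

typedef void (*index_entry_fn)(struct cache_entry *ent);

/*
 * Hypothetical matcher: plain prefix comparison against each item.
 * A NULL or empty pathspec is taken to match every entry.
 */
static int entry_matches(const struct cache_entry *ent, const char **pathspec)
{
	if (!pathspec || !*pathspec)
		return 1;
	for (; *pathspec; pathspec++)
		if (!strncmp(ent->name, *pathspec, strlen(*pathspec)))
			return 1;
	return 0;
}

/* Call 'callback' for every entry that matches 'pathspec'. */
void index_for_each_entry_filtered(struct index_state *index,
				   index_entry_fn callback,
				   const char **pathspec)
{
	size_t i;

	for (i = 0; i < index->cache_nr; i++)
		if (entry_matches(&index->cache[i], pathspec))
			callback(&index->cache[i]);
}

/* Unfiltered variant: the hand-rolled loop, hidden behind a call. */
void index_for_each_entry(struct index_state *index, index_entry_fn callback)
{
	index_for_each_entry_filtered(index, callback, NULL);
}

/* Example callback: count visited entries in a global. */
size_t visited;

void count_entry(struct cache_entry *ent)
{
	(void)ent;
	visited++;
}
```

A caller such as ls-files would pass its own callback plus something like { "t/", NULL } as the pathspec.  A real version would likely want a void * context argument on the callback instead of the global used here, and would load only the matching part of the index rather than walking the whole array.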

Then I will twist Duy's words to mean that you should make git-ls-files
the poster child of this new API for development and profiling purposes
:-)

Actually converting the rest of the git code base to such an API is too
big an undertaking for the summer, so please don't stray on that path.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
