Re: [PATCH 05/22] read-cache: add index reading api

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>
Subject: Re: [PATCH 05/22] read-cache: add index reading api
Date: Mon, 08 Jul 2013 13:20:04 +0200	[thread overview]
Message-ID: <87pputqowr.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8C0F+v3g+gPon6Y8+z7ObN1Jv8Ln==RNrVsRp7aya74hw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Sun, Jul 7, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +/*
>> + * Options by which the index should be filtered when read partially.
>> + *
>> + * pathspec: The pathspec which the index entries have to match
>> + * seen: Used to return the seen parameter from match_pathspec()
>> + * max_prefix, max_prefix_len: These variables are set to the longest
>> + *     common prefix, the length of the longest common prefix of the
>> + *     given pathspec
>> + *
>> + * read_staged: used to indicate if the conflicted entries (entries
>> + *     with a stage) should be included
>> + * read_cache_tree: used to indicate if the cache-tree should be read
>> + * read_resolve_undo: used to indicate if the resolve undo data should
>> + *     be read
>> + */
>> +struct filter_opts {
>> +       const char **pathspec;
>> +       char *seen;
>> +       char *max_prefix;
>> +       int max_prefix_len;
>> +
>> +       int read_staged;
>> +       int read_cache_tree;
>> +       int read_resolve_undo;
>> +};
>> +
>>  struct index_state {
>>         struct cache_entry **cache;
>>         unsigned int version;
>> @@ -270,6 +297,8 @@ struct index_state {
>>         struct hash_table name_hash;
>>         struct hash_table dir_hash;
>>         struct index_ops *ops;
>> +       struct internal_ops *internal_ops;
>> +       struct filter_opts *filter_opts;
>>  };
>
> ...
>
>> -/* remember to discard_cache() before reading a different cache! */
>> -int read_index_from(struct index_state *istate, const char *path)
>> +
>> +int read_index_filtered_from(struct index_state *istate, const char *path,
>> +                            struct filter_opts *opts)
>>  {
>>         int fd, err, i;
>>         struct stat st_old, st_new;
>> @@ -1337,7 +1425,7 @@ int read_index_from(struct index_state *istate, const char *path)
>>                 if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
>>                         err = 1;
>>
>> -               if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
>> +               if (istate->ops->read_index(istate, mmap, mmap_size, opts) < 0)
>>                         err = 1;
>>                 istate->timestamp.sec = st_old.st_mtime;
>>                 istate->timestamp.nsec = ST_MTIME_NSEC(st_old);
>> @@ -1345,6 +1433,7 @@ int read_index_from(struct index_state *istate, const char *path)
>>                         die_errno("cannot stat the open index");
>>
>>                 munmap(mmap, mmap_size);
>> +               istate->filter_opts = opts;
>>                 if (!index_changed(&st_old, &st_new) && !err)
>>                         return istate->cache_nr;
>>         }
>
> Putting filter_opts in index_state feels like a bad design. Iterator
> information should be separated from the iterated object, so that two
> callers can walk through the same index without stepping on each other
> (I'm not talking about multithreading, a caller may walk a bit, then
> the other caller starts walking, then the former caller resumes
> walking again in a call chain).

Yes, you're right.  We need the filter_opts to see what part of the
index has been loaded [1] and which part has been skipped, but it
shouldn't be used for filtering in the for_each_index_entry function.

I think there should be two versions of the for_each_index_entry
function then, where the for_each_index_entry function would behave the
same way as the for_each_index_entry_filtered function with the
filter_opts parameter set to NULL:
for_each_index_entry_filtered(struct index_state *, each_cache_entry_fn, void *cb_data, struct filter_opts *)
for_each_index_entry(struct index_state *, each_cache_entry_fn, void *cb_data)

Both of them then should call index_change_filter_opts to make sure all
the entries that are needed are loaded in the in-memory format.

Does that make sense?

[1] That is only important for the new index-v5 file format, which can
    be loaded partially.  The filter_opts could always be set to NULL,
    as the whole index is always loaded anyway.

next prev parent reply	other threads:[~2013-07-08 11:20 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-07  8:11 [PATCH 00/22] Index v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 01/22] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-07  8:11 ` [PATCH 02/22] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-07  8:11 ` [PATCH 03/22] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-07  8:11 ` [PATCH 04/22] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-07  8:11 ` [PATCH 05/22] read-cache: add index reading api Thomas Gummerer
2013-07-08  2:01   ` Duy Nguyen
2013-07-08 11:40     ` Thomas Gummerer
2013-07-08  2:19   ` Duy Nguyen
2013-07-08 11:20     ` Thomas Gummerer [this message]
2013-07-08 12:45       ` Duy Nguyen
2013-07-08 13:37         ` Thomas Gummerer
2013-07-08 20:54         ` [PATCH 5.5/22] Add documentation for the index api Thomas Gummerer
2013-07-09 15:42           ` Duy Nguyen
2013-07-09 20:10             ` Thomas Gummerer
2013-07-10  5:28               ` Duy Nguyen
2013-07-11 11:30                 ` Thomas Gummerer
2013-07-11 11:42                   ` Duy Nguyen
2013-07-11 12:27                     ` Duy Nguyen
2013-07-08 16:36   ` [PATCH 05/22] read-cache: add index reading api Junio C Hamano
2013-07-08 20:10     ` Thomas Gummerer
2013-07-08 23:09       ` Junio C Hamano
2013-07-09 20:13         ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 06/22] make sure partially read index is not changed Thomas Gummerer
2013-07-08 16:31   ` Junio C Hamano
2013-07-08 18:33     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 07/22] dir.c: use index api Thomas Gummerer
2013-07-07  8:11 ` [PATCH 08/22] tree.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 09/22] name-hash.c: " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 10/22] grep.c: Use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 11/22] ls-files.c: use the " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 12/22] read-cache: make read_blob_data_from_index use " Thomas Gummerer
2013-07-07  8:11 ` [PATCH 13/22] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-11 10:39   ` Duy Nguyen
2013-07-11 11:39     ` Thomas Gummerer
2013-07-11 11:47       ` Duy Nguyen
2013-07-11 12:26         ` Thomas Gummerer
2013-07-11 12:50           ` Duy Nguyen
2013-07-07  8:11 ` [PATCH 14/22] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-07  8:11 ` [PATCH 15/22] read-cache: read index-v5 Thomas Gummerer
2013-07-07 20:18   ` Eric Sunshine
2013-07-08 11:40     ` Thomas Gummerer
2013-07-07  8:11 ` [PATCH 16/22] read-cache: read resolve-undo data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 17/22] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-07 20:41   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 18/22] read-cache: write index-v5 Thomas Gummerer
2013-07-07 20:43   ` Eric Sunshine
2013-07-07  8:11 ` [PATCH 19/22] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-07  8:11 ` [PATCH 20/22] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-07  8:11 ` [PATCH 21/22] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-07  8:12 ` [PATCH 22/22] p0003-index.sh: add perf test for the index formats Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pputqowr.fsf@gmail.com \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).