Re: [PATCH v3 15/24] read-cache: read index-v5

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Subject: Re: [PATCH v3 15/24] read-cache: read index-v5
Date: Tue, 20 Aug 2013 22:59:23 +0200	[thread overview]
Message-ID: <87wqng3wpg.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8A1NpYa6nr6vHKLd-Tap3PObabfUB34U90MH_OdfQJN9Q@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> General comment: a short comment before each function describing what
> the function does would be helpful. This only applies for complex
> functions (read_* ones). Of course verify_hdr does not require extra
> explanantion.

Yes, makes sense, I'll do that in the re-roll.

>  On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +static struct directory_entry *directory_entry_from_ondisk(struct ondisk_directory_entry *ondisk,
>> +                                                  const char *name,
>> +                                                  size_t len)
>> +{
>> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
>> +
>> +       memcpy(de->pathname, name, len);
>> +       de->pathname[len] = '\0';
>> +       de->de_flags      = ntoh_s(ondisk->flags);
>> +       de->de_foffset    = ntoh_l(ondisk->foffset);
>> +       de->de_cr         = ntoh_l(ondisk->cr);
>> +       de->de_ncr        = ntoh_l(ondisk->ncr);
>> +       de->de_nsubtrees  = ntoh_l(ondisk->nsubtrees);
>> +       de->de_nfiles     = ntoh_l(ondisk->nfiles);
>> +       de->de_nentries   = ntoh_l(ondisk->nentries);
>> +       de->de_pathlen    = len;
>> +       hashcpy(de->sha1, ondisk->sha1);
>> +       return de;
>> +}
>
> This function leaves a lot of fields uninitialized..
>
>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +                               unsigned int *dir_table_offset,
>> +                               void *mmap,
>> +                               int mmap_size)
>> +{
>> ....
>> +       de = directory_entry_from_ondisk(disk_de, name, len);
>> +       de->next = NULL;
>> +       de->sub = NULL;
>
> ..and two of them are set to NULL here. Maybe
> directory_entry_from_ondisk() could be made to call
> init_directory_entry() instead so that we don't need to manually reset
> some fields here, which leaves me wondering why other fields are not
> important to reset. init_directory_entry() is introduced later in
> "write index-v5" patch, you so may want to move it up a few patches.

The rest of the fields are only used for compiling the data that will be
written.  Using init_directory_entry() here makes sense anyway though,
thanks for the suggestion.

>> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>> +                     void *mmap, unsigned long mmap_size,
>> +                     unsigned int first_entry_offset,
>> +                     unsigned int foffsetblock)
>> +{
>> +       int len, offset_to_offset;
>> +       char *name;
>> +       uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
>> +       struct ondisk_cache_entry *disk_ce;
>> +
>> +       beginning = ptr_add(mmap, foffsetblock);
>> +       end = ptr_add(mmap, foffsetblock + 4);
>> +       len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;
>
> It took me a while to check and figure out " - 5" here means minus NUL
> and the crc. A short comment would help. I think there's also another
> -5 in read_directories(). Or maybe just rename len to namelen.

Will add a short comment.

>> +struct conflict_entry *create_new_conflict(char *name, int len, int pathlen)
>> +{
>> +       struct conflict_entry *conflict_entry;
>> +
>> +       if (pathlen)
>> +               pathlen++;
>> +       conflict_entry = xmalloc(conflict_entry_size(len));
>> +       conflict_entry->entries = NULL;
>> +       conflict_entry->nfileconflicts = 0;
>> +       conflict_entry->namelen = len;
>> +       memcpy(conflict_entry->name, name, len);
>> +       conflict_entry->name[len] = '\0';
>> +       conflict_entry->pathlen = pathlen;
>> +       conflict_entry->next = NULL;
>
> A memset followed by memcpy and conflict_entry->pathlen = pathlen
> would make this shorter and won't miss new fields added in future.

Makes sense, thanks.

>> +static int read_entries(struct index_state *istate, struct directory_entry *de,
>> +                       unsigned int first_entry_offset, void *mmap,
>> +                       unsigned long mmap_size, unsigned int *nr,
>> +                       unsigned int foffsetblock)
>> +{
>> +       struct cache_entry *ce;
>> +       int i, subdir = 0;
>> +
>> +       for (i = 0; i < de->de_nfiles; i++) {
>> +               unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
>> +               if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
>> +                              first_entry_offset, subdir_foffsetblock) < 0)
>> +                       return -1;
>
> You read one file entry, say abc/def...

You're not quite right here.  I'm reading def here, de is the root
directory and de->sub[subdir] is the first sub directory, named abc/

>> +               while (subdir < de->de_nsubtrees &&
>> +                      cache_name_compare(ce->name + de->de_pathlen,
>> +                                         ce_namelen(ce) - de->de_pathlen,
>> +                                         de->sub[subdir]->pathname + de->de_pathlen,
>> +                                         de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {
>
> Oh right the entry belongs the the substree "abc" so..

abc/ comes before def, so lets read everything in that directory first.

>> +                       read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
>> +                                    mmap_size, nr, foffsetblock);
>
> you recurse in, which will add following entries like abc/def and abc/xyz...

Recurse in, add abc/def and abc/xyz, and increase nr in the recursion,
so the new entry gets added at the right place.

>> +                       subdir++;
>> +               }
>> +               if (!ce)
>> +                       continue;
>> +               set_index_entry(istate, (*nr)++, ce);
>
> then back here after recusion and add abc/def, again, after abc/xyz.
> Did I read this code correctly?

After the recursion add def to at the 3rd position in the index.  After
that it looks like:
abc/def
abc/xyz
def

I hope that makes it a little clearer.

>> +       }
>> +       for (i = subdir; i < de->de_nsubtrees; i++) {
>> +               read_entries(istate, de->sub[i], first_entry_offset, mmap,
>> +                            mmap_size, nr, foffsetblock);
>> +       }
>> +       return 0;
>> +}
>> +
>
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de, *last_de;
>> +       const char **paths = NULL;
>> +       struct pathspec adjusted_pathspec;
>> +       int need_root = 0, i;
>> +
>> +       root_directory = read_all_directories(istate, &entry_offset,
>> +                                             &foffsetblock, &ndirs,
>> +                                             mmap, mmap_size);
>> +
>> +       if (opts && opts->pathspec && opts->pathspec->nr) {
>> +               need_root = 0;
>
> need_root is already initialized at declaration.

Right, thanks.

>> +               paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
>> +               paths[opts->pathspec->nr] = NULL;
>> +               for (i = 0; i < opts->pathspec->nr; i++) {
>> +                       char *super = strdup(opts->pathspec->items[i].match);
>> +                       int len = strlen(super);
>> +                       while (len && super[len - 1] == '/' && super[len - 2] == '/')
>> +                               super[--len] = '\0'; /* strip all but one trailing slash */
>> +                       while (len && super[--len] != '/')
>> +                               ; /* scan backwards to next / */
>> +                       if (len >= 0)
>> +                               super[len--] = '\0';
>> +                       if (len <= 0) {
>> +                               need_root = 1;
>> +                               break;
>> +                       }
>> +                       paths[i] = super;
>> +               }
>> +       }
>> +
>> +       if (!need_root)
>> +               parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
>> +
>> +       de = root_directory;
>> +       last_de = de;
>> +       while (de) {
>> +               if (need_root ||
>> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>> +                       if (read_entries(istate, de, entry_offset,
>> +                                        mmap, mmap_size, &nr,
>> +                                        foffsetblock) < 0)
>> +                               return -1;
>> +               } else {
>> +                       for (i = 0; i < de->de_nsubtrees; i++) {
>> +                               last_de->next = de->sub[i];
>> +                               last_de = last_de->next;
>> +                       }
>> +               }
>> +               de = de->next;
>
> I'm missing something here. read_entries is a function that reads all
> entries inside "de" including subdirectories and the first "de" is
> root_directory, which makes it read the whole index in.

It does, except when the adjusted_pathspec doesn't match the
root_directory.  In that case all the subdirectories of the
root_directory are added to a queue, which will then be iterated over
and tried to match with the adjusted_pathspec.

This has a bug not covered by the test suite described below when
checking against pathspecs with different levels.

> Because
> de->next is only set in this function, de->next after read_entries()
> is NULL, which termintates the loop and the else block never runs. It
> does not sound right..

If the subdirectory is read it does and the loop should terminate,
because the whole index is read.

It does have a bug in the following test case though, which is not
covered by the test suite.  I'll add this and the test and the fix to
the test-suite:

#!/bin/sh

test_description="Test index-v5 specific corner cases"

. ./test-lib.sh

test_set_index_version 5

test_expect_success 'setup' '
	mkdir -p abc/def def &&
	touch abc/def/xyz def/xyz &&
	git add . &&
	git commit -m "test commit"
'

test_expect_success 'ls-files ordering correct' '
	cat <<-\EOF >expected &&
	abc/def/xyz
	def/xyz
	EOF
	git ls-files abc/def/xyz def/xyz >actual &&
	test_cmp expected actual
'

test_done

This can be solved by the following:

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 10960fd..9963d1f 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -661,7 +661,9 @@ static int read_index_v5(struct index_state *istate, void *mmap,
                                         foffsetblock) < 0)
                                return -1;
                } else {
+                       last_de = de;
                        for (i = 0; i < de->de_nsubtrees; i++) {
+                               de->sub[i]->next = last_de->next;
                                last_de->next = de->sub[i];
                                last_de = last_de->next;
                        }

next prev parent reply	other threads:[~2013-08-20 20:59 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
2013-08-18 20:21   ` Eric Sunshine
2013-08-20 19:30   ` Junio C Hamano
2013-08-21  3:05     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
2013-08-20 19:34   ` Junio C Hamano
2013-08-21  3:06     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
2013-08-18 22:34   ` Ramsay Jones
2013-08-20  8:36     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
2013-08-18 20:50   ` Eric Sunshine
2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
2013-08-18 21:06   ` Eric Sunshine
2013-08-20  8:46     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
2013-08-19  1:57   ` Eric Sunshine
2013-08-20 14:01   ` Duy Nguyen
2013-08-20 20:59     ` Thomas Gummerer [this message]
2013-08-21  0:44       ` Duy Nguyen
2013-08-20 14:16   ` Duy Nguyen
2013-08-20 21:13     ` Thomas Gummerer
2013-08-23 23:52   ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
2013-08-19  1:59   ` Eric Sunshine
2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-08-24  0:09   ` Duy Nguyen
2013-11-25 15:41     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
2013-08-24  3:58   ` Duy Nguyen
2013-11-25 15:37     ` Thomas Gummerer
2013-08-24  4:07   ` Duy Nguyen
2013-08-24  9:56     ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-08-21  0:57   ` Duy Nguyen
2013-08-21  4:01     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
2013-08-25  3:07   ` Junio C Hamano
2013-08-25  4:40     ` Duy Nguyen
2013-08-31  5:23     ` Thomas Gummerer

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:10960fd dfblob:9963d1f )
 OR (
bs:"Re: [PATCH v3 15/24] read-cache: read index-v5" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wqng3wpg.fsf@gmail.com \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=ramsay@ramsay1.demon.co.uk \
    --cc=robin.rosenberg@dewire.com \
    --cc=sunshine@sunshineco.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.