Re: [PATCH v2 12/19] read-cache: read index-v5

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2 12/19] read-cache: read index-v5
Date: Wed, 07 Aug 2013 10:13:42 +0200
Message-ID: <87mwotrk95.fsf@gmail.com>
In-Reply-To: <CACsJy8C6HRCYMR3Q=j-D=2kgzvA7=0tauSnwrjpXzSPZWe+VZw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Sat, Jul 13, 2013 at 12:26 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +struct directory_entry {
>> +       struct directory_entry *next;
>> +       struct directory_entry *next_hash;
>> +       struct cache_entry *ce;
>> +       struct cache_entry *ce_last;
>> +       struct conflict_entry *conflict;
>> +       struct conflict_entry *conflict_last;
>> +       unsigned int conflict_size;
>> +       unsigned int de_foffset;
>> +       unsigned int de_cr;
>> +       unsigned int de_ncr;
>> +       unsigned int de_nsubtrees;
>> +       unsigned int de_nfiles;
>> +       unsigned int de_nentries;
>> +       unsigned char sha1[20];
>> +       unsigned short de_flags;
>> +       unsigned int de_pathlen;
>> +       char pathname[FLEX_ARRAY];
>> +};
>> +
>> +struct conflict_part {
>> +       struct conflict_part *next;
>> +       unsigned short flags;
>> +       unsigned short entry_mode;
>> +       unsigned char sha1[20];
>> +};
>> +
>> +struct conflict_entry {
>> +       struct conflict_entry *next;
>> +       unsigned int nfileconflicts;
>> +       struct conflict_part *entries;
>> +       unsigned int namelen;
>> +       unsigned int pathlen;
>> +       char name[FLEX_ARRAY];
>> +};
>> +
>> +struct ondisk_conflict_part {
>> +       unsigned short flags;
>> +       unsigned short entry_mode;
>> +       unsigned char sha1[20];
>> +};
>
> These new structs should probably be in read-cache-v5.c, or read-cache.h

Makes sense, thanks.

>>  #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
>> +#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + (len) + 1)
>> +#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) + 1)
>
> These are used by read-cache-v5.c only so far. I'd say move them to
> read-cache.h or read-cache-v5.c together with the new structs.

Thanks.

>> +struct ondisk_cache_entry {
>> +       unsigned short flags;
>> +       unsigned short mode;
>> +       struct cache_time mtime;
>> +       unsigned int size;
>> +       int stat_crc;
>> +       unsigned char sha1[20];
>> +};
>> +
>> +struct ondisk_directory_entry {
>> +       unsigned int foffset;
>> +       unsigned int cr;
>> +       unsigned int ncr;
>> +       unsigned int nsubtrees;
>> +       unsigned int nfiles;
>> +       unsigned int nentries;
>> +       unsigned char sha1[20];
>> +       unsigned short flags;
>> +};
>
> Perhaps use uint32_t, uint16_t and friends for all on-disk structures?

We've got this in the Makefile, so I think we should be fine without it.
It still makes sense for clarity, though.

        ifdef NO_UINTMAX_T
                BASIC_CFLAGS += -Duintmax_t=uint32_t
        endif

While at it I'll make the code for v[234] use them too.
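
To illustrate the fixed-width suggestion together with the padding remark in the quoted comment, here is a minimal sketch. It assumes the field layout of ondisk_directory_entry quoted above; the helper name ondisk_directory_data_size() is hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant of the quoted ondisk_directory_entry. */
struct ondisk_directory_entry {
	uint32_t foffset;
	uint32_t cr;
	uint32_t ncr;
	uint32_t nsubtrees;
	uint32_t nfiles;
	uint32_t nentries;
	unsigned char sha1[20];
	uint16_t flags;
};

/*
 * Number of bytes the members occupy, ignoring any padding the
 * compiler may add at the end of the struct. sizeof on a member
 * does not evaluate the pointer, so NULL is fine here.
 */
static size_t ondisk_directory_data_size(void)
{
	struct ondisk_directory_entry *e = NULL;

	return sizeof(e->foffset) + sizeof(e->cr) + sizeof(e->ncr)
		+ sizeof(e->nsubtrees) + sizeof(e->nfiles)
		+ sizeof(e->nentries) + sizeof(e->sha1) + sizeof(e->flags);
}
```

With exact-width types the member sum is 46 bytes on every platform, while sizeof(struct ondisk_directory_entry) is typically rounded up on ABIs that align uint32_t to 4 bytes, which is why the quoted code adds up the members instead.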

>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +                               unsigned int *dir_table_offset,
>> +                               void *mmap,
>> +                               int mmap_size)
>> +{
>> +       int i, ondisk_directory_size;
>> +       uint32_t *filecrc, *beginning, *end;
>> +       struct directory_entry *current = NULL;
>> +       struct ondisk_directory_entry *disk_de;
>> +       struct directory_entry *de;
>> +       unsigned int data_len, len;
>> +       char *name;
>> +
>> +       /*
>> +        * Length of pathname + nul byte for termination + size of
>> +        * members of ondisk_directory_entry. (Just using the size
>> +        * of the struct doesn't work, because there may be padding
>> +        * bytes for the struct)
>> +        */
>> +       ondisk_directory_size = sizeof(disk_de->flags)
>> +               + sizeof(disk_de->foffset)
>> +               + sizeof(disk_de->cr)
>> +               + sizeof(disk_de->ncr)
>> +               + sizeof(disk_de->nsubtrees)
>> +               + sizeof(disk_de->nfiles)
>> +               + sizeof(disk_de->nentries)
>> +               + sizeof(disk_de->sha1);
>> +       name = ptr_add(mmap, *dir_offset);
>> +       beginning = ptr_add(mmap, *dir_table_offset);
>> +       end = ptr_add(mmap, *dir_table_offset + 4);
>> +       len = ntoh_l(*end) - ntoh_l(*beginning) - ondisk_directory_size - 5;
>> +       disk_de = ptr_add(mmap, *dir_offset + len + 1);
>> +       de = directory_entry_from_ondisk(disk_de, name, len);
>> +       de->next = NULL;
>> +
>> +       data_len = len + 1 + ondisk_directory_size;
>> +       filecrc = ptr_add(mmap, *dir_offset + data_len);
>> +       if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
>> +               goto unmap;
>> +
>> +       *dir_table_offset += 4;
>> +       *dir_offset += data_len + 4; /* crc code */
>> +
>> +       current = de;
>> +       for (i = 0; i < de->de_nsubtrees; i++) {
>> +               current->next = read_directories(dir_offset, dir_table_offset,
>> +                                               mmap, mmap_size);
>> +               while (current->next)
>> +                       current = current->next;
>> +       }
>> +
>> +       return de;
>> +unmap:
>> +       munmap(mmap, mmap_size);
>> +       die("directory crc doesn't match for '%s'", de->pathname);
>> +}
>
> You don't have to munmap when you die() anyway.

Will change that in the re-roll.

> I'm not sure if flattening
> the directory hierarchy into a list (linked by next pointer) is a good
> idea, or we should maintain the tree structure in memory. Still a lot
> of reading to figure that out..
>
> I skipped from here..
>
>> +static void ce_queue_push(struct cache_entry **head,
>> +                            struct cache_entry **tail,
>> +                            struct cache_entry *ce)
>> +{
>
> ...
>
>> +static int read_conflicts(struct conflict_entry **head,
>> +                         struct directory_entry *de,
>> +                         void **mmap, unsigned long mmap_size)
>> +{
>
> till the end of this function. Not interested in conflict stuff yet.
>
>
>> +static struct directory_entry *read_head_directories(struct index_state *istate,
>> +                                                    unsigned int *entry_offset,
>> +                                                    unsigned int *foffsetblock,
>> +                                                    unsigned int *ndirs,
>> +                                                    void *mmap, unsigned long mmap_size)
>> +{
>
> Maybe read_all_directories is a better name.

Makes sense, thanks.

>> +static int read_index_filtered_v5(struct index_state *istate, void *mmap,
>> +                                 unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de;
>> +       const char **adjusted_pathspec = NULL;
>> +       int need_root = 0, i, n;
>> +       char *oldpath, *seen;
>> +
>> +       ...
>> +
>> +       de = root_directory;
>> +       while (de) {
>> +               if (need_root ||
>> +                   match_pathspec(adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>> +                       unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock;
>> +                       unsigned int *off = mmap + subdir_foffsetblock;
>> +                       unsigned int subdir_entry_offset = entry_offset + ntoh_l(*off);
>> +                       oldpath = de->pathname;
>> +                       do {
>> +                               if (read_entries(istate, &de, &subdir_entry_offset,
>> +                                                &mmap, mmap_size, &nr,
>> +                                                &subdir_foffsetblock) < 0)
>> +                                       return -1;
>> +                       } while (de && !prefixcmp(de->pathname, oldpath));
>> +               } else
>> +                       de = de->next;
>> +       }
>
> Hm.. if we maintain a tree structure here (one link to the first
> subdir, one link to the next sibling), I think the "do" loop could be
> done without prefixcmp. Just check if "de" returned by read_entries is
> the next sibling "de" (iow the end of current directory recursively).

Yes, the tree structure makes sense.  I've implemented it a bit
differently though: instead of using two pointers, I'm using one pointer
to an array of directory entries, which can be iterated over.
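
A rough sketch of what a tree with one pointer to an array of sub-directories could look like. The struct and function names (dir_node, count_dirs) are made up for illustration and are not taken from the re-roll.

```c
#include <stddef.h>

/* Hypothetical in-memory directory node; not the struct from the patch. */
struct dir_node {
	const char *pathname;
	unsigned int nsubtrees;
	struct dir_node *subtrees;	/* array of nsubtrees children */
};

/*
 * Visit a directory and everything below it, depth first, and
 * return the number of directories seen. The end of a subtree
 * follows from nsubtrees, so no path prefix comparison is needed.
 */
static unsigned int count_dirs(const struct dir_node *de)
{
	unsigned int i, n = 1;

	for (i = 0; i < de->nsubtrees; i++)
		n += count_dirs(&de->subtrees[i]);
	return n;
}
```

The same recursion shape would apply to reading entries per directory: the loop bound replaces the prefixcmp() check in the quoted "do" loop.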

>> +       istate->cache_nr = nr;
>> +       return 0;
>> +}
>> +
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de;
>> +
>> +       if (opts != NULL)
>> +               return read_index_filtered_v5(istate, mmap, mmap_size, opts);
>> +
>> +       root_directory = read_head_directories(istate, &entry_offset,
>> +                                              &foffsetblock, &ndirs,
>> +                                              mmap, mmap_size);
>> +       de = root_directory;
>> +       while (de)
>> +               if (read_entries(istate, &de, &entry_offset, &mmap,
>> +                                mmap_size, &nr, &foffsetblock) < 0)
>> +                       return -1;
>> +       istate->cache_nr = nr;
>> +       return 0;
>> +}
>
> Make it call read_index_filtered_v5 with an empty pathspec instead.
> match_pathspec* returns true immediately if pathspec is empty. Without
> the removal of prefixcmp() in the "do" loop mentioned above,
> read_index_filtered_v5 can't be more expensive than this version.

Yes right, will change in the re-roll.
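
As a sketch of the delegation, assuming a matcher that accepts everything when the pathspec is empty (as described above for match_pathspec*). The functions below (matches, read_filtered, read_all) are simplified, hypothetical stand-ins that only count paths; they are not git's API.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in matcher: plain prefix match; an empty pathspec matches all. */
static int matches(const char **pathspec, const char *path)
{
	size_t i;

	if (!pathspec || !pathspec[0])
		return 1;
	for (i = 0; pathspec[i]; i++)
		if (!strncmp(path, pathspec[i], strlen(pathspec[i])))
			return 1;
	return 0;
}

/* Count the paths selected by pathspec (stands in for reading entries). */
static size_t read_filtered(const char **paths, size_t n,
			    const char **pathspec)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++)
		if (matches(pathspec, paths[i]))
			nr++;
	return nr;
}

/* The unfiltered reader is just the filtered one with no pathspec. */
static size_t read_all(const char **paths, size_t n)
{
	return read_filtered(paths, n, NULL);
}
```

This keeps a single code path for both cases; the only extra cost in the unfiltered case is the early-return check in the matcher.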

> That was it! Lunch time! Maybe I'll read the rest in the afternoon, or
> someday next week.

Thanks a lot for taking the time to review my code.
