Re: [PATCH v2 12/19] read-cache: read index-v5

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <trast@inf.ethz.ch>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v2 12/19] read-cache: read index-v5
Date: Wed, 07 Aug 2013 10:13:42 +0200	[thread overview]
Message-ID: <87mwotrk95.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8C6HRCYMR3Q=j-D=2kgzvA7=0tauSnwrjpXzSPZWe+VZw@mail.gmail.com>

Duy Nguyen <pclouds@gmail.com> writes:

> On Sat, Jul 13, 2013 at 12:26 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +struct directory_entry {
>> +       struct directory_entry *next;
>> +       struct directory_entry *next_hash;
>> +       struct cache_entry *ce;
>> +       struct cache_entry *ce_last;
>> +       struct conflict_entry *conflict;
>> +       struct conflict_entry *conflict_last;
>> +       unsigned int conflict_size;
>> +       unsigned int de_foffset;
>> +       unsigned int de_cr;
>> +       unsigned int de_ncr;
>> +       unsigned int de_nsubtrees;
>> +       unsigned int de_nfiles;
>> +       unsigned int de_nentries;
>> +       unsigned char sha1[20];
>> +       unsigned short de_flags;
>> +       unsigned int de_pathlen;
>> +       char pathname[FLEX_ARRAY];
>> +};
>> +
>> +struct conflict_part {
>> +       struct conflict_part *next;
>> +       unsigned short flags;
>> +       unsigned short entry_mode;
>> +       unsigned char sha1[20];
>> +};
>> +
>> +struct conflict_entry {
>> +       struct conflict_entry *next;
>> +       unsigned int nfileconflicts;
>> +       struct conflict_part *entries;
>> +       unsigned int namelen;
>> +       unsigned int pathlen;
>> +       char name[FLEX_ARRAY];
>> +};
>> +
>> +struct ondisk_conflict_part {
>> +       unsigned short flags;
>> +       unsigned short entry_mode;
>> +       unsigned char sha1[20];
>> +};
>
> These new structs should probably be in read-cache-v5.c, or read-cache.h

Makes sense, thanks.

>>  #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
>> +#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + (len) + 1)
>> +#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) + 1)
>
> These are used by read-cache-v5.c only so far. I'd say move them to
> read-cache.h or read-cache-v5.c together with the new structs.

Thanks.

>> +struct ondisk_cache_entry {
>> +       unsigned short flags;
>> +       unsigned short mode;
>> +       struct cache_time mtime;
>> +       unsigned int size;
>> +       int stat_crc;
>> +       unsigned char sha1[20];
>> +};
>> +
>> +struct ondisk_directory_entry {
>> +       unsigned int foffset;
>> +       unsigned int cr;
>> +       unsigned int ncr;
>> +       unsigned int nsubtrees;
>> +       unsigned int nfiles;
>> +       unsigned int nentries;
>> +       unsigned char sha1[20];
>> +       unsigned short flags;
>> +};
>
> Perhaps use uint32_t, uint16_t and friends for all on-disk structures?

We got this in the makefile, so I think we should be fine without it.
It still makes sense for clarity though I think.

         ifdef NO_UINTMAX_T
	       BASIC_CFLAGS += -Duintmax_t=uint32_t
         endif

While at it I'll make the code for v[234] use them too.

>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +                               unsigned int *dir_table_offset,
>> +                               void *mmap,
>> +                               int mmap_size)
>> +{
>> +       int i, ondisk_directory_size;
>> +       uint32_t *filecrc, *beginning, *end;
>> +       struct directory_entry *current = NULL;
>> +       struct ondisk_directory_entry *disk_de;
>> +       struct directory_entry *de;
>> +       unsigned int data_len, len;
>> +       char *name;
>> +
>> +       /*
>> +        * Length of pathname + nul byte for termination + size of
>> +        * members of ondisk_directory_entry. (Just using the size
>> +        * of the struct doesn't work, because there may be padding
>> +        * bytes for the struct)
>> +        */
>> +       ondisk_directory_size = sizeof(disk_de->flags)
>> +               + sizeof(disk_de->foffset)
>> +               + sizeof(disk_de->cr)
>> +               + sizeof(disk_de->ncr)
>> +               + sizeof(disk_de->nsubtrees)
>> +               + sizeof(disk_de->nfiles)
>> +               + sizeof(disk_de->nentries)
>> +               + sizeof(disk_de->sha1);
>> +       name = ptr_add(mmap, *dir_offset);
>> +       beginning = ptr_add(mmap, *dir_table_offset);
>> +       end = ptr_add(mmap, *dir_table_offset + 4);
>> +       len = ntoh_l(*end) - ntoh_l(*beginning) - ondisk_directory_size - 5;
>> +       disk_de = ptr_add(mmap, *dir_offset + len + 1);
>> +       de = directory_entry_from_ondisk(disk_de, name, len);
>> +       de->next = NULL;
>> +
>> +       data_len = len + 1 + ondisk_directory_size;
>> +       filecrc = ptr_add(mmap, *dir_offset + data_len);
>> +       if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
>> +               goto unmap;
>> +
>> +       *dir_table_offset += 4;
>> +       *dir_offset += data_len + 4; /* crc code */
>> +
>> +       current = de;
>> +       for (i = 0; i < de->de_nsubtrees; i++) {
>> +               current->next = read_directories(dir_offset, dir_table_offset,
>> +                                               mmap, mmap_size);
>> +               while (current->next)
>> +                       current = current->next;
>> +       }
>> +
>> +       return de;
>> +unmap:
>> +       munmap(mmap, mmap_size);
>> +       die("directory crc doesn't match for '%s'", de->pathname);
>> +}
>
> You don't have to munmap when you die() anway.

Will change that in the re-roll.

> I'm not sure if flatten
> the directory hierarchy into a list (linked by next pointer) is a good
> idea, or we should maintain the tree structure in memory. Still a lot
> of reading to figure that out..
>
> I skipped from here..
>
>> +static void ce_queue_push(struct cache_entry **head,
>> +                            struct cache_entry **tail,
>> +                            struct cache_entry *ce)
>> +{
>
> ...
>
>> +static int read_conflicts(struct conflict_entry **head,
>> +                         struct directory_entry *de,
>> +                         void **mmap, unsigned long mmap_size)
>> +{
>
> till the end of this function. Not interested in conflict stuff yet.
>
>
>> +static struct directory_entry *read_head_directories(struct index_state *istate,
>> +                                                    unsigned int *entry_offset,
>> +                                                    unsigned int *foffsetblock,
>> +                                                    unsigned int *ndirs,
>> +                                                    void *mmap, unsigned long mmap_size)
>> +{
>
> Maybe read_all_directories is a better nam.

Makes sense, thanks.

>> +static int read_index_filtered_v5(struct index_state *istate, void *mmap,
>> +                                 unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de;
>> +       const char **adjusted_pathspec = NULL;
>> +       int need_root = 0, i, n;
>> +       char *oldpath, *seen;
>> +
>> +       ...
>> +
>> +       de = root_directory;
>> +       while (de) {
>> +               if (need_root ||
>> +                   match_pathspec(adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>> +                       unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock;
>> +                       unsigned int *off = mmap + subdir_foffsetblock;
>> +                       unsigned int subdir_entry_offset = entry_offset + ntoh_l(*off);
>> +                       oldpath = de->pathname;
>> +                       do {
>> +                               if (read_entries(istate, &de, &subdir_entry_offset,
>> +                                                &mmap, mmap_size, &nr,
>> +                                                &subdir_foffsetblock) < 0)
>> +                                       return -1;
>> +                       } while (de && !prefixcmp(de->pathname, oldpath));
>> +               } else
>> +                       de = de->next;
>> +       }
>
> Hm.. if we maintain a tree structure here (one link to the first
> subdir, one link to the next sibling), I think the "do" loop could be
> done without prefixcmp. Just check if "de" returned by read_entries is
> the next sibling "de" (iow the end of current directory recursively).

Yes, the tree-structure makes sense.  I've implemented it a bit
differently though, instead of using two pointers, I'm using one pointer
to an array of directory entries, which can be iterated over.

>> +       istate->cache_nr = nr;
>> +       return 0;
>> +}
>> +
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de;
>> +
>> +       if (opts != NULL)
>> +               return read_index_filtered_v5(istate, mmap, mmap_size, opts);
>> +
>> +       root_directory = read_head_directories(istate, &entry_offset,
>> +                                              &foffsetblock, &ndirs,
>> +                                              mmap, mmap_size);
>> +       de = root_directory;
>> +       while (de)
>> +               if (read_entries(istate, &de, &entry_offset, &mmap,
>> +                                mmap_size, &nr, &foffsetblock) < 0)
>> +                       return -1;
>> +       istate->cache_nr = nr;
>> +       return 0;
>> +}
>
> Make it call read_index_filtered_v5 with an empty pathspec instead.
> match_pathspec* returns true immediately if pathspec is empty. Without
> the removal of prefixcmp() in the "do" loop mentioned above,
> read_index_filtered_v5 can't be more expensive than this version.

Yes right, will change in the re-roll.

> That was it! Lunch time! Maybe I'll read the rest in the afternoon, or
> someday next week.

Thanks a lot for taking the time to review my code.

next prev parent reply	other threads:[~2013-08-07  8:13 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-12 17:26 [PATCH v2 00/19] Index-v5 Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 01/19] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 02/19] read-cache: split index file version specific functionality Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 03/19] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-07-14  3:10   ` Duy Nguyen
2013-07-19 14:53     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 04/19] read-cache: Re-read index if index file changed Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 05/19] Add documentation for the index api Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 06/19] read-cache: add index reading api Thomas Gummerer
2013-07-14  3:21   ` Duy Nguyen
2013-07-12 17:26 ` [PATCH v2 07/19] make sure partially read index is not changed Thomas Gummerer
2013-07-14  3:29   ` Duy Nguyen
2013-07-17 12:56     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 08/19] grep.c: Use index api Thomas Gummerer
2013-07-14  3:32   ` Duy Nguyen
2013-07-15  9:51     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 09/19] ls-files.c: use " Thomas Gummerer
2013-07-14  3:39   ` Duy Nguyen
2013-07-17  8:07     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 10/19] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-07-14  3:59   ` Duy Nguyen
2013-07-17  8:09     ` Thomas Gummerer
2013-08-04 11:26   ` Duy Nguyen
2013-08-04 17:58     ` Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 11/19] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 12/19] read-cache: read index-v5 Thomas Gummerer
2013-07-14  4:42   ` Duy Nguyen
2013-08-07  8:13     ` Thomas Gummerer [this message]
2013-07-15 10:12   ` Duy Nguyen
2013-07-17  8:11     ` Thomas Gummerer
2013-08-08  2:00       ` Duy Nguyen
2013-08-08 13:28         ` Thomas Gummerer
2013-08-09 13:10         ` Thomas Gummerer
2013-08-07  8:23     ` Thomas Gummerer
2013-08-08  2:09       ` Duy Nguyen
2013-07-12 17:26 ` [PATCH v2 13/19] read-cache: read resolve-undo data Thomas Gummerer
2013-07-12 17:26 ` [PATCH v2 14/19] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 15/19] read-cache: write index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 16/19] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 17/19] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 18/19] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-07-12 17:27 ` [PATCH v2 19/19] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-07-14  2:59 ` [PATCH v2 00/19] Index-v5 Duy Nguyen
2013-07-15  9:30   ` Thomas Gummerer
2013-07-15  9:38     ` Duy Nguyen
2013-07-17  8:12       ` Thomas Gummerer
2013-07-17 23:58         ` Junio C Hamano
2013-07-19 17:37           ` Thomas Gummerer
2013-07-19 18:25             ` Junio C Hamano
2013-07-16 21:03 ` Ramsay Jones
2013-07-17  8:04   ` Thomas Gummerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mwotrk95.fsf@gmail.com \
    --to=t.gummerer@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=robin.rosenberg@dewire.com \
    --cc=sunshine@sunshineco.com \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.