From: Thomas Gummerer <t.gummerer@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Thomas Rast <trast@inf.ethz.ch>,
Michael Haggerty <mhagger@alum.mit.edu>,
Junio C Hamano <gitster@pobox.com>,
Robin Rosenberg <robin.rosenberg@dewire.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Subject: Re: [PATCH v3 18/24] read-cache: write index-v5
Date: Mon, 25 Nov 2013 16:37:13 +0100 [thread overview]
Message-ID: <87li0co5li.fsf@gmail.com> (raw)
In-Reply-To: <CACsJy8BocjJqgyJwa8DzwiNdmpu-yJH9vVuEPunxjtRyr-rxYA@mail.gmail.com>
[sorry about taking so much time to get back to you, was too busy with
other stuff to work on git]
Duy Nguyen <pclouds@gmail.com> writes:
> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +char *super_directory(const char *filename)
>> +{
>> + char *slash;
>> +
>> + slash = strrchr(filename, '/');
>> + if (slash)
>> + return xmemdupz(filename, slash-filename);
>> + return NULL;
>> +}
>> +
>
> Why is this function not static? There are a few other in this patch
> (I did not notice in others, but I wasn't looking for them..)
That's just an oversight, it should be static. Will look for the others
and change them to static too.
> But isn't this what dirname() is?
>
> Another point about this function is it returns a new allocated
> string, but I see no free() anywhere in this patch. Leak alert!
Yes, you're right, I'll use dirname instead. That also solves the free
problem, as dirname modifies the string. I'll still use a wrapper
around it, because dirname returns "." when there is no super_directory,
but using NULL for that makes it simpler to check it.
>> +struct directory_entry *init_directory_entry(char *pathname, int len)
>> +{
>> + struct directory_entry *de = xmalloc(directory_entry_size(len));
>> +
>> + memcpy(de->pathname, pathname, len);
>> + de->pathname[len] = '\0';
>> + de->de_flags = 0;
>> + de->de_foffset = 0;
>> + de->de_cr = 0;
>> + de->de_ncr = 0;
>> + de->de_nsubtrees = 0;
>> + de->de_nfiles = 0;
>> + de->de_nentries = 0;
>> + memset(de->sha1, 0, 20);
>> + de->de_pathlen = len;
>> + de->next = NULL;
>> + de->next_hash = NULL;
>> + de->ce = NULL;
>> + de->ce_last = NULL;
>> + de->conflict = NULL;
>> + de->conflict_last = NULL;
>> + de->conflict_size = 0;
>> + return de;
>> +}
>
> I think this function could be shortened to
>
> struct directory_entry *de = xcalloc(1, directory_entry_size(len));
> memcpy(de->pathname, pathname, len);
> de->de_pathlen = len;
> return de;
Makes sense, thanks.
>> +static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
>> + struct hash_table *table,
>> + unsigned int *total_dir_len,
>> + unsigned int *ndir,
>> + struct directory_entry **current)
>> +{
>> + struct directory_entry *tmp = NULL, *search, *new, *ret;
>> + uint32_t crc;
>> +
>> + search = find_directory(dir, dir_len, &crc, table);
>> + if (search)
>> + return search;
>> + while (!search) {
>> + new = init_directory_entry(dir, dir_len);
>> + insert_directory_entry(new, table, total_dir_len, ndir, crc);
>> + if (!tmp)
>> + ret = new;
>> + else
>> + new->de_nsubtrees = 1;
>> + new->next = tmp;
>> + tmp = new;
>> + dir = super_directory(dir);
>
> It feels more natural to create directory_entry(s) from parent to
> subdir. If you do so you could reset dir to the remaining of directory
> and perform strchr() and do not need to allocate new memory everytime
> you call super_directory (because it relies on NUL at the end of the
> string).
I'm creating them from subdir to parent because it saves a few calls
whenever a superdir is already in the hash_table. And I'm using dirname
so the new memory allocations are not a problem anymore.
>> + dir_len = dir ? strlen(dir) : 0;
>> + search = find_directory(dir, dir_len, &crc, table);
>> + }
>> + search->de_nsubtrees++;
>> + (*current)->next = tmp;
>> + while ((*current)->next)
>> + *current = (*current)->next;
>> +
>> + return ret;
>> +}
>
>> +static struct directory_entry *compile_directory_data(struct index_state *istate,
>> + int nfile,
>> + unsigned int *ndir,
>> + unsigned int *total_dir_len,
>> + unsigned int *total_file_len)
>> +{
>> + int i, dir_len = -1;
>> + char *dir;
>> + struct directory_entry *de, *current, *search;
>> + struct cache_entry **cache = istate->cache;
>> + struct conflict_entry *conflict_entry;
>> + struct hash_table table;
>> + uint32_t crc;
>> +
>> + init_hash(&table);
>> + de = init_directory_entry("", 0);
>> + current = de;
>> + *ndir = 1;
>> + *total_dir_len = 1;
>> + crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
>> + insert_hash(crc, de, &table);
>> + conflict_entry = NULL;
>> + for (i = 0; i < nfile; i++) {
>> + if (cache[i]->ce_flags & CE_REMOVE)
>> + continue;
>> +
>> + if (dir_len < 0
>> + || cache[i]->name[dir_len] != '/'
>
> Need a check to make sure name[dir_len] is not out of bound
Thanks.
>> + || strchr(cache[i]->name + dir_len + 1, '/')
>> + || cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
>> + dir, dir_len)) {
>
> In my opinon, "if (dir_len < 0 || !(must && be && a && subdirectory))"
> is easier to read..
Makes sense, will change.
>> + dir = super_directory(cache[i]->name);
>> + dir_len = dir ? strlen(dir) : 0;
>> + search = get_directory(dir, dir_len, &table,
>> + total_dir_len, ndir,
>> + ¤t);
>> + }
>> + search->de_nfiles++;
>> + *total_file_len += ce_namelen(cache[i]) + 1;
>> + if (search->de_pathlen)
>> + *total_file_len -= search->de_pathlen + 1;
>> + ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
>> +
>> + if (ce_stage(cache[i]) > 0) {
>> + struct conflict_part *conflict_part;
>> + if (!conflict_entry ||
>> + cache_name_compare(conflict_entry->name, conflict_entry->namelen,
>> + cache[i]->name, ce_namelen(cache[i]))) {
>> + conflict_entry = create_conflict_entry_from_ce(cache[i], search->de_pathlen);
>> + add_conflict_to_directory_entry(search, conflict_entry);
>> + }
>> + conflict_part = conflict_part_from_inmemory(cache[i]);
>> + add_part_to_conflict_entry(search, conflict_entry, conflict_part);
>> + }
>> + }
>> + return de;
>> +}
>> +
>
>> +static int write_directories(struct directory_entry *de, int fd, int conflict_offset)
>> +{
>> + struct directory_entry *current;
>> + struct ondisk_directory_entry ondisk;
>> + int current_offset, offset_write, ondisk_size, foffset;
>> + uint32_t crc;
>> +
>> + /*
>> + * This is needed because the compiler aligns structs to sizes multiple
>> + * of 4
>> + */
>> + ondisk_size = sizeof(ondisk.flags)
>> + + sizeof(ondisk.foffset)
>> + + sizeof(ondisk.cr)
>> + + sizeof(ondisk.ncr)
>> + + sizeof(ondisk.nsubtrees)
>> + + sizeof(ondisk.nfiles)
>> + + sizeof(ondisk.nentries)
>> + + sizeof(ondisk.sha1);
>> + current = de;
>> + current_offset = 0;
>> + foffset = 0;
>> + while (current) {
>> + int pathlen;
>> +
>> + offset_write = htonl(current_offset);
>> + if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> + return -1;
>> + if (current->de_pathlen == 0)
>> + pathlen = 0;
>> + else
>> + pathlen = current->de_pathlen + 1;
>> + current_offset += pathlen + 1 + ondisk_size + 4;
>> + current = current->next;
>> + }
>> + /*
>> + * Write one more offset, which points to the end of the entries,
>> + * because we use it for calculating the dir length, instead of
>> + * using strlen.
>> + */
>> + offset_write = htonl(current_offset);
>> + if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> + return -1;
>> + current = de;
>> + while (current) {
>> + crc = 0;
>> + if (current->de_pathlen == 0) {
>> + if (ce_write(&crc, fd, current->pathname, 1) < 0)
>> + return -1;
>> + } else {
>> + char *path;
>> + path = xmalloc(sizeof(char) * (current->de_pathlen + 2));
>> + memcpy(path, current->pathname, current->de_pathlen);
>> + memcpy(path + current->de_pathlen, "/\0", 2);
>> + if (ce_write(&crc, fd, path, current->de_pathlen + 2) < 0)
>> + return -1;
>
> xmalloc without free
In the new version the pathname is included at the end of the ondisk
struct, for which I added a free.
>> + }
>> + current->de_foffset = foffset;
>> + current->de_cr = conflict_offset;
>> + ondisk_from_directory_entry(current, &ondisk);
>> + if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
>> + return -1;
>> + crc = htonl(crc);
>> + if (ce_write(NULL, fd, &crc, 4) < 0)
>> + return -1;
>> + conflict_offset += current->conflict_size;
>> + foffset += current->de_nfiles * 4;
>> + current = current->next;
>> + }
>> + return 0;
>> +}
>> +
>> +static int write_entries(struct index_state *istate,
>> + struct directory_entry *de,
>> + int entries,
>> + int fd,
>> + int offset_to_offset)
>> +{
>> + int offset, offset_write, ondisk_size;
>> + struct directory_entry *current;
>> +
>> + offset = 0;
>> + ondisk_size = sizeof(struct ondisk_cache_entry);
>> + current = de;
>> + while (current) {
>
> A short comment a the beginning of this block saying this writes
> fileoffsets table would be nice.
Done, thanks.
>> + int pathlen;
>> + struct cache_entry *ce = current->ce;
>> +
>> + if (current->de_pathlen == 0)
>> + pathlen = 0;
>> + else
>> + pathlen = current->de_pathlen + 1;
>> + while (ce) {
>> + if (ce->ce_flags & CE_REMOVE)
>> + continue;
>
> How come CE_REMOVE'd entries get here? I thought they were all ignored
> at compile_directory_data()
They don't, I've added this line unnecessarily. Will remove it.
>> + if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
>> + ce_smudge_racily_clean_entry(ce);
>> + if (is_null_sha1(ce->sha1))
>> + return error("cache entry has null sha1: %s", ce->name);
>> +
>> + offset_write = htonl(offset);
>> + if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> + return -1;
>> + offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
>> + ce = ce->next_ce;
>> + }
>> + current = current->next;
>> + }
>> + /*
>> + * Write one more offset, which points to the end of the entries,
>> + * because we use it for calculating the file length, instead of
>> + * using strlen.
>> + */
>> + offset_write = htonl(offset);
>> + if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> + return -1;
>> +
>> + offset = offset_to_offset;
>> + current = de;
>> + while (current) {
>> + int pathlen;
>> + struct cache_entry *ce = current->ce;
>> +
>> + if (current->de_pathlen == 0)
>> + pathlen = 0;
>> + else
>> + pathlen = current->de_pathlen + 1;
>> + while (ce) {
>> + struct ondisk_cache_entry ondisk;
>> + uint32_t crc, calc_crc;
>> +
>> + if (ce->ce_flags & CE_REMOVE)
>> + continue;
>> + calc_crc = htonl(offset);
>> + crc = crc32(0, (Bytef*)&calc_crc, 4);
>> + if (ce_write(&crc, fd, ce->name + pathlen,
>> + ce_namelen(ce) - pathlen + 1) < 0)
>> + return -1;
>> + ondisk_from_cache_entry(ce, &ondisk);
>> + if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
>> + return -1;
>> + crc = htonl(crc);
>> + if (ce_write(NULL, fd, &crc, 4) < 0)
>> + return -1;
>> + offset += 4;
>> + ce = ce->next_ce;
>> + }
>> + current = current->next;
>> + }
>> + return 0;
>> +}
>
>> +static int write_index_v5(struct index_state *istate, int newfd)
>> +{
>> + struct cache_header hdr;
>> + struct cache_header_v5 hdr_v5;
>> + struct cache_entry **cache = istate->cache;
>> + struct directory_entry *de;
>> + struct ondisk_directory_entry *ondisk;
>> + unsigned int entries = istate->cache_nr;
>> + unsigned int i, removed, total_dir_len, ondisk_directory_size;
>> + unsigned int total_file_len, conflict_offset, foffsetblock;
>> + unsigned int ndir;
>> + uint32_t crc;
>> +
>> + if (istate->filter_opts)
>> + die("BUG: index: cannot write a partially read index");
>> +
>> + for (i = removed = 0; i < entries; i++) {
>> + if (cache[i]->ce_flags & CE_REMOVE)
>> + removed++;
>> + }
>> + hdr.hdr_signature = htonl(CACHE_SIGNATURE);
>> + hdr.hdr_version = htonl(istate->version);
>> + hdr.hdr_entries = htonl(entries - removed);
>> + hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
>> +
>> + total_dir_len = 0;
>> + total_file_len = 0;
>> + de = compile_directory_data(istate, entries, &ndir,
>> + &total_dir_len, &total_file_len);
>> + hdr_v5.hdr_ndir = htonl(ndir);
>> +
>> + /*
>> + * This is needed because the compiler aligns structs to sizes multipe
>> + * of 4
>> + */
>> + ondisk_directory_size = sizeof(ondisk->flags)
>> + + sizeof(ondisk->foffset)
>> + + sizeof(ondisk->cr)
>> + + sizeof(ondisk->ncr)
>> + + sizeof(ondisk->nsubtrees)
>> + + sizeof(ondisk->nfiles)
>> + + sizeof(ondisk->nentries)
>> + + sizeof(ondisk->sha1);
>
> There is a similar statement in read code. This calls for a macro to
> share this sum.
This is no longer needed, as I switched the flag size to 32 bits, to
enable adding the pathname at the end of the struct.
>> + foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
>> + + (ndir + 1) * 4
>> + + total_dir_len
>> + + ndir * (ondisk_directory_size + 4);
>> + hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
>> + crc = 0;
>> + if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
>> + return -1;
>> + if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
>> + return -1;
>> + crc = htonl(crc);
>> + if (ce_write(NULL, newfd, &crc, 4) < 0)
>> + return -1;
>> +
>> + conflict_offset = foffsetblock +
>> + + (entries - removed + 1) * 4
>> + + total_file_len
>> + + (entries - removed) * (sizeof(struct ondisk_cache_entry) + 4);
>> + if (write_directories(de, newfd, conflict_offset) < 0)
>> + return -1;
>> + if (write_entries(istate, de, entries, newfd, foffsetblock) < 0)
>> + return -1;
>> + if (write_conflicts(istate, de, newfd) < 0)
>> + return -1;
>> + return ce_flush(newfd);
>> +}
In the re-roll the conflicted and the resolve-undo entries will also be
written in an extension as you suggested in the other email. I'll
re-roll the series with a POC for partial writing tomorrow if all goes
well.
next prev parent reply other threads:[~2013-11-25 15:37 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
2013-08-18 20:21 ` Eric Sunshine
2013-08-20 19:30 ` Junio C Hamano
2013-08-21 3:05 ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
2013-08-20 19:34 ` Junio C Hamano
2013-08-21 3:06 ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
2013-08-18 22:34 ` Ramsay Jones
2013-08-20 8:36 ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
2013-08-18 20:50 ` Eric Sunshine
2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
2013-08-18 21:06 ` Eric Sunshine
2013-08-20 8:46 ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
2013-08-19 1:57 ` Eric Sunshine
2013-08-20 14:01 ` Duy Nguyen
2013-08-20 20:59 ` Thomas Gummerer
2013-08-21 0:44 ` Duy Nguyen
2013-08-20 14:16 ` Duy Nguyen
2013-08-20 21:13 ` Thomas Gummerer
2013-08-23 23:52 ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
2013-08-19 1:59 ` Eric Sunshine
2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-08-24 0:09 ` Duy Nguyen
2013-11-25 15:41 ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
2013-08-24 3:58 ` Duy Nguyen
2013-11-25 15:37 ` Thomas Gummerer [this message]
2013-08-24 4:07 ` Duy Nguyen
2013-08-24 9:56 ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-08-21 0:57 ` Duy Nguyen
2013-08-21 4:01 ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
2013-08-24 4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
2013-08-25 3:07 ` Junio C Hamano
2013-08-25 4:40 ` Duy Nguyen
2013-08-31 5:23 ` Thomas Gummerer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87li0co5li.fsf@gmail.com \
--to=t.gummerer@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=ramsay@ramsay1.demon.co.uk \
--cc=robin.rosenberg@dewire.com \
--cc=sunshine@sunshineco.com \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).