* [PATCH 0/3] read-cache: use stack ondisk struct when writing index @ 2017-08-21 21:24 Kevin Willford 2017-08-21 21:24 ` [PATCH 1/3] perf: add test for writing the index Kevin Willford ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Kevin Willford @ 2017-08-21 21:24 UTC (permalink / raw) To: git; +Cc: gitster, peff, peartben, Kevin Willford When writing out the index, the ondisk struct was being allocated and freed within the loop of cache entries. A better way would be to use a ondisk struct on the stack and reuse it avoiding the alloc and free calls. A test has been added to measure the performance of writing the index (p0007-write-cache). Running this test on smaller repos showed no degradation in performance. On larger repos there was ~3-7% improvement. 0007.2: write_locked_index 10 times (1000001 files) 5.98(0.00+0.04) 5.75(0.01+0.04) -3.8% 0007.2: write_locked_index 10 times (1000001 files) 6.20(0.00+0.06) 5.86(0.01+0.03) -5.5% 0007.2: write_locked_index 3 times (4394531 files) 10.29(0.04+0.03) 9.75(0.04+0.01) -5.2% 0007.2: write_locked_index 3 times (4394531 files) 10.52(0.00+0.04) 9.79(0.03+0.03) -6.9% Kevin Willford (3): perf: add test for writing the index read-cache: fix memory leak in do_write_index read-cache: avoid allocating every ondisk entry when writing Makefile | 1 + read-cache.c | 62 +++++++++++++++++++++++++-------------------- t/helper/test-write-cache.c | 23 +++++++++++++++++ t/perf/p0007-write-cache.sh | 29 +++++++++++++++++++++ 4 files changed, 87 insertions(+), 28 deletions(-) create mode 100644 t/helper/test-write-cache.c create mode 100755 t/perf/p0007-write-cache.sh -- 2.14.1.205.g2812f3410d ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] perf: add test for writing the index 2017-08-21 21:24 [PATCH 0/3] read-cache: use stack ondisk struct when writing index Kevin Willford @ 2017-08-21 21:24 ` Kevin Willford 2017-08-21 21:24 ` [PATCH 2/3] read-cache: fix memory leak in do_write_index Kevin Willford 2017-08-21 21:24 ` [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing Kevin Willford 2 siblings, 0 replies; 6+ messages in thread From: Kevin Willford @ 2017-08-21 21:24 UTC (permalink / raw) To: git; +Cc: gitster, peff, peartben, Kevin Willford A performance test for writing the index to be able to determine if changes to allocating ondisk structure help. Signed-off-by: Kevin Willford <kewillf@microsoft.com> --- Makefile | 1 + t/helper/test-write-cache.c | 23 +++++++++++++++++++++++ t/perf/p0007-write-cache.sh | 29 +++++++++++++++++++++++++++++ 3 files changed, 53 insertions(+) create mode 100644 t/helper/test-write-cache.c create mode 100755 t/perf/p0007-write-cache.sh diff --git a/Makefile b/Makefile index 86ec29202b..c6b061086f 100644 --- a/Makefile +++ b/Makefile @@ -655,6 +655,7 @@ TEST_PROGRAMS_NEED_X += test-parse-options TEST_PROGRAMS_NEED_X += test-path-utils TEST_PROGRAMS_NEED_X += test-prio-queue TEST_PROGRAMS_NEED_X += test-read-cache +TEST_PROGRAMS_NEED_X += test-write-cache TEST_PROGRAMS_NEED_X += test-ref-store TEST_PROGRAMS_NEED_X += test-regex TEST_PROGRAMS_NEED_X += test-revision-walking diff --git a/t/helper/test-write-cache.c b/t/helper/test-write-cache.c new file mode 100644 index 0000000000..b7ee039669 --- /dev/null +++ b/t/helper/test-write-cache.c @@ -0,0 +1,23 @@ +#include "cache.h" +#include "lockfile.h" + +static struct lock_file index_lock; + +int cmd_main(int argc, const char **argv) +{ + int i, cnt = 1, lockfd; + if (argc == 2) + cnt = strtol(argv[1], NULL, 0); + setup_git_directory(); + read_cache(); + for (i = 0; i < cnt; i++) { + lockfd = hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR); + if (0 <= lockfd) { + write_locked_index(&the_index, &index_lock, COMMIT_LOCK); + } else { + rollback_lock_file(&index_lock); + } + } + + return 0; +} diff --git a/t/perf/p0007-write-cache.sh b/t/perf/p0007-write-cache.sh new file mode 100755 index 0000000000..261fe92fd9 --- /dev/null +++ b/t/perf/p0007-write-cache.sh @@ -0,0 +1,29 @@ +#!/bin/sh + +test_description="Tests performance of writing the index" + +. ./perf-lib.sh + +test_perf_default_repo + +test_expect_success "setup repo" ' + if git rev-parse --verify refs/heads/p0006-ballast^{commit} + then + echo Assuming synthetic repo from many-files.sh + git config --local core.sparsecheckout 1 + cat >.git/info/sparse-checkout <<-EOF + /* + !ballast/* + EOF + else + echo Assuming non-synthetic repo... + fi && + nr_files=$(git ls-files | wc -l) +' + +count=3 +test_perf "write_locked_index $count times ($nr_files files)" " + test-write-cache $count +" + +test_done -- 2.14.1.205.g2812f3410d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] read-cache: fix memory leak in do_write_index 2017-08-21 21:24 [PATCH 0/3] read-cache: use stack ondisk struct when writing index Kevin Willford 2017-08-21 21:24 ` [PATCH 1/3] perf: add test for writing the index Kevin Willford @ 2017-08-21 21:24 ` Kevin Willford 2017-08-21 23:02 ` Junio C Hamano 2017-08-21 21:24 ` [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing Kevin Willford 2 siblings, 1 reply; 6+ messages in thread From: Kevin Willford @ 2017-08-21 21:24 UTC (permalink / raw) To: git; +Cc: gitster, peff, peartben, Kevin Willford The previous_name_buf was never getting released when there was an error in ce_write_entry or allow was false and execution was returned to the caller. Signed-off-by: Kevin Willford <kewillf@microsoft.com> --- read-cache.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/read-cache.c b/read-cache.c index acfb028f48..47220cc30d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2192,7 +2192,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, int newfd = tempfile->fd; git_SHA_CTX c; struct cache_header hdr; - int i, err, removed, extended, hdr_version; + int i, err = 0, removed, extended, hdr_version; struct cache_entry **cache = istate->cache; int entries = istate->cache_nr; struct stat st; @@ -2247,15 +2247,21 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (allow) warning(msg, ce->name); else - return error(msg, ce->name); + err = error(msg, ce->name); drop_cache_tree = 1; } if (ce_write_entry(&c, newfd, ce, previous_name) < 0) - return -1; + err = -1; + + if (err) + break; } strbuf_release(&previous_name_buf); + if (err) + return err; + /* Write extension data here */ if (!strip_extensions && istate->split_index) { struct strbuf sb = STRBUF_INIT; -- 2.14.1.205.g2812f3410d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] read-cache: fix memory leak in do_write_index 2017-08-21 21:24 ` [PATCH 2/3] read-cache: fix memory leak in do_write_index Kevin Willford @ 2017-08-21 23:02 ` Junio C Hamano 0 siblings, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2017-08-21 23:02 UTC (permalink / raw) To: Kevin Willford; +Cc: git, peff, peartben Kevin Willford <kewillf@microsoft.com> writes: > The previous_name_buf was never getting released when there > was an error in ce_write_entry or allow was false and execution > was returned to the caller. > > Signed-off-by: Kevin Willford <kewillf@microsoft.com> > --- Thanks for spotting an old error that dates back to at least 2013 this month. The patch looks sensible. > read-cache.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/read-cache.c b/read-cache.c > index acfb028f48..47220cc30d 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2192,7 +2192,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, > int newfd = tempfile->fd; > git_SHA_CTX c; > struct cache_header hdr; > - int i, err, removed, extended, hdr_version; > + int i, err = 0, removed, extended, hdr_version; > struct cache_entry **cache = istate->cache; > int entries = istate->cache_nr; > struct stat st; > @@ -2247,15 +2247,21 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, > if (allow) > warning(msg, ce->name); > else > - return error(msg, ce->name); > + err = error(msg, ce->name); > > drop_cache_tree = 1; > } > if (ce_write_entry(&c, newfd, ce, previous_name) < 0) > - return -1; > + err = -1; > + > + if (err) > + break; > } > strbuf_release(&previous_name_buf); > > + if (err) > + return err; > + > /* Write extension data here */ > if (!strip_extensions && istate->split_index) { > struct strbuf sb = STRBUF_INIT; ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing 2017-08-21 21:24 [PATCH 0/3] read-cache: use stack ondisk struct when writing index Kevin Willford 2017-08-21 21:24 ` [PATCH 1/3] perf: add test for writing the index Kevin Willford 2017-08-21 21:24 ` [PATCH 2/3] read-cache: fix memory leak in do_write_index Kevin Willford @ 2017-08-21 21:24 ` Kevin Willford 2017-08-21 23:42 ` Junio C Hamano 2 siblings, 1 reply; 6+ messages in thread From: Kevin Willford @ 2017-08-21 21:24 UTC (permalink / raw) To: git; +Cc: gitster, peff, peartben, Kevin Willford When writing the index for each entry an ondisk struct will be allocated and freed in ce_write_entry. We can do better by using a ondisk struct on the stack for each entry. This is accomplished by using a stack ondisk_cache_entry_extended outside looping through the entries in do_write_index. Only the fixed fields of this struct are used when writing and depending on whether it is extended or not the flags2 field will be written. The name field is not used and instead the cache_entry name field is used directly when writing out the name. Because ce_write is using a buffer and memcpy to fill the buffer before flushing to disk, we don't have to worry about doing multiple ce_write calls. Running the p0007-write-cache.sh tests would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos. Signed-off-by: Kevin Willford <kewillf@microsoft.com> --- read-cache.c | 50 +++++++++++++++++++++++++------------------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/read-cache.c b/read-cache.c index 47220cc30d..694bed8d82 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1499,6 +1499,7 @@ struct ondisk_cache_entry_extended { }; /* These are only used for v3 or lower */ +#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len) #define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7) #define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len) #define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len) @@ -2032,7 +2033,7 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce) } /* Copy miscellaneous fields but not the name */ -static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, +static void copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce) { short flags; @@ -2056,32 +2057,35 @@ static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, struct ondisk_cache_entry_extended *ondisk2; ondisk2 = (struct ondisk_cache_entry_extended *)ondisk; ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16); - return ondisk2->name; - } - else { - return ondisk->name; } } static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, - struct strbuf *previous_name) + struct strbuf *previous_name, struct ondisk_cache_entry *ondisk) { int size; - struct ondisk_cache_entry *ondisk; int saved_namelen = saved_namelen; /* compiler workaround */ - char *name; int result; + static unsigned char padding[8] = { 0x00 }; if (ce->ce_flags & CE_STRIP_NAME) { saved_namelen = ce_namelen(ce); ce->ce_namelen = 0; } + if (ce->ce_flags & CE_EXTENDED) + size = offsetof(struct ondisk_cache_entry_extended, name); + else + size = offsetof(struct ondisk_cache_entry, name); + if (!previous_name) { - size = ondisk_ce_size(ce); - ondisk = xcalloc(1, size); - name = copy_cache_entry_to_ondisk(ondisk, ce); - memcpy(name, ce->name, ce_namelen(ce)); + int len = ce_namelen(ce); + copy_cache_entry_to_ondisk(ondisk, ce); + result = ce_write(c, fd, ondisk, size); + if (!result) + result = ce_write(c, fd, ce->name, len); + if (!result) + result = ce_write(c, fd, padding, align_padding_size(size, len)); } else { int common, to_remove, prefix_size; unsigned char to_remove_vi[16]; @@ -2094,16 +2098,12 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, to_remove = previous_name->len - common; prefix_size = encode_varint(to_remove, to_remove_vi); - if (ce->ce_flags & CE_EXTENDED) - size = offsetof(struct ondisk_cache_entry_extended, name); - else - size = offsetof(struct ondisk_cache_entry, name); - size += prefix_size + (ce_namelen(ce) - common + 1); - - ondisk = xcalloc(1, size); - name = copy_cache_entry_to_ondisk(ondisk, ce); - memcpy(name, to_remove_vi, prefix_size); - memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common); + copy_cache_entry_to_ondisk(ondisk, ce); + result = ce_write(c, fd, ondisk, size); + if (!result) + result = ce_write(c, fd, to_remove_vi, prefix_size); + if (!result) + result = ce_write(c, fd, ce->name + common, ce_namelen(ce) - common + 1); strbuf_splice(previous_name, common, to_remove, ce->name + common, ce_namelen(ce) - common); @@ -2113,8 +2113,6 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, ce->ce_flags &= ~CE_STRIP_NAME; } - result = ce_write(c, fd, ondisk, size); - free(ondisk); return result; } @@ -2196,6 +2194,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, struct cache_entry **cache = istate->cache; int entries = istate->cache_nr; struct stat st; + struct ondisk_cache_entry_extended ondisk; struct strbuf previous_name_buf = STRBUF_INIT, *previous_name; int drop_cache_tree = 0; @@ -2232,6 +2231,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, return -1; previous_name = (hdr_version == 4) ? &previous_name_buf : NULL; + for (i = 0; i < entries; i++) { struct cache_entry *ce = cache[i]; if (ce->ce_flags & CE_REMOVE) @@ -2251,7 +2251,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, drop_cache_tree = 1; } - if (ce_write_entry(&c, newfd, ce, previous_name) < 0) + if (ce_write_entry(&c, newfd, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0) err = -1; if (err) -- 2.14.1.205.g2812f3410d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing 2017-08-21 21:24 ` [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing Kevin Willford @ 2017-08-21 23:42 ` Junio C Hamano 0 siblings, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2017-08-21 23:42 UTC (permalink / raw) To: Kevin Willford; +Cc: git, peff, peartben Kevin Willford <kewillf@microsoft.com> writes: > When writing the index for each entry an ondisk struct will be > allocated and freed in ce_write_entry. We can do better by > using a ondisk struct on the stack for each entry. > > This is accomplished by using a stack ondisk_cache_entry_extended > outside looping through the entries in do_write_index. Only the > fixed fields of this struct are used when writing and depending on > whether it is extended or not the flags2 field will be written. > The name field is not used and instead the cache_entry name field > is used directly when writing out the name. Because ce_write is > using a buffer and memcpy to fill the buffer before flushing to disk, > we don't have to worry about doing multiple ce_write calls. The code already heavily assumes that the only difference between ondisk_cache_entry and its _extended variant are identical in their earlier parts, so having a single instance of the larger one on the stack and passing it around makes sense as an optimization. Multiple chomped calls to ce_write() does make me nervous, not due to potential I/O cost (which is none) but to potential programming mistakes for people who touch the code. The version immediately after this patch looks correct in that once "result" is set to non zero to indicate an error nothing is written out, but keeping that property over time, when a newer index format is eventually invented, may become an additional burden. But overall, I think this is a good change. Will queue all three. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-08-21 23:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-21 21:24 [PATCH 0/3] read-cache: use stack ondisk struct when writing index Kevin Willford 2017-08-21 21:24 ` [PATCH 1/3] perf: add test for writing the index Kevin Willford 2017-08-21 21:24 ` [PATCH 2/3] read-cache: fix memory leak in do_write_index Kevin Willford 2017-08-21 23:02 ` Junio C Hamano 2017-08-21 21:24 ` [PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing Kevin Willford 2017-08-21 23:42 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).