* [PATCH] midx: reduce memory pressure while writing bitmaps @ 2022-07-18 20:36 Derrick Stolee via GitGitGadget 2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget 0 siblings, 2 replies; 8+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-07-18 20:36 UTC (permalink / raw) To: git Cc: gitster, me, vdye, chakrabortyabhradeep79, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory. It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic. Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process: GB 4.102^ :: | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : 0 +---------------------------------------------------------------> It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array. This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap(). If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint. Here is the massif memory load chart after this change: GB 3.111^ # | # :::::::::::@::::::::::::::@ | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ 0 +---------------------------------------------------------------> It is unfortunate that the lifetime of the 'entries' array is less clear. To make this simpler, I added a few things to try and prevent an accidental reference: 1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free. 2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial. 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does not get another reference later. This requires adding a local copy of 'pack_order' giving us a reference that we can use later in the method. 4. Add significant comments in write_midx_bitmap() and write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- midx: reduce memory pressure while writing bitmaps The thing I'm most worried about with this patch is whether there is enough (or too much) defensive programming. Thanks, -Stolee Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1292 midx.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 5f0dd386b02..cc31d803a5f 100644 --- a/midx.c +++ b/midx.c @@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, struct commit **commits = NULL; uint32_t i, commits_nr; uint16_t options = 0; + uint32_t *pack_order; char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); int ret; @@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); + /* + * Remove the ctx.entries to reduce memory pressure. + * Nullify 'ctx' to help avoid adding new references to ctx->entries. + */ + FREE_AND_NULL(ctx->entries); + ctx->entries_nr = 0; + pack_order = ctx->pack_order; + ctx = NULL; + /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of @@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, * bitmap_writer_finish(). */ for (i = 0; i < pdata.nr_objects; i++) - index[ctx->pack_order[i]] = &pdata.objects[i].idx; + index[pack_order[i]] = &pdata.objects[i].idx; bitmap_writer_select_commits(commits, commits_nr, -1); ret = bitmap_writer_build(&pdata); @@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir, if (flags & MIDX_WRITE_REV_INDEX && git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0)) write_midx_reverse_index(midx_name.buf, midx_hash, &ctx); + + /* + * Writing the bitmap must be last, as it will free ctx.entries + * to reduce memory pressure during the bitmap write. + */ if (flags & MIDX_WRITE_BITMAP) { if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx, refs_snapshot, flags) < 0) { base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] midx: reduce memory pressure while writing bitmaps 2022-07-18 20:36 [PATCH] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget @ 2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason 2022-07-19 13:50 ` Derrick Stolee 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget 1 sibling, 1 reply; 8+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-07-18 21:47 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, gitster, me, vdye, chakrabortyabhradeep79, Derrick Stolee On Mon, Jul 18 2022, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > [...] > It is unfortunate that the lifetime of the 'entries' array is less > clear. To make this simpler, I added a few things to try and prevent an > accidental reference: > > 1. Using FREE_AND_NULL() we will at least get a segfault from reading a > NULL pointer instead of a use-after-free. > > 2. 'entries_nr' is also set to zero to make any loop that would iterate > over the entries be trivial. > > 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does > not get another reference later. This requires adding a local copy > of 'pack_order' giving us a reference that we can use later in the > method. > > 4. Add significant comments in write_midx_bitmap() and > write_midx_internal() to add warnings for future authors who might > accidentally add references to this cleared memory. > [...] > + /* > + * Remove the ctx.entries to reduce memory pressure. > + * Nullify 'ctx' to help avoid adding new references to ctx->entries. > + */ > + FREE_AND_NULL(ctx->entries); > + ctx->entries_nr = 0; > + pack_order = ctx->pack_order; > + ctx = NULL; After this change this is a ~70 line function, but only 3 lines at the top actually use ctx for anything: /* the bug check for ctx.nr... */ prepare_midx_packing_data(&pdata, ctx); commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); Did you consider just splitting it up so that that there's a "prepare write" function? Then you don't need to worry about the scoping of ctx. I'd think that would be better, then you also wouldn't need to implement your own free-ing, nothing after this seems to use ctx->entries_nr (but I just skimmed it), so it could just fall through to the free() at the end of write_midx_internal() (the only caller), couldn't it? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] midx: reduce memory pressure while writing bitmaps 2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason @ 2022-07-19 13:50 ` Derrick Stolee 0 siblings, 0 replies; 8+ messages in thread From: Derrick Stolee @ 2022-07-19 13:50 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget Cc: git, gitster, me, vdye, chakrabortyabhradeep79 On 7/18/2022 5:47 PM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Jul 18 2022, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <derrickstolee@github.com> >> [...] >> It is unfortunate that the lifetime of the 'entries' array is less >> clear. To make this simpler, I added a few things to try and prevent an >> accidental reference: >> >> 1. Using FREE_AND_NULL() we will at least get a segfault from reading a >> NULL pointer instead of a use-after-free. >> >> 2. 'entries_nr' is also set to zero to make any loop that would iterate >> over the entries be trivial. >> >> 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does >> not get another reference later. This requires adding a local copy >> of 'pack_order' giving us a reference that we can use later in the >> method. >> >> 4. Add significant comments in write_midx_bitmap() and >> write_midx_internal() to add warnings for future authors who might >> accidentally add references to this cleared memory. >> [...] >> + /* >> + * Remove the ctx.entries to reduce memory pressure. >> + * Nullify 'ctx' to help avoid adding new references to ctx->entries. >> + */ >> + FREE_AND_NULL(ctx->entries); >> + ctx->entries_nr = 0; >> + pack_order = ctx->pack_order; >> + ctx = NULL; > > After this change this is a ~70 line function, but only 3 lines at the > top actually use ctx for anything: > > /* the bug check for ctx.nr... */ > prepare_midx_packing_data(&pdata, ctx); > commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); > > Did you consider just splitting it up so that that there's a "prepare > write" function? Then you don't need to worry about the scoping of ctx. I did not, and that's a good suggestion. Extracting these prepare steps into write_midx_internal() works to reduce the complexity and make the early free()ing more clear. > I'd think that would be better, then you also wouldn't need to implement > your own free-ing, nothing after this seems to use ctx->entries_nr (but > I just skimmed it), so it could just fall through to the free() at the > end of write_midx_internal() (the only caller), couldn't it? I think this paragraph misunderstands the point. The bitmaps are being computed and written before the MIDX lock file completes, so the free() of the entries array is after the bitmaps are computed. To reduce the memory pressure (by ~25%) by freeing early is the point of this patch. We still want that free(ctx.entries) after the cleanup: label for the error cases, but for the "happy path" we can free early. By doing the refactoring, this point of having an earlier free() makes things more clear. Thanks, -Stolee ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 0/3] midx: reduce memory pressure while writing bitmaps 2022-07-18 20:36 [PATCH] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget 2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason @ 2022-07-19 15:26 ` Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget ` (2 more replies) 1 sibling, 3 replies; 8+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 UTC (permalink / raw) To: git Cc: gitster, me, vdye, chakrabortyabhradeep79, Ævar Arnfjörð Bjarmason, Derrick Stolee We noticed an instance where writing multi-pack-index bitmaps was taking a lot of memory. This small change can reduce the memory pressure slightly (~25%), but more will be needed to significantly reduce the memory pressure. Such a change would require updating the bitmap writing code to use on-disk data structures instead. This is particularly tricky when the multi-pack-index has not been fully written, because we don't want a point in time where the object store has a new multi-pack-index without a reachability bitmap. Updates in v2 ============= To reduce confusion on the lifetime of 'ctx.entries', some refactoring patches are inserted to first extract the use of 'ctx' out of write_midx_bitmap() and into write_midx_internal(). This makes the FREE_AND_NULL() stand out more clearly. Thanks, -Stolee Derrick Stolee (3): pack-bitmap-write: use const for hashes midx: extract bitmap write setup midx: reduce memory pressure while writing bitmaps midx.c | 69 +++++++++++++++++++++++++++++---------------- pack-bitmap-write.c | 2 +- pack-bitmap.h | 2 +- 3 files changed, 47 insertions(+), 26 deletions(-) base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1292 Range-diff vs v1: -: ----------- > 1: a09fdbb8f3e pack-bitmap-write: use const for hashes 1: 9104bc55795 ! 2: 4dfb7ae5112 midx: reduce memory pressure while writing bitmaps @@ Metadata Author: Derrick Stolee <derrickstolee@github.com> ## Commit message ## - midx: reduce memory pressure while writing bitmaps + midx: extract bitmap write setup - We noticed that some 'git multi-pack-index write --bitmap' processes - were running with very high memory. It turns out that a lot of this - memory is required to store a list of every object in the written - multi-pack-index, with a second copy that has additional information - used for the bitmap writing logic. + The write_midx_bitmap() method is a long method that does a lot of + steps. It requires the write_midx_context struct for use in + prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but + after that only needs the pack_order array. - Using 'valgrind --tool=massif' before this change, the following chart - shows how memory load increased and was maintained throughout the - process: - - GB - 4.102^ :: - | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : - | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : - 0 +---------------------------------------------------------------> - - It turns out that the 'struct write_midx_context' data is persisting - through the life of the process, including the 'entries' array. This - array is used last inside find_commits_for_midx_bitmap() within - write_midx_bitmap(). If we free (and nullify) the array at that point, - we can free a decent chunk of memory before the bitmap logic adds more - to the memory footprint. - - Here is the massif memory load chart after this change: - - GB - 3.111^ # - | # :::::::::::@::::::::::::::@ - | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ - | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ - 0 +---------------------------------------------------------------> - - It is unfortunate that the lifetime of the 'entries' array is less - clear. To make this simpler, I added a few things to try and prevent an - accidental reference: - - 1. Using FREE_AND_NULL() we will at least get a segfault from reading a - NULL pointer instead of a use-after-free. - - 2. 'entries_nr' is also set to zero to make any loop that would iterate - over the entries be trivial. - - 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does - not get another reference later. This requires adding a local copy - of 'pack_order' giving us a reference that we can use later in the - method. - - 4. Add significant comments in write_midx_bitmap() and - write_midx_internal() to add warnings for future authors who might - accidentally add references to this cleared memory. + This is a messy, but completely non-functional refactoring. The code is + only being moved around to reduce visibility of the write_midx_context + during the longest part of computing reachability bitmaps. Signed-off-by: Derrick Stolee <derrickstolee@github.com> ## midx.c ## -@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, - struct commit **commits = NULL; - uint32_t i, commits_nr; - uint16_t options = 0; -+ uint32_t *pack_order; - char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); - int ret; +@@ midx.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr + return cb.commits; + } -@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, +-static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, +- struct write_midx_context *ctx, ++static int write_midx_bitmap(const char *midx_name, ++ const unsigned char *midx_hash, ++ struct packing_data *pdata, ++ struct commit **commits, ++ uint32_t commits_nr, ++ uint32_t *pack_order, + const char *refs_snapshot, + unsigned flags) + { +- struct packing_data pdata; +- struct pack_idx_entry **index; +- struct commit **commits = NULL; +- uint32_t i, commits_nr; ++ int ret, i; + uint16_t options = 0; +- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); +- int ret; +- +- if (!ctx->entries_nr) +- BUG("cannot write a bitmap without any objects"); ++ struct pack_idx_entry **index; ++ char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, ++ hash_to_hex(midx_hash)); - commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); + if (flags & MIDX_WRITE_BITMAP_HASH_CACHE) + options |= BITMAP_OPT_HASH_CACHE; -+ /* -+ * Remove the ctx.entries to reduce memory pressure. -+ * Nullify 'ctx' to help avoid adding new references to ctx->entries. -+ */ -+ FREE_AND_NULL(ctx->entries); -+ ctx->entries_nr = 0; -+ pack_order = ctx->pack_order; -+ ctx = NULL; -+ +- prepare_midx_packing_data(&pdata, ctx); +- +- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); +- /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of + * this order). + */ +- ALLOC_ARRAY(index, pdata.nr_objects); +- for (i = 0; i < pdata.nr_objects; i++) +- index[i] = &pdata.objects[i].idx; ++ ALLOC_ARRAY(index, pdata->nr_objects); ++ for (i = 0; i < pdata->nr_objects; i++) ++ index[i] = &pdata->objects[i].idx; + + bitmap_writer_show_progress(flags & MIDX_PROGRESS); +- bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects); ++ bitmap_writer_build_type_index(pdata, index, pdata->nr_objects); + + /* + * bitmap_writer_finish expects objects in lex order, but pack_order @@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, + * happens between bitmap_writer_build_type_index() and * bitmap_writer_finish(). */ - for (i = 0; i < pdata.nr_objects; i++) +- for (i = 0; i < pdata.nr_objects; i++) - index[ctx->pack_order[i]] = &pdata.objects[i].idx; -+ index[pack_order[i]] = &pdata.objects[i].idx; ++ for (i = 0; i < pdata->nr_objects; i++) ++ index[pack_order[i]] = &pdata->objects[i].idx; bitmap_writer_select_commits(commits, commits_nr, -1); - ret = bitmap_writer_build(&pdata); +- ret = bitmap_writer_build(&pdata); ++ ret = bitmap_writer_build(pdata); + if (ret < 0) + goto cleanup; + + bitmap_writer_set_checksum(midx_hash); +- bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options); ++ bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options); + + cleanup: + free(index); @@ midx.c: static int write_midx_internal(const char *object_dir, if (flags & MIDX_WRITE_REV_INDEX && git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0)) write_midx_reverse_index(midx_name.buf, midx_hash, &ctx); + -+ /* -+ * Writing the bitmap must be last, as it will free ctx.entries -+ * to reduce memory pressure during the bitmap write. -+ */ if (flags & MIDX_WRITE_BITMAP) { - if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx, +- if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx, ++ struct packing_data pdata; ++ struct commit **commits; ++ uint32_t commits_nr; ++ ++ if (!ctx.entries_nr) ++ BUG("cannot write a bitmap without any objects"); ++ ++ prepare_midx_packing_data(&pdata, &ctx); ++ ++ commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx); ++ ++ if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata, ++ commits, commits_nr, ctx.pack_order, refs_snapshot, flags) < 0) { + error(_("could not write multi-pack bitmap")); + result = 1; -: ----------- > 3: 98e72f71b6b midx: reduce memory pressure while writing bitmaps -- gitgitgadget ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/3] pack-bitmap-write: use const for hashes 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 ` Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget 2 siblings, 0 replies; 8+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 UTC (permalink / raw) To: git Cc: gitster, me, vdye, chakrabortyabhradeep79, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The next change will use a const array when calling this method. There is no need for the non-const version, so let's do this cleanup quickly. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- pack-bitmap-write.c | 2 +- pack-bitmap.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index c43375bd344..4fcfaed428f 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -683,7 +683,7 @@ static void write_hash_cache(struct hashfile *f, } } -void bitmap_writer_set_checksum(unsigned char *sha1) +void bitmap_writer_set_checksum(const unsigned char *sha1) { hashcpy(writer.pack_checksum, sha1); } diff --git a/pack-bitmap.h b/pack-bitmap.h index 3d3ddd77345..f3a57ca065f 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -75,7 +75,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); void bitmap_writer_show_progress(int show); -void bitmap_writer_set_checksum(unsigned char *sha1); +void bitmap_writer_set_checksum(const unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/3] midx: extract bitmap write setup 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 ` Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget 2 siblings, 0 replies; 8+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 UTC (permalink / raw) To: git Cc: gitster, me, vdye, chakrabortyabhradeep79, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> The write_midx_bitmap() method is a long method that does a lot of steps. It requires the write_midx_context struct for use in prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but after that only needs the pack_order array. This is a messy, but completely non-functional refactoring. The code is only being moved around to reduce visibility of the write_midx_context during the longest part of computing reachability bitmaps. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- midx.c | 56 ++++++++++++++++++++++++++++++++------------------------ 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/midx.c b/midx.c index 5f0dd386b02..e2dd808b35d 100644 --- a/midx.c +++ b/midx.c @@ -1053,40 +1053,35 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr return cb.commits; } -static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, - struct write_midx_context *ctx, +static int write_midx_bitmap(const char *midx_name, + const unsigned char *midx_hash, + struct packing_data *pdata, + struct commit **commits, + uint32_t commits_nr, + uint32_t *pack_order, const char *refs_snapshot, unsigned flags) { - struct packing_data pdata; - struct pack_idx_entry **index; - struct commit **commits = NULL; - uint32_t i, commits_nr; + int ret, i; uint16_t options = 0; - char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); - int ret; - - if (!ctx->entries_nr) - BUG("cannot write a bitmap without any objects"); + struct pack_idx_entry **index; + char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, + hash_to_hex(midx_hash)); if (flags & MIDX_WRITE_BITMAP_HASH_CACHE) options |= BITMAP_OPT_HASH_CACHE; - prepare_midx_packing_data(&pdata, ctx); - - commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); - /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of * this order). */ - ALLOC_ARRAY(index, pdata.nr_objects); - for (i = 0; i < pdata.nr_objects; i++) - index[i] = &pdata.objects[i].idx; + ALLOC_ARRAY(index, pdata->nr_objects); + for (i = 0; i < pdata->nr_objects; i++) + index[i] = &pdata->objects[i].idx; bitmap_writer_show_progress(flags & MIDX_PROGRESS); - bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects); + bitmap_writer_build_type_index(pdata, index, pdata->nr_objects); /* * bitmap_writer_finish expects objects in lex order, but pack_order @@ -1101,16 +1096,16 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, * happens between bitmap_writer_build_type_index() and * bitmap_writer_finish(). */ - for (i = 0; i < pdata.nr_objects; i++) - index[ctx->pack_order[i]] = &pdata.objects[i].idx; + for (i = 0; i < pdata->nr_objects; i++) + index[pack_order[i]] = &pdata->objects[i].idx; bitmap_writer_select_commits(commits, commits_nr, -1); - ret = bitmap_writer_build(&pdata); + ret = bitmap_writer_build(pdata); if (ret < 0) goto cleanup; bitmap_writer_set_checksum(midx_hash); - bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options); + bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options); cleanup: free(index); @@ -1443,8 +1438,21 @@ static int write_midx_internal(const char *object_dir, if (flags & MIDX_WRITE_REV_INDEX && git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0)) write_midx_reverse_index(midx_name.buf, midx_hash, &ctx); + if (flags & MIDX_WRITE_BITMAP) { - if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx, + struct packing_data pdata; + struct commit **commits; + uint32_t commits_nr; + + if (!ctx.entries_nr) + BUG("cannot write a bitmap without any objects"); + + prepare_midx_packing_data(&pdata, &ctx); + + commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx); + + if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata, + commits, commits_nr, ctx.pack_order, refs_snapshot, flags) < 0) { error(_("could not write multi-pack bitmap")); result = 1; -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 ` Derrick Stolee via GitGitGadget 2022-07-19 15:59 ` Junio C Hamano 2 siblings, 1 reply; 8+ messages in thread From: Derrick Stolee via GitGitGadget @ 2022-07-19 15:26 UTC (permalink / raw) To: git Cc: gitster, me, vdye, chakrabortyabhradeep79, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <derrickstolee@github.com> We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory. It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic. Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process: GB 4.102^ :: | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : 0 +---------------------------------------------------------------> It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array. This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap(). If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint. Here is the massif memory load chart after this change: GB 3.111^ # | # :::::::::::@::::::::::::::@ | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ 0 +---------------------------------------------------------------> The previous change introduced a refactoring of write_midx_bitmap() to make it more clear how much of the 'struct write_midx_context' instance is needed at different parts of the process. In addition, the following defensive programming measures were put in place: 1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free. 2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial. 3. Add significant comments in write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory. Signed-off-by: Derrick Stolee <derrickstolee@github.com> --- midx.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/midx.c b/midx.c index e2dd808b35d..772ab7d2944 100644 --- a/midx.c +++ b/midx.c @@ -1451,6 +1451,15 @@ static int write_midx_internal(const char *object_dir, commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx); + /* + * The previous steps translated the information from + * 'entries' into information suitable for constructing + * bitmaps. We no longer need that array, so clear it to + * reduce memory pressure. + */ + FREE_AND_NULL(ctx.entries); + ctx.entries_nr = 0; + if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata, commits, commits_nr, ctx.pack_order, refs_snapshot, flags) < 0) { @@ -1459,6 +1468,10 @@ static int write_midx_internal(const char *object_dir, goto cleanup; } } + /* + * NOTE: Do not use ctx.entries beyond this point, since it might + * have been freed in the previous if block. + */ if (ctx.m) close_object_store(the_repository->objects); -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps 2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget @ 2022-07-19 15:59 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2022-07-19 15:59 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, me, vdye, chakrabortyabhradeep79, Ævar Arnfjörð Bjarmason, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/midx.c b/midx.c > index e2dd808b35d..772ab7d2944 100644 > --- a/midx.c > +++ b/midx.c > @@ -1451,6 +1451,15 @@ static int write_midx_internal(const char *object_dir, > > commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx); > > + /* > + * The previous steps translated the information from > + * 'entries' into information suitable for constructing > + * bitmaps. We no longer need that array, so clear it to > + * reduce memory pressure. > + */ > + FREE_AND_NULL(ctx.entries); > + ctx.entries_nr = 0; > + > if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata, > commits, commits_nr, ctx.pack_order, > refs_snapshot, flags) < 0) { As the reduced helper, thanks to step [1/3], only takes the pack_order[] array, without being even aware of other members in the ctx struct, it is immediately obvious that this early freeing is safe for this call. It is a bit messy. I've been staring at the code and was wondering if we can just get rid of pack_order member from the context, and make pack_order a separate local variable that belong to this function. The separate variable needs to be packaged together with ctx back to please the chunk-format API, so it may require more boilerplate code and may not be an overall win. > @@ -1459,6 +1468,10 @@ static int write_midx_internal(const char *object_dir, > goto cleanup; > } > } > + /* > + * NOTE: Do not use ctx.entries beyond this point, since it might > + * have been freed in the previous if block. > + */ OK. > if (ctx.m) > close_object_store(the_repository->objects); ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-07-19 15:59 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-07-18 20:36 [PATCH] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget 2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason 2022-07-19 13:50 ` Derrick Stolee 2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget 2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget 2022-07-19 15:59 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).