From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
chakrabortyabhradeep79@gmail.com,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Derrick Stolee" <derrickstolee@github.com>
Subject: [PATCH v2 0/3] midx: reduce memory pressure while writing bitmaps
Date: Tue, 19 Jul 2022 15:26:03 +0000 [thread overview]
Message-ID: <pull.1292.v2.git.1658244366.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1292.git.1658176565751.gitgitgadget@gmail.com>
We noticed an instance where writing multi-pack-index bitmaps was taking a
lot of memory. This small change can reduce the memory pressure slightly
(~25%), but more will be needed to significantly reduce the memory pressure.
Such a change would require updating the bitmap writing code to use on-disk
data structures instead. This is particularly tricky when the
multi-pack-index has not been fully written, because we don't want a point
in time where the object store has a new multi-pack-index without a
reachability bitmap.
Updates in v2
=============
To reduce confusion on the lifetime of 'ctx.entries', some refactoring
patches are inserted to first extract the use of 'ctx' out of
write_midx_bitmap() and into write_midx_internal(). This makes the
FREE_AND_NULL() stand out more clearly.
Thanks, -Stolee
Derrick Stolee (3):
pack-bitmap-write: use const for hashes
midx: extract bitmap write setup
midx: reduce memory pressure while writing bitmaps
midx.c | 69 +++++++++++++++++++++++++++++----------------
pack-bitmap-write.c | 2 +-
pack-bitmap.h | 2 +-
3 files changed, 47 insertions(+), 26 deletions(-)
base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1292
Range-diff vs v1:
-: ----------- > 1: a09fdbb8f3e pack-bitmap-write: use const for hashes
1: 9104bc55795 ! 2: 4dfb7ae5112 midx: reduce memory pressure while writing bitmaps
@@ Metadata
Author: Derrick Stolee <derrickstolee@github.com>
## Commit message ##
- midx: reduce memory pressure while writing bitmaps
+ midx: extract bitmap write setup
- We noticed that some 'git multi-pack-index write --bitmap' processes
- were running with very high memory. It turns out that a lot of this
- memory is required to store a list of every object in the written
- multi-pack-index, with a second copy that has additional information
- used for the bitmap writing logic.
+ The write_midx_bitmap() method is a long method that does a lot of
+ steps. It requires the write_midx_context struct for use in
+ prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but
+ after that only needs the pack_order array.
- Using 'valgrind --tool=massif' before this change, the following chart
- shows how memory load increased and was maintained throughout the
- process:
-
- GB
- 4.102^ ::
- | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
- | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- 0 +--------------------------------------------------------------->
-
- It turns out that the 'struct write_midx_context' data is persisting
- through the life of the process, including the 'entries' array. This
- array is used last inside find_commits_for_midx_bitmap() within
- write_midx_bitmap(). If we free (and nullify) the array at that point,
- we can free a decent chunk of memory before the bitmap logic adds more
- to the memory footprint.
-
- Here is the massif memory load chart after this change:
-
- GB
- 3.111^ #
- | # :::::::::::@::::::::::::::@
- | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
- | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- 0 +--------------------------------------------------------------->
-
- It is unfortunate that the lifetime of the 'entries' array is less
- clear. To make this simpler, I added a few things to try and prevent an
- accidental reference:
-
- 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
- NULL pointer instead of a use-after-free.
-
- 2. 'entries_nr' is also set to zero to make any loop that would iterate
- over the entries be trivial.
-
- 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
- not get another reference later. This requires adding a local copy
- of 'pack_order' giving us a reference that we can use later in the
- method.
-
- 4. Add significant comments in write_midx_bitmap() and
- write_midx_internal() to add warnings for future authors who might
- accidentally add references to this cleared memory.
+ This is a messy, but completely non-functional refactoring. The code is
+ only being moved around to reduce visibility of the write_midx_context
+ during the longest part of computing reachability bitmaps.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
## midx.c ##
-@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
- struct commit **commits = NULL;
- uint32_t i, commits_nr;
- uint16_t options = 0;
-+ uint32_t *pack_order;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
- int ret;
+@@ midx.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
+ return cb.commits;
+ }
-@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+-static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+- struct write_midx_context *ctx,
++static int write_midx_bitmap(const char *midx_name,
++ const unsigned char *midx_hash,
++ struct packing_data *pdata,
++ struct commit **commits,
++ uint32_t commits_nr,
++ uint32_t *pack_order,
+ const char *refs_snapshot,
+ unsigned flags)
+ {
+- struct packing_data pdata;
+- struct pack_idx_entry **index;
+- struct commit **commits = NULL;
+- uint32_t i, commits_nr;
++ int ret, i;
+ uint16_t options = 0;
+- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+- int ret;
+-
+- if (!ctx->entries_nr)
+- BUG("cannot write a bitmap without any objects");
++ struct pack_idx_entry **index;
++ char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
++ hash_to_hex(midx_hash));
- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+ if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+ options |= BITMAP_OPT_HASH_CACHE;
-+ /*
-+ * Remove the ctx.entries to reduce memory pressure.
-+ * Nullify 'ctx' to help avoid adding new references to ctx->entries.
-+ */
-+ FREE_AND_NULL(ctx->entries);
-+ ctx->entries_nr = 0;
-+ pack_order = ctx->pack_order;
-+ ctx = NULL;
-+
+- prepare_midx_packing_data(&pdata, ctx);
+-
+- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+-
/*
* Build the MIDX-order index based on pdata.objects (which is already
* in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+ * this order).
+ */
+- ALLOC_ARRAY(index, pdata.nr_objects);
+- for (i = 0; i < pdata.nr_objects; i++)
+- index[i] = &pdata.objects[i].idx;
++ ALLOC_ARRAY(index, pdata->nr_objects);
++ for (i = 0; i < pdata->nr_objects; i++)
++ index[i] = &pdata->objects[i].idx;
+
+ bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+- bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
++ bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
+
+ /*
+ * bitmap_writer_finish expects objects in lex order, but pack_order
@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+ * happens between bitmap_writer_build_type_index() and
* bitmap_writer_finish().
*/
- for (i = 0; i < pdata.nr_objects; i++)
+- for (i = 0; i < pdata.nr_objects; i++)
- index[ctx->pack_order[i]] = &pdata.objects[i].idx;
-+ index[pack_order[i]] = &pdata.objects[i].idx;
++ for (i = 0; i < pdata->nr_objects; i++)
++ index[pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(commits, commits_nr, -1);
- ret = bitmap_writer_build(&pdata);
+- ret = bitmap_writer_build(&pdata);
++ ret = bitmap_writer_build(pdata);
+ if (ret < 0)
+ goto cleanup;
+
+ bitmap_writer_set_checksum(midx_hash);
+- bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
++ bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
+
+ cleanup:
+ free(index);
@@ midx.c: static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
-+ /*
-+ * Writing the bitmap must be last, as it will free ctx.entries
-+ * to reduce memory pressure during the bitmap write.
-+ */
if (flags & MIDX_WRITE_BITMAP) {
- if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
+- if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
++ struct packing_data pdata;
++ struct commit **commits;
++ uint32_t commits_nr;
++
++ if (!ctx.entries_nr)
++ BUG("cannot write a bitmap without any objects");
++
++ prepare_midx_packing_data(&pdata, &ctx);
++
++ commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
++
++ if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
++ commits, commits_nr, ctx.pack_order,
refs_snapshot, flags) < 0) {
+ error(_("could not write multi-pack bitmap"));
+ result = 1;
-: ----------- > 3: 98e72f71b6b midx: reduce memory pressure while writing bitmaps
--
gitgitgadget
next prev parent reply other threads:[~2022-07-19 15:26 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-18 20:36 [PATCH] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason
2022-07-19 13:50 ` Derrick Stolee
2022-07-19 15:26 ` Derrick Stolee via GitGitGadget [this message]
2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-19 15:59 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1292.v2.git.1658244366.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).