From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
chakrabortyabhradeep79@gmail.com,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Derrick Stolee" <derrickstolee@github.com>
Subject: [PATCH v2 0/3] midx: reduce memory pressure while writing bitmaps
Date: Tue, 19 Jul 2022 15:26:03 +0000 [thread overview]
Message-ID: <pull.1292.v2.git.1658244366.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1292.git.1658176565751.gitgitgadget@gmail.com>
We noticed an instance where writing multi-pack-index bitmaps was taking a
lot of memory. This small change can reduce the memory pressure slightly
(~25%), but more will be needed to significantly reduce the memory pressure.
Such a change would require updating the bitmap writing code to use on-disk
data structures instead. This is particularly tricky when the
multi-pack-index has not been fully written, because we don't want a point
in time where the object store has a new multi-pack-index without a
reachability bitmap.
Updates in v2
=============
To reduce confusion on the lifetime of 'ctx.entries', some refactoring
patches are inserted to first extract the use of 'ctx' out of
write_midx_bitmap() and into write_midx_internal(). This makes the
FREE_AND_NULL() stand out more clearly.
Thanks, -Stolee
Derrick Stolee (3):
pack-bitmap-write: use const for hashes
midx: extract bitmap write setup
midx: reduce memory pressure while writing bitmaps
midx.c | 69 +++++++++++++++++++++++++++++----------------
pack-bitmap-write.c | 2 +-
pack-bitmap.h | 2 +-
3 files changed, 47 insertions(+), 26 deletions(-)
base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1292
Range-diff vs v1:
-: ----------- > 1: a09fdbb8f3e pack-bitmap-write: use const for hashes
1: 9104bc55795 ! 2: 4dfb7ae5112 midx: reduce memory pressure while writing bitmaps
@@ Metadata
Author: Derrick Stolee <derrickstolee@github.com>
## Commit message ##
- midx: reduce memory pressure while writing bitmaps
+ midx: extract bitmap write setup
- We noticed that some 'git multi-pack-index write --bitmap' processes
- were running with very high memory. It turns out that a lot of this
- memory is required to store a list of every object in the written
- multi-pack-index, with a second copy that has additional information
- used for the bitmap writing logic.
+ The write_midx_bitmap() method is a long method that does a lot of
+ steps. It requires the write_midx_context struct for use in
+ prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but
+ after that only needs the pack_order array.
- Using 'valgrind --tool=massif' before this change, the following chart
- shows how memory load increased and was maintained throughout the
- process:
-
- GB
- 4.102^ ::
- | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
- | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
- 0 +--------------------------------------------------------------->
-
- It turns out that the 'struct write_midx_context' data is persisting
- through the life of the process, including the 'entries' array. This
- array is used last inside find_commits_for_midx_bitmap() within
- write_midx_bitmap(). If we free (and nullify) the array at that point,
- we can free a decent chunk of memory before the bitmap logic adds more
- to the memory footprint.
-
- Here is the massif memory load chart after this change:
-
- GB
- 3.111^ #
- | # :::::::::::@::::::::::::::@
- | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
- | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
- 0 +--------------------------------------------------------------->
-
- It is unfortunate that the lifetime of the 'entries' array is less
- clear. To make this simpler, I added a few things to try and prevent an
- accidental reference:
-
- 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
- NULL pointer instead of a use-after-free.
-
- 2. 'entries_nr' is also set to zero to make any loop that would iterate
- over the entries be trivial.
-
- 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
- not get another reference later. This requires adding a local copy
- of 'pack_order' giving us a reference that we can use later in the
- method.
-
- 4. Add significant comments in write_midx_bitmap() and
- write_midx_internal() to add warnings for future authors who might
- accidentally add references to this cleared memory.
+ This is a messy, but completely non-functional refactoring. The code is
+ only being moved around to reduce visibility of the write_midx_context
+ during the longest part of computing reachability bitmaps.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
## midx.c ##
-@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
- struct commit **commits = NULL;
- uint32_t i, commits_nr;
- uint16_t options = 0;
-+ uint32_t *pack_order;
- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
- int ret;
+@@ midx.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
+ return cb.commits;
+ }
-@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+-static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+- struct write_midx_context *ctx,
++static int write_midx_bitmap(const char *midx_name,
++ const unsigned char *midx_hash,
++ struct packing_data *pdata,
++ struct commit **commits,
++ uint32_t commits_nr,
++ uint32_t *pack_order,
+ const char *refs_snapshot,
+ unsigned flags)
+ {
+- struct packing_data pdata;
+- struct pack_idx_entry **index;
+- struct commit **commits = NULL;
+- uint32_t i, commits_nr;
++ int ret, i;
+ uint16_t options = 0;
+- char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+- int ret;
+-
+- if (!ctx->entries_nr)
+- BUG("cannot write a bitmap without any objects");
++ struct pack_idx_entry **index;
++ char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
++ hash_to_hex(midx_hash));
- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+ if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+ options |= BITMAP_OPT_HASH_CACHE;
-+ /*
-+ * Remove the ctx.entries to reduce memory pressure.
-+ * Nullify 'ctx' to help avoid adding new references to ctx->entries.
-+ */
-+ FREE_AND_NULL(ctx->entries);
-+ ctx->entries_nr = 0;
-+ pack_order = ctx->pack_order;
-+ ctx = NULL;
-+
+- prepare_midx_packing_data(&pdata, ctx);
+-
+- commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+-
/*
* Build the MIDX-order index based on pdata.objects (which is already
* in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+ * this order).
+ */
+- ALLOC_ARRAY(index, pdata.nr_objects);
+- for (i = 0; i < pdata.nr_objects; i++)
+- index[i] = &pdata.objects[i].idx;
++ ALLOC_ARRAY(index, pdata->nr_objects);
++ for (i = 0; i < pdata->nr_objects; i++)
++ index[i] = &pdata->objects[i].idx;
+
+ bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+- bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
++ bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
+
+ /*
+ * bitmap_writer_finish expects objects in lex order, but pack_order
@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+ * happens between bitmap_writer_build_type_index() and
* bitmap_writer_finish().
*/
- for (i = 0; i < pdata.nr_objects; i++)
+- for (i = 0; i < pdata.nr_objects; i++)
- index[ctx->pack_order[i]] = &pdata.objects[i].idx;
-+ index[pack_order[i]] = &pdata.objects[i].idx;
++ for (i = 0; i < pdata->nr_objects; i++)
++ index[pack_order[i]] = &pdata->objects[i].idx;
bitmap_writer_select_commits(commits, commits_nr, -1);
- ret = bitmap_writer_build(&pdata);
+- ret = bitmap_writer_build(&pdata);
++ ret = bitmap_writer_build(pdata);
+ if (ret < 0)
+ goto cleanup;
+
+ bitmap_writer_set_checksum(midx_hash);
+- bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
++ bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
+
+ cleanup:
+ free(index);
@@ midx.c: static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
-+ /*
-+ * Writing the bitmap must be last, as it will free ctx.entries
-+ * to reduce memory pressure during the bitmap write.
-+ */
if (flags & MIDX_WRITE_BITMAP) {
- if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
+- if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
++ struct packing_data pdata;
++ struct commit **commits;
++ uint32_t commits_nr;
++
++ if (!ctx.entries_nr)
++ BUG("cannot write a bitmap without any objects");
++
++ prepare_midx_packing_data(&pdata, &ctx);
++
++ commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
++
++ if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
++ commits, commits_nr, ctx.pack_order,
refs_snapshot, flags) < 0) {
+ error(_("could not write multi-pack bitmap"));
+ result = 1;
-: ----------- > 3: 98e72f71b6b midx: reduce memory pressure while writing bitmaps
--
gitgitgadget
next prev parent reply other threads:[~2022-07-19 15:26 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-18 20:36 [PATCH] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-18 21:47 ` Ævar Arnfjörð Bjarmason
2022-07-19 13:50 ` Derrick Stolee
2022-07-19 15:26 ` Derrick Stolee via GitGitGadget [this message]
2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-19 15:59 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1292.v2.git.1658244366.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.