From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
chakrabortyabhradeep79@gmail.com,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH] midx: reduce memory pressure while writing bitmaps
Date: Mon, 18 Jul 2022 20:36:05 +0000 [thread overview]
Message-ID: <pull.1292.git.1658176565751.gitgitgadget@gmail.com> (raw)
From: Derrick Stolee <derrickstolee@github.com>
We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.
Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:
GB
4.102^ ::
| @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
| :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
| @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
0 +--------------------------------------------------------------->
It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.
Here is the massif memory load chart after this change:
GB
3.111^ #
| # :::::::::::@::::::::::::::@
| # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
| @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
| :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
0 +--------------------------------------------------------------->
It is unfortunate that the lifetime of the 'entries' array is less
clear. To make this simpler, I added a few things to try and prevent an
accidental reference:
1. Using FREE_AND_NULL() we will at least get a segfault from reading a
NULL pointer instead of a use-after-free.
2. 'entries_nr' is also set to zero to make any loop that would iterate
over the entries be trivial.
3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
not get another reference later. This requires adding a local copy
of 'pack_order' giving us a reference that we can use later in the
method.
4. Add significant comments in write_midx_bitmap() and
write_midx_internal() to add warnings for future authors who might
accidentally add references to this cleared memory.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
midx: reduce memory pressure while writing bitmaps
The thing I'm most worried about with this patch is whether there is
enough (or too much) defensive programming.
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1292
midx.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/midx.c b/midx.c
index 5f0dd386b02..cc31d803a5f 100644
--- a/midx.c
+++ b/midx.c
@@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
struct commit **commits = NULL;
uint32_t i, commits_nr;
uint16_t options = 0;
+ uint32_t *pack_order;
char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
int ret;
@@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
+ /*
+ * Remove the ctx.entries to reduce memory pressure.
+ * Nullify 'ctx' to help avoid adding new references to ctx->entries.
+ */
+ FREE_AND_NULL(ctx->entries);
+ ctx->entries_nr = 0;
+ pack_order = ctx->pack_order;
+ ctx = NULL;
+
/*
* Build the MIDX-order index based on pdata.objects (which is already
* in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
@@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
* bitmap_writer_finish().
*/
for (i = 0; i < pdata.nr_objects; i++)
- index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+ index[pack_order[i]] = &pdata.objects[i].idx;
bitmap_writer_select_commits(commits, commits_nr, -1);
ret = bitmap_writer_build(&pdata);
@@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir,
if (flags & MIDX_WRITE_REV_INDEX &&
git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+ /*
+ * Writing the bitmap must be last, as it will free ctx.entries
+ * to reduce memory pressure during the bitmap write.
+ */
if (flags & MIDX_WRITE_BITMAP) {
if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
refs_snapshot, flags) < 0) {
base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
--
gitgitgadget
next reply other threads:[~2022-07-18 20:36 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-18 20:36 Derrick Stolee via GitGitGadget [this message]
2022-07-18 21:47 ` [PATCH] midx: reduce memory pressure while writing bitmaps Ævar Arnfjörð Bjarmason
2022-07-19 13:50 ` Derrick Stolee
2022-07-19 15:26 ` [PATCH v2 0/3] " Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 1/3] pack-bitmap-write: use const for hashes Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 2/3] midx: extract bitmap write setup Derrick Stolee via GitGitGadget
2022-07-19 15:26 ` [PATCH v2 3/3] midx: reduce memory pressure while writing bitmaps Derrick Stolee via GitGitGadget
2022-07-19 15:59 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1292.git.1658176565751.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=chakrabortyabhradeep79@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).