git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Lidong Yan via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Lidong Yan <502024330056@smail.nju.edu.cn>
Subject: Re: [PATCH v4] pack-bitmap: remove checks before bitmap_free
Date: Tue, 3 Jun 2025 18:09:26 -0400	[thread overview]
Message-ID: <aD9ylkFDWqapFjey@nand.local> (raw)
In-Reply-To: <pull.1977.v4.git.git.1748931650166.gitgitgadget@gmail.com>

On Tue, Jun 03, 2025 at 06:20:49AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>
> In pack-bitmap.c:find_boundary_objects(), the roots_bitmap is only freed
> if cascade_pseudo_merges_1() fails. Since cascade_pseudo_merges_1() only
> use roots_bitmap as a mutable reference but not takes roots_bitmap's
> ownership. Once cascade_pseudo_merges_1 succeed(), roots_bitmap leaks.
> And this leak currently lacks a dedicated test to detect it.
>
> To fix this leak, remove if cascade_pseudo_merges_1() succeed check and
> always calling bitmap_free(roots_bitmap);

This sentence might be more clear if it were written as:

    To fix this leak, unconditionally free the roots_bitmap regardless
    of whether or not cascade_pseudo_merges_1() succeeds.

> To trigger this leak, we need a pseudo-merge whose size is equal to
> or smaller than roots_bitmap (which corresponds to the set of "haves"
> commits in prepare_bitmap_walk()). To do this, we can create two
> commits: A and B. Add A to the pseudo-merge list and perform a traversal
> over the range A..B. In this scenario, the "haves" set will be {A},
> and cascade_pseudo_merges_1() will succeed, thereby exposing the leak
> due to the missing roots_bitmap cleanup.

I don't think this is quite right. Calling cascade_pseudo_merges_1()
succeeds (and returns a non-zero value) when one or more pseudo-merges
are satisfied. A pseudo-merge is satisfied here when its parents bitmap
is a *subset* of the roots_bitmap, not when it has a smaller size.

The precise definition of one bitmap being a subset of another can be
found in ewah/bitmap.c::ewah_bitamp_is_subset(). But in general one
bitmap is a subset of the other if the set of bit positions with value
"1" from one is a subset of the same set from the other bitmap.

I think that's what you meant by "smaller", but I think it's worth
clarifying here.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index ac6d62b980c..8727f316de9 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1363,8 +1363,8 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
>  			bitmap_set(roots_bitmap, pos);
>  		}
>
> -		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
> -			bitmap_free(roots_bitmap);
> +		cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap);
> +		bitmap_free(roots_bitmap);

Makes sense.

> diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
> index 56674db562f..e665001a410 100755
> --- a/t/t5333-pseudo-merge-bitmaps.sh
> +++ b/t/t5333-pseudo-merge-bitmaps.sh
> @@ -445,4 +445,24 @@ test_expect_success 'pseudo-merge closure' '
>  	)
>  '
>
> +test_expect_success 'use pseudo-merge in boundary traversal' '
> +	git init pseudo-merge-boundary-traversal &&
> +	(
> +		cd pseudo-merge-boundary-traversal &&
> +
> +		git config bitmapPseudoMerge.test.pattern refs/ &&
> +		git config bitmapPseudoMerge.test.threshold now &&

Setting the unstable threshold here should be unnecessary, since the
unstable portion of the group only includes matching commits beyond the
threshold that *don't* already have a bitmap. Since "A" is the only
commit at the time you write the bitmap below, it will always be
selected, and thus never appear in the unstable portion of a
pseudo-merge group.

> +		git config bitmapPseudoMerge.test.stableThreshold now &&

This one is technically unnecessary, but only because test_commit starts
at the $test_tick value, which is very far in the past (beyond the
default value of 1.month.ago).

> +		test_commit A &&
> +		git repack -adb &&
> +		test_commit B &&
> +
> +		echo '1' >expect &&

Please do not use single-quotes in a test script. It happens to work in
this instance, but it is easy to break.

> +		GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 \
> +			git rev-list --count --use-bitmap-index HEAD~1..HEAD >actual &&

This test needs to use the boundary-based bitmap traversal routines, but
I'm unclear on why you're using the GIT_TEST_-environment variable to
enable them.

Is there a reason that we can't rely on the usual repository
configuration here? I would have expected something like this (which
should apply cleanly on top of your patch):

--- 8< ---
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index e665001a41..491ef404ea 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -453,14 +453,14 @@ test_expect_success 'use pseudo-merge in boundary traversal' '
 		git config bitmapPseudoMerge.test.pattern refs/ &&
 		git config bitmapPseudoMerge.test.threshold now &&
 		git config bitmapPseudoMerge.test.stableThreshold now &&
+		git config pack.useBitmapBoundaryTraversal true &&

 		test_commit A &&
 		git repack -adb &&
 		test_commit B &&

-		echo '1' >expect &&
-		GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 \
-			git rev-list --count --use-bitmap-index HEAD~1..HEAD >actual &&
+		echo 1 >expect &&
+		git rev-list --count --use-bitmap-index HEAD~1..HEAD >actual &&
 		test_cmp expect actual
 	)
 '
--- >8 ---

> +		test_cmp expect actual

Hmm. I suppose, although it feels a little clunky to me to write
something like "echo 1 >expect". I would imagine that you'd do something
like:

    test 1 -eq $(git rev-list --count --use-bitmap-index HEAD~1..HEAD)

instead. Or if you wanted to split them off into separate lines, you
could do:

    nr=$(git rev-list --count --use-bitmap-index HEAD~1..HEAD) &&
    test 1 -eq "$nr"

Thanks,
Taylor

  reply	other threads:[~2025-06-03 22:09 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-25  5:09 [PATCH] pack-bitmap: remove checks before bitmap_free Lidong Yan via GitGitGadget
2025-05-26  6:49 ` Patrick Steinhardt
2025-05-26 16:05   ` lidongyan
2025-05-30 18:14 ` [PATCH v2 0/2] " Lidong Yan via GitGitGadget
2025-05-30 18:14   ` [PATCH v2 1/2] " Lidong Yan via GitGitGadget
2025-05-30 18:14   ` [PATCH v2 2/2] t5333: test memory leak when use pseudo-merge in boundary traversal Lidong Yan via GitGitGadget
2025-05-30 21:42     ` Junio C Hamano
2025-05-30 21:50       ` Eric Sunshine
2025-05-31  3:18         ` lidongyan
2025-05-30 21:06   ` [PATCH v2 0/2] pack-bitmap: remove checks before bitmap_free Junio C Hamano
2025-06-03  1:46   ` [PATCH v3] " Lidong Yan via GitGitGadget
2025-06-03  6:12     ` Junio C Hamano
2025-06-03  6:22       ` lidongyan
2025-06-03 15:14         ` Junio C Hamano
2025-06-03 15:32           ` lidongyan
2025-06-04 12:32             ` Junio C Hamano
2025-06-04 12:43               ` lidongyan
2025-06-04 14:49                 ` Junio C Hamano
2025-06-03  6:20     ` [PATCH v4] " Lidong Yan via GitGitGadget
2025-06-03 22:09       ` Taylor Blau [this message]
2025-06-04  2:50         ` lidongyan
2025-06-05  6:24       ` [PATCH v5] " Lidong Yan via GitGitGadget
2025-06-05 15:29         ` Junio C Hamano
2025-06-10  5:58           ` lidongyan
2025-06-05 15:53         ` [PATCH v6] " Lidong Yan via GitGitGadget
2025-06-06  1:28           ` Junio C Hamano
2025-06-06  5:49             ` lidongyan
2025-06-09  8:18           ` [PATCH v7] " Lidong Yan via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aD9ylkFDWqapFjey@nand.local \
    --to=me@ttaylorr.com \
    --cc=502024330056@smail.nju.edu.cn \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=ps@pks.im \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).