git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
@ 2025-05-12 12:22 Lidong Yan via GitGitGadget
  2025-05-12 13:13 ` Jeff King
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  0 siblings, 2 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-12 12:22 UTC (permalink / raw)
  To: git; +Cc: Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
allocates a bitmap and reads index data into it. However, if any of
the validation checks following the allocation fail, the allocated bitmap
is not freed, resulting in a memory leak. To avoid this, the validation
checks should be performed before the bitmap is allocated.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
    pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed
    
    In pack-bitmap.c:load_bitmap_entries_v1, the function read_bitmap_1
    allocates a bitmap and reads index data into it. However, if any of the
    validation checks following the allocation fail, the allocated bitmap is
    not freed, resulting in a memory leak. To avoid this, the validation
    checks should be performed before the bitmap is allocated.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v1
Pull-Request: https://github.com/git/git/pull/1962

 pack-bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d866046..ac6d62b980c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -388,10 +388,6 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 			return error(_("corrupt ewah bitmap: commit index %u out of range"),
 				     (unsigned)commit_idx_pos);
 
-		bitmap = read_bitmap_1(index);
-		if (!bitmap)
-			return -1;
-
 		if (xor_offset > MAX_XOR_OFFSET || xor_offset > i)
 			return error(_("corrupted bitmap pack index"));
 
@@ -402,6 +398,10 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 				return error(_("invalid XOR offset in bitmap pack index"));
 		}
 
+		bitmap = read_bitmap_1(index);
+		if (!bitmap)
+			return -1;
+
 		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
 			index, bitmap, &oid, xor_bitmap, flags);
 	}

base-commit: 6f84262c44a89851c3ae5a6e4c1a9d06b2068d75
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-12 12:22 [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
@ 2025-05-12 13:13 ` Jeff King
  2025-05-13 17:47   ` Taylor Blau
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff King @ 2025-05-12 13:13 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Lidong Yan

On Mon, May 12, 2025 at 12:22:10PM +0000, Lidong Yan via GitGitGadget wrote:

> From: Lidong Yan <502024330056@smail.nju.edu.cn>
> 
> In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
> allocates a bitmap and reads index data into it. However, if any of
> the validation checks following the allocation fail, the allocated bitmap
> is not freed, resulting in a memory leak. To avoid this, the validation
> checks should be performed before the bitmap is allocated.

Thanks, this looks correct to me.

> @@ -388,10 +388,6 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  			return error(_("corrupt ewah bitmap: commit index %u out of range"),
>  				     (unsigned)commit_idx_pos);
>  
> -		bitmap = read_bitmap_1(index);
> -		if (!bitmap)
> -			return -1;
> -
>  		if (xor_offset > MAX_XOR_OFFSET || xor_offset > i)
>  			return error(_("corrupted bitmap pack index"));

I noticed that this code is also within a loop, so we could still return
early on the next loop iteration. But by that point we will have called
store_bitmap() on the result, so we only have to worry about leaking the
bitmap from the current loop iteration.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-12 13:13 ` Jeff King
@ 2025-05-13 17:47   ` Taylor Blau
  2025-05-14 13:18     ` Junio C Hamano
  2025-05-14 18:03     ` Jeff King
  0 siblings, 2 replies; 45+ messages in thread
From: Taylor Blau @ 2025-05-13 17:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Lidong Yan via GitGitGadget, git, Lidong Yan

On Mon, May 12, 2025 at 09:13:15AM -0400, Jeff King wrote:
> On Mon, May 12, 2025 at 12:22:10PM +0000, Lidong Yan via GitGitGadget wrote:
>
> > From: Lidong Yan <502024330056@smail.nju.edu.cn>
> >
> > In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
> > allocates a bitmap and reads index data into it. However, if any of
> > the validation checks following the allocation fail, the allocated bitmap
> > is not freed, resulting in a memory leak. To avoid this, the validation
> > checks should be performed before the bitmap is allocated.
>
> Thanks, this looks correct to me.

It looks correct to me as well, and is a strict improvement. But I think
there is a leak outside of this function as well that is not touched by
this patch.

> > @@ -388,10 +388,6 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
> >  			return error(_("corrupt ewah bitmap: commit index %u out of range"),
> >  				     (unsigned)commit_idx_pos);
> >
> > -		bitmap = read_bitmap_1(index);
> > -		if (!bitmap)
> > -			return -1;
> > -
> >  		if (xor_offset > MAX_XOR_OFFSET || xor_offset > i)
> >  			return error(_("corrupted bitmap pack index"));
>
> I noticed that this code is also within a loop, so we could still return
> early on the next loop iteration. But by that point we will have called
> store_bitmap() on the result, so we only have to worry about leaking the
> bitmap from the current loop iteration.

That's right, though I think there is still a leak here.

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label.

So I think if you got part of the way through loading bitmap entries and
then failed, you would leak all of the previous entries that you were
able to load successfully.

I suspect the fix looks something like:

--- 8< ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 5299f49d59..7f28532a69 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -631,41 +631,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();

 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;

 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;

 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;

 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}

 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);

 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }

 static int open_pack_bitmap(struct repository *r,
--- >8 ---

, since all callers of load_bitmap() will themselves call
free_bitmap_index(), so there is no need for us to open-code a portion
of that function's implementation ourselves.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-13 17:47   ` Taylor Blau
@ 2025-05-14 13:18     ` Junio C Hamano
  2025-05-14 18:03     ` Jeff King
  1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-05-14 13:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, Lidong Yan via GitGitGadget, git, Lidong Yan

Taylor Blau <me@ttaylorr.com> writes:

> After going through the "failed" label, load_bitmap() will return -1,
> and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
> will then call free_bitmap_index().
>
> That function would have done:
>
>     struct stored_bitmap *sb;
>     kh_foreach_value(b->bitmaps, sb {
>       ewah_pool_free(sb->root);
>       free(sb);
>     });
>
> , but won't since load_bitmap() already called kh_destroy_oid_map() and
> NULL'd the "bitmaps" pointer from within its "failed" label.

Yikes.

> So I think if you got part of the way through loading bitmap entries and
> then failed, you would leak all of the previous entries that you were
> able to load successfully.
>
> I suspect the fix looks something like:
> ...
> --- 8< ---
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 5299f49d59..7f28532a69 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -631,41 +631,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
>  	bitmap_git->ext_index.positions = kh_init_oid_pos();
>
>  	if (load_reverse_index(r, bitmap_git))
> -		goto failed;
> +		return -1;

(a lot of changes that simplifies the code snipped)

> -failed:
> -	munmap(bitmap_git->map, bitmap_git->map_size);
> -	bitmap_git->map = NULL;
> -	bitmap_git->map_size = 0;
> -
> -	kh_destroy_oid_map(bitmap_git->bitmaps);
> -	bitmap_git->bitmaps = NULL;
> -
> -	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
> -	bitmap_git->ext_index.positions = NULL;
> -
> -	return -1;
>  }
>
>  static int open_pack_bitmap(struct repository *r,
> --- >8 ---
>
> , since all callers of load_bitmap() will themselves call
> free_bitmap_index(), so there is no need for us to open-code a portion
> of that function's implementation ourselves.

It is rare for a fix to be removing and simplifying this much code
;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-13 17:47   ` Taylor Blau
  2025-05-14 13:18     ` Junio C Hamano
@ 2025-05-14 18:03     ` Jeff King
  2025-05-15  1:37       ` lidongyan
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff King @ 2025-05-14 18:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Lidong Yan via GitGitGadget, git, Lidong Yan

On Tue, May 13, 2025 at 01:47:21PM -0400, Taylor Blau wrote:

> > > In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
> > > allocates a bitmap and reads index data into it. However, if any of
> > > the validation checks following the allocation fail, the allocated bitmap
> > > is not freed, resulting in a memory leak. To avoid this, the validation
> > > checks should be performed before the bitmap is allocated.
> >
> > Thanks, this looks correct to me.
> 
> It looks correct to me as well, and is a strict improvement. But I think
> there is a leak outside of this function as well that is not touched by
> this patch.

Good catch, and your analysis looks correct to me. I don't think that
changes anything for this patch, which is fixing a more "inner" issue of
the allocated memory hitting store_bitmap() at all.

So I think this can graduate independently, and then you can prepare
your fix on top (but no rush).

It would be nice if we triggered these cases in the test suite so that
LSan could confirm that all leaks are covered. But I suspect it may not
be worth the effort to craft a bitmap file that is broken in such
particular ways.

> I suspect the fix looks something like:
> [...]
> , since all callers of load_bitmap() will themselves call
> free_bitmap_index(), so there is no need for us to open-code a portion
> of that function's implementation ourselves.

Deleting that extra code would be doubly satisfying.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-14 18:03     ` Jeff King
@ 2025-05-15  1:37       ` lidongyan
  0 siblings, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-15  1:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Lidong Yan via GitGitGadget, git

On 15/5/2025 at 02:03,Jeff King <peff@peff.net> 写道:

> It would be nice if we triggered these cases in the test suite so that
> LSan could confirm that all leaks are covered. But I suspect it may not
> be worth the effort to craft a bitmap file that is broken in such
> particular ways.

I'd like to try coming up with some test cases for this.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed
  2025-05-12 12:22 [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
  2025-05-12 13:13 ` Jeff King
@ 2025-05-20  9:23 ` Lidong Yan via GitGitGadget
  2025-05-20  9:23   ` [PATCH v2 1/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
                     ` (3 more replies)
  1 sibling, 4 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-20  9:23 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan

In pack-bitmap.c:load_bitmap_entries_v1, the function read_bitmap_1
allocates a bitmap and reads index data into it. However, if any of the
validation checks following the allocation fail, the allocated bitmap is not
freed, resulting in a memory leak. To avoid this, the validation checks
should be performed before the bitmap is allocated.

Lidong Yan (2):
  pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  pack-bitmap: add loading corrupt bitmap_index test

Taylor Blau (1):
  pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed

 pack-bitmap.c           | 29 +++++++-----------------
 t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 21 deletions(-)


base-commit: cb96e1697ad6e54d11fc920c95f82977f8e438f8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v2
Pull-Request: https://github.com/git/git/pull/1962

Range-diff vs v1:

 1:  00168766edf = 1:  130c3dc5dcd pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
 -:  ----------- > 2:  b515c278a8f pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
 -:  ----------- > 3:  5be22d563af pack-bitmap: add loading corrupt bitmap_index test

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 1/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
@ 2025-05-20  9:23   ` Lidong Yan via GitGitGadget
  2025-05-20  9:23   ` [PATCH v2 2/3] " Taylor Blau via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-20  9:23 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
allocates a bitmap and reads index data into it. However, if any of
the validation checks following the allocation fail, the allocated bitmap
is not freed, resulting in a memory leak. To avoid this, the validation
checks should be performed before the bitmap is allocated.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d866046b..ac6d62b980c5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -388,10 +388,6 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 			return error(_("corrupt ewah bitmap: commit index %u out of range"),
 				     (unsigned)commit_idx_pos);
 
-		bitmap = read_bitmap_1(index);
-		if (!bitmap)
-			return -1;
-
 		if (xor_offset > MAX_XOR_OFFSET || xor_offset > i)
 			return error(_("corrupted bitmap pack index"));
 
@@ -402,6 +398,10 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 				return error(_("invalid XOR offset in bitmap pack index"));
 		}
 
+		bitmap = read_bitmap_1(index);
+		if (!bitmap)
+			return -1;
+
 		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
 			index, bitmap, &oid, xor_bitmap, flags);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 2/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  2025-05-20  9:23   ` [PATCH v2 1/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
@ 2025-05-20  9:23   ` Taylor Blau via GitGitGadget
  2025-05-21 23:54     ` Taylor Blau
  2025-05-20  9:23   ` [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test Lidong Yan via GitGitGadget
  2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  3 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau via GitGitGadget @ 2025-05-20  9:23 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label.

So I think if you got part of the way through loading bitmap entries and
then failed, you would leak all of the previous entries that you were
able to load successfully.

The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ac6d62b980c5..fd19c2255163 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -630,41 +630,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
 
 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;
 
 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;
 
 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}
 
 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);
 
 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }
 
 static int open_pack_bitmap(struct repository *r,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  2025-05-20  9:23   ` [PATCH v2 1/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
  2025-05-20  9:23   ` [PATCH v2 2/3] " Taylor Blau via GitGitGadget
@ 2025-05-20  9:23   ` Lidong Yan via GitGitGadget
  2025-05-22  0:08     ` Taylor Blau
  2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  3 siblings, 1 reply; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-20  9:23 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh.

This test case intentionally corrupt the "xor_offset" field of the first
entry. To find position of first entry in *.bitmap, we need to skip 4
ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()`
to do this.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a62b463eaf09..537a507957bb 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -26,6 +26,18 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
+skip_ewah_bitmap() {
+	local bitmap="$1" &&
+	local offset="$2" &&
+	local size= &&
+
+	offset=$(($offset + 4)) &&
+	size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') &&
+	size=$(($size * 8)) &&
+	offset=$(($offset + 4 + $size + 4)) &&
+	echo $offset
+}
+
 # Since name-hash values are stored in the .bitmap files, add a test
 # that checks that the name-hash calculations are stable across versions.
 # Not exhaustive, but these hashing algorithms would be hard to change
@@ -486,6 +498,44 @@ test_bitmap_cases () {
 			grep "ignoring extra bitmap" trace2.txt
 		)
 	'
+
+	# A `.bitmap` file has the following structure:
+	# | Header | Commits | Trees | Blobs | Tags | Entries... |
+	#
+	# - The header is 32 bytes long when using SHA-1.
+	# - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps.
+	#
+	# This test intentionally corrupts the `xor_offset` field of the first entry
+	# to verify robustness against malformed bitmap data.
+	test_expect_success 'load corrupt bitmap' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			chmod +w "$bitmap" &&
+
+			hdr_sz=$((12 + $(test_oid rawsz))) &&
+			offset=$(skip_ewah_bitmap $bitmap $hdr_sz) &&
+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
+			offset=$((offset + 4)) &&
+
+			printf '\161' |
+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset &&
+
+			git rev-list --count HEAD > expect &&
+			git rev-list --use-bitmap-index --count HEAD > actual &&
+			test_cmp expect actual
+		)
+	'
 }
 
 test_bitmap_cases
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-20  9:23   ` [PATCH v2 2/3] " Taylor Blau via GitGitGadget
@ 2025-05-21 23:54     ` Taylor Blau
  2025-05-22 15:15       ` lidongyan
  2025-05-22 21:22       ` Junio C Hamano
  0 siblings, 2 replies; 45+ messages in thread
From: Taylor Blau @ 2025-05-21 23:54 UTC (permalink / raw)
  To: Taylor Blau via GitGitGadget; +Cc: git, Jeff King, Lidong Yan

On Tue, May 20, 2025 at 09:23:09AM +0000, Taylor Blau via GitGitGadget wrote:
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

This commit forges my Signed-off-by, but I am happy with the result
here.

I do think the series is structured a little awkwardly as a result of
adding this patch to it. That this and the previous patch have the
subject "pack-bitmap: fix memory leak if `load_bitmap_entries_v1`
failed" make the series not quite as clear as it could be.

I think there are a couple of things going on:

  - This patch is a bug fix that could be applied independently of the
    first one. The rationale there would be that we shouldn't be leaking
    the EWAH bitmaps in 'b->bitmaps', but we are as a result of NULL'ing
    the pointer in the "failed" label. That patch can stand alone.

  - The first patch (yours) is no longer fixing a leak, at least after
    this patch. But it does delay reading the bitmap until we have
    validated its XOR offset for sanity, which is a good thing mostly
    from a performance perspective.

I would probably swap the two patches around so that yours applies on
top of mine, and then rewords the patch message in yours to reflect that
it is no longer fixing a leak.

That all said, if you feel strongly that the structure is fine/better
as-is, I'd be more than happy to discuss it further.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test
  2025-05-20  9:23   ` [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test Lidong Yan via GitGitGadget
@ 2025-05-22  0:08     ` Taylor Blau
  2025-05-22 15:05       ` lidongyan
  0 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-05-22  0:08 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Jeff King, Lidong Yan

On Tue, May 20, 2025 at 09:23:10AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>
> This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh.
>
> This test case intentionally corrupt the "xor_offset" field of the first
> entry. To find position of first entry in *.bitmap, we need to skip 4
> ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()`
> to do this.

I'm going to avoid commenting on the message itself, since I think we
may be able to drop this patch entirely, see below.

> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
> ---
>  t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index a62b463eaf09..537a507957bb 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -26,6 +26,18 @@ has_any () {
>  	grep -Ff "$1" "$2"
>  }
>
> +skip_ewah_bitmap() {
> +	local bitmap="$1" &&
> +	local offset="$2" &&
> +	local size= &&
> +
> +	offset=$(($offset + 4)) &&
> +	size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') &&
> +	size=$(($size * 8)) &&
> +	offset=$(($offset + 4 + $size + 4)) &&
> +	echo $offset
> +}
> +
>  # Since name-hash values are stored in the .bitmap files, add a test
>  # that checks that the name-hash calculations are stable across versions.
>  # Not exhaustive, but these hashing algorithms would be hard to change
> @@ -486,6 +498,44 @@ test_bitmap_cases () {
>  			grep "ignoring extra bitmap" trace2.txt
>  		)
>  	'
> +
> +	# A `.bitmap` file has the following structure:
> +	# | Header | Commits | Trees | Blobs | Tags | Entries... |
> +	#
> +	# - The header is 32 bytes long when using SHA-1.
> +	# - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps.
> +	#
> +	# This test intentionally corrupts the `xor_offset` field of the first entry
> +	# to verify robustness against malformed bitmap data.
> +	test_expect_success 'load corrupt bitmap' '

I am not totally following what this case is supposed to be testing.
Let me think aloud for a moment...

> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&

First we set up a temporary repository, change into it, and enable
bitmap lookup tables. Makes sense.

> +			test_commit base &&
> +
> +			git repack -adb &&
> +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
> +			chmod +w "$bitmap" &&

Then we make a commit, and write a bitmap containing the objects from
the commit we just made. Good.

> +			hdr_sz=$((12 + $(test_oid rawsz))) &&
> +			offset=$(skip_ewah_bitmap $bitmap $hdr_sz) &&
> +			offset=$(skip_ewah_bitmap $bitmap $offset) &&
> +			offset=$(skip_ewah_bitmap $bitmap $offset) &&
> +			offset=$(skip_ewah_bitmap $bitmap $offset) &&

Then we read past the header and four type bitmaps. Makes sense.

> +			offset=$((offset + 4)) &&

Now we land at the bitmap for the commit we just wrote.

(As an aside unrelated to this part of the test, this skip_ewah_bitmap()
function seems awfully fragile. I wonder if it would make more sense to
implement this as a test helper that can dump the offsets of EWAH
bitmaps in a *.bitmap file by object ID rather than trying to parse the
file ourselves?

We don't currently store an offset for each stored_bitmap that we
maintain, but doing so would be pretty straightforward (add it as a
field to the structure, and store the value of bitmap_git->map_pos from
immediately before reading the actual bitmap).)

> +			printf '\161' |
> +				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset &&

OK. Now we break the XOR offset field of this bitmap by writing garbage
into it.

> +			git rev-list --count HEAD > expect &&
> +			git rev-list --use-bitmap-index --count HEAD > actual &&
> +			test_cmp expect actual

...and then we make sure that we still get the correct result.

Hmmph. I don't think this is quite testing what we want, since this test
passes with or without your first patch. And that makes sense, we have
tests elsewhere in this script that verify we can still fall back to
classic traversal when the bitmap index can't be read. (For some
examples, see: "truncated bitmap fails gracefully (ewah)" and "truncated
bitmap fails gracefully (cache)".)

I think what we're really testing here is the absence of a memory leak,
which we are as of 1fc7ddf35b (test-lib: unconditionally enable leak
checking, 2024-11-20). I wonder whether or not we need this test at all?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test
  2025-05-22  0:08     ` Taylor Blau
@ 2025-05-22 15:05       ` lidongyan
  2025-05-23  0:31         ` Taylor Blau
  0 siblings, 1 reply; 45+ messages in thread
From: lidongyan @ 2025-05-22 15:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Lidong Yan via GitGitGadget, git, Jeff King

2025年5月22日 08:08,Taylor Blau <me@ttaylorr.com> 写道:
> 
> On Tue, May 20, 2025 at 09:23:10AM +0000, Lidong Yan via GitGitGadget wrote:
>> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>> 
>> This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh.
>> 
>> This test case intentionally corrupt the "xor_offset" field of the first
>> entry. To find position of first entry in *.bitmap, we need to skip 4
>> ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()`
>> to do this.
> 
> I'm going to avoid commenting on the message itself, since I think we
> may be able to drop this patch entirely, see below.
> 
>> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
>> ---
>> t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 50 insertions(+)
>> 
>> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
>> index a62b463eaf09..537a507957bb 100755
>> --- a/t/t5310-pack-bitmaps.sh
>> +++ b/t/t5310-pack-bitmaps.sh
>> @@ -26,6 +26,18 @@ has_any () {
>> grep -Ff "$1" "$2"
>> }
>> 
>> +skip_ewah_bitmap() {
>> + local bitmap="$1" &&
>> + local offset="$2" &&
>> + local size= &&
>> +
>> + offset=$(($offset + 4)) &&
>> + size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') &&
>> + size=$(($size * 8)) &&
>> + offset=$(($offset + 4 + $size + 4)) &&
>> + echo $offset
>> +}
>> +
>> # Since name-hash values are stored in the .bitmap files, add a test
>> # that checks that the name-hash calculations are stable across versions.
>> # Not exhaustive, but these hashing algorithms would be hard to change
>> @@ -486,6 +498,44 @@ test_bitmap_cases () {
>> grep "ignoring extra bitmap" trace2.txt
>> )
>> '
>> +
>> + # A `.bitmap` file has the following structure:
>> + # | Header | Commits | Trees | Blobs | Tags | Entries... |
>> + #
>> + # - The header is 32 bytes long when using SHA-1.
>> + # - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps.
>> + #
>> + # This test intentionally corrupts the `xor_offset` field of the first entry
>> + # to verify robustness against malformed bitmap data.
>> + test_expect_success 'load corrupt bitmap' '
> 
> I am not totally following what this case is supposed to be testing.
> Let me think aloud for a moment...
> 
>> + rm -fr repo &&
>> + git init repo &&
>> + test_when_finished "rm -fr repo" &&
>> + (
>> + cd repo &&
>> + git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> 
> First we set up a temporary repository, change into it, and enable
> bitmap lookup tables. Makes sense.
> 
>> + test_commit base &&
>> +
>> + git repack -adb &&
>> + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
>> + chmod +w "$bitmap" &&
> 
> Then we make a commit, and write a bitmap containing the objects from
> the commit we just made. Good.
> 
>> + hdr_sz=$((12 + $(test_oid rawsz))) &&
>> + offset=$(skip_ewah_bitmap $bitmap $hdr_sz) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
> 
> Then we read past the header and four type bitmaps. Makes sense.
> 
>> + offset=$((offset + 4)) &&
> 
> Now we land at the bitmap for the commit we just wrote.
> 
> (As an aside unrelated to this part of the test, this skip_ewah_bitmap()
> function seems awfully fragile. I wonder if it would make more sense to
> implement this as a test helper that can dump the offsets of EWAH
> bitmaps in a *.bitmap file by object ID rather than trying to parse the
> file ourselves?
> 

I am actually replaying the pack-bitmap.c:prepare_bitmap() here. Also I have had
write a test helper version once. And since I want to use prepare_bitmap()
I have to put the code in pack-bitmap.c. It looks like this

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d866046..9642a06b3fe 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -3022,6 +3022,71 @@ cleanup:
return ret;
}

+typedef void(corrupt_fn)(struct bitmap_index *);
+
+static int bitmap_corrupt_then_load(struct repository *r, corrupt_fn *do_corrupt)
+{
+ struct bitmap_index *bitmap_git;
+ unsigned char *map;
+
+ if (!(bitmap_git = prepare_bitmap_git(r)))
+     die(_("failed to prepare bitmap indexes"));
+ /*
+  * If the table lookup extension is not used,
+  * prepare_bitmap_git has already called load_bitmap_entries_v1(),
+  * making it impossible to corrupt the bitmap.
+  */
+ if (!bitmap_git->table_lookup)
+     return 0;
+
+ /*
+  * bitmap_git->map is read-only;
+  * to corrupt it, we need a writable memory block.
+  */
+ map = bitmap_git->map;
+ bitmap_git->map = xmalloc(bitmap_git->map_size);
+ if (!bitmap_git->map)
+     return 0;
+ memcpy(bitmap_git->map, map, bitmap_git->map_size);
+
+ do_corrupt(bitmap_git);
+ if (!load_bitmap_entries_v1(bitmap_git))
+     die(_("load corrupt bitmap successfully"));
+
+ free(bitmap_git->map);
+ bitmap_git->map = map;
+ free_bitmap_index(bitmap_git);
+
+ return 0;
+}
+
+static void do_corrupt_commit_pos(struct bitmap_index *bitmap_git)
+{
+ uint32_t *commit_pos_ptr;
+
+ commit_pos_ptr = (uint32_t *)(bitmap_git->map + bitmap_git->map_pos);
+ *commit_pos_ptr = (uint32_t)-1;
+}
+
+static void do_corrupt_xor_offset(struct bitmap_index *bitmap_git)
+{
+ uint8_t *xor_offset_ptr;
+
+ xor_offset_ptr = (uint8_t *)(bitmap_git->map + bitmap_git->map_pos +
+      sizeof(uint32_t));
+ *xor_offset_ptr = MAX_XOR_OFFSET + 1;
+}
+
+int test_bitmap_load_corrupt(struct repository *r)
+{
+ int res = 0;
+ if ((res = bitmap_corrupt_then_load(r, do_corrupt_commit_pos)))
+     return res;
+ if ((res = bitmap_corrupt_then_load(r, do_corrupt_xor_offset)))
+     return res;
+ return res;
+}
+
int rebuild_bitmap(const uint32_t *reposition,
   struct ewah_bitmap *source,
   struct bitmap *dest)

> We don't currently store an offset for each stored_bitmap that we
> maintain, but doing so would be pretty straightforward (add it as a
> field to the structure, and store the value of bitmap_git->map_pos from
> immediately before reading the actual bitmap).)
> 
>> + printf '\161' |
>> + dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset &&
> 
> OK. Now we break the XOR offset field of this bitmap by writing garbage
> into it.
> 
>> + git rev-list --count HEAD > expect &&
>> + git rev-list --use-bitmap-index --count HEAD > actual &&
>> + test_cmp expect actual
> 
> ...and then we make sure that we still get the correct result.
> 
> Hmmph. I don't think this is quite testing what we want, since this test
> passes with or without your first patch. And that makes sense, we have
> tests elsewhere in this script that verify we can still fall back to
> classic traversal when the bitmap index can't be read. (For some
> examples, see: "truncated bitmap fails gracefully (ewah)" and "truncated
> bitmap fails gracefully (cache)".)

I want to *test* for a memory leak here, not whether git can load a corrupt bitmap.
Since git ci linux-leak test runs each test script with ASAN_OPTIONS=detect_leaks=1, I’m 
including this test case specifically to check whether it triggers a crash when 
`SANITIZE_LEAK` is enabled. And I do find if without the first patch, leak sanitizer
running this test script would output error message.

> I think what we're really testing here is the absence of a memory leak,
> which we are as of 1fc7ddf35b (test-lib: unconditionally enable leak
> checking, 2024-11-20). I wonder whether or not we need this test at all?
> 
> Thanks,
> Taylor

I am not truly following what are you talking here. But If you think it’s unnecessary to
check for potential leaks in load_bitmap() or load_bitmap_entries_v1(). Or this test
script shouldn’t be put in this way. I’m happy to drop the final patch.

Thanks
Lidong Yan

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-21 23:54     ` Taylor Blau
@ 2025-05-22 15:15       ` lidongyan
  2025-05-22 21:22       ` Junio C Hamano
  1 sibling, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-22 15:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Taylor Blau via GitGitGadget, git, Jeff King

2025年5月22日 07:54,Taylor Blau <me@ttaylorr.com> 写道:
> 
> On Tue, May 20, 2025 at 09:23:09AM +0000, Taylor Blau via GitGitGadget wrote:
>> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> 
> This commit forges my Signed-off-by, but I am happy with the result
> here.
> 
> I do think the series is structured a little awkwardly as a result of
> adding this patch to it. That this and the previous patch have the
> subject "pack-bitmap: fix memory leak if `load_bitmap_entries_v1`
> failed" make the series not quite as clear as it could be.
> 

Agreed. I’ve definitely learned a lot about how to write commit messages
 and cover letters through this process

> I think there are a couple of things going on:
> 
>  - This patch is a bug fix that could be applied independently of the
>    first one. The rationale there would be that we shouldn't be leaking
>    the EWAH bitmaps in 'b->bitmaps', but we are as a result of NULL'ing
>    the pointer in the "failed" label. That patch can stand alone.
> 
>  - The first patch (yours) is no longer fixing a leak, at least after
>    this patch. But it does delay reading the bitmap until we have
>    validated its XOR offset for sanity, which is a good thing mostly
>    from a performance perspective.
> 
> I would probably swap the two patches around so that yours applies on
> top of mine, and then rewords the patch message in yours to reflect that
> it is no longer fixing a leak.
> 
> That all said, if you feel strongly that the structure is fine/better
> as-is, I'd be more than happy to discuss it further.
> 
> Thanks,
> Taylor
> 

I think I can do this in third version, and I have to submit patch v3 after
we decide if patch v2 3/3 in this series should live or not. 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-21 23:54     ` Taylor Blau
  2025-05-22 15:15       ` lidongyan
@ 2025-05-22 21:22       ` Junio C Hamano
  1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-05-22 21:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Taylor Blau via GitGitGadget, git, Jeff King, Lidong Yan

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, May 20, 2025 at 09:23:09AM +0000, Taylor Blau via GitGitGadget wrote:
>> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> This commit forges my Signed-off-by, but I am happy with the result
> here.
>
> I do think the series is structured a little awkwardly as a result of
> adding this patch to it. That this and the previous patch have the
> subject "pack-bitmap: fix memory leak if `load_bitmap_entries_v1`
> failed" make the series not quite as clear as it could be.
>
> I think there are a couple of things going on:
>
>   - This patch is a bug fix that could be applied independently of the
>     first one. The rationale there would be that we shouldn't be leaking
>     the EWAH bitmaps in 'b->bitmaps', but we are as a result of NULL'ing
>     the pointer in the "failed" label. That patch can stand alone.
>
>   - The first patch (yours) is no longer fixing a leak, at least after
>     this patch. But it does delay reading the bitmap until we have
>     validated its XOR offset for sanity, which is a good thing mostly
>     from a performance perspective.
>
> I would probably swap the two patches around so that yours applies on
> top of mine, and then rewords the patch message in yours to reflect that
> it is no longer fixing a leak.

Sounds like a plausible structure.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test
  2025-05-22 15:05       ` lidongyan
@ 2025-05-23  0:31         ` Taylor Blau
  2025-05-23  7:17           ` lidongyan
  0 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-05-23  0:31 UTC (permalink / raw)
  To: lidongyan; +Cc: Lidong Yan via GitGitGadget, git, Jeff King

On Thu, May 22, 2025 at 11:05:56PM +0800, lidongyan wrote:
> > (As an aside unrelated to this part of the test, this skip_ewah_bitmap()
> > function seems awfully fragile. I wonder if it would make more sense to
> > implement this as a test helper that can dump the offsets of EWAH
> > bitmaps in a *.bitmap file by object ID rather than trying to parse the
> > file ourselves?
> >
>
> I am actually replaying the pack-bitmap.c:prepare_bitmap() here. Also I have had
> write a test helper version once. And since I want to use prepare_bitmap()
> I have to put the code in pack-bitmap.c. It looks like this
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index b9f1d866046..9642a06b3fe 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> [...]

Yeah, since the pack_bitmap struct is defined locally within the
pack-bitmap.c compilation unit, any test helper that performs any
non-trivial operation would likely need to be defined in that file.

The "test helper" code would be a little shim into the real
functionality within pack-bitmap.c. See the following for an example:

    - t/helper/test-bitmap.c::bitmap_list_commits()
    - pack-bitmap.c::test_bitmap_commits()

Here the former dispatches a single call to the latter, where all of the
real functionality is.

But the (elided) code below isn't quite what I was thinking. I think the
"write garbage data" part is fine as-is and can continue to be written
in shell. We have lots of examples of using dd to write garbage data
into files (see for e.g., the "corrupt_data()" function in t5319).

What I was thinking is the test helper would print (via some new mode,
or bolted onto "list-commits") line-delimited output like the following:

    $COMMIT_OID $BITMAP_OFFSET $FLAGS $XOR_OFFSET

or similar. Then you could use the output of that to determine the
location (replacing everything up to the actual "printf | dd
of=$bitmap ...", which is the most fragile in my opinion).

> > Hmmph. I don't think this is quite testing what we want, since this test
> > passes with or without your first patch. And that makes sense, we have
> > tests elsewhere in this script that verify we can still fall back to
> > classic traversal when the bitmap index can't be read. (For some
> > examples, see: "truncated bitmap fails gracefully (ewah)" and "truncated
> > bitmap fails gracefully (cache)".)
>
> I want to *test* for a memory leak here, not whether git can load a corrupt bitmap.
> Since git ci linux-leak test runs each test script with ASAN_OPTIONS=detect_leaks=1, I’m
> including this test case specifically to check whether it triggers a crash when
> `SANITIZE_LEAK` is enabled. And I do find if without the first patch, leak sanitizer
> running this test script would output error message.

Makes sense.

> > I think what we're really testing here is the absence of a memory leak,
> > which we are as of 1fc7ddf35b (test-lib: unconditionally enable leak
> > checking, 2024-11-20). I wonder whether or not we need this test at all?
> >
> > Thanks,
> > Taylor
>
> I am not truly following what are you talking here. But If you think it’s unnecessary to
> check for potential leaks in load_bitmap() or load_bitmap_entries_v1(). Or this test
> script shouldn’t be put in this way. I’m happy to drop the final patch.

I think the above scenario (writing a test that would have leaked memory
otherwise behind a SANITIZE_LEAK prerequisite) is reasonable.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test
  2025-05-23  0:31         ` Taylor Blau
@ 2025-05-23  7:17           ` lidongyan
  0 siblings, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-23  7:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Lidong Yan via GitGitGadget, git, Jeff King

2025年5月23日 08:31,Taylor Blau <me@ttaylorr.com> 写道:
> But the (elided) code below isn't quite what I was thinking. I think the
> "write garbage data" part is fine as-is and can continue to be written
> in shell. We have lots of examples of using dd to write garbage data
> into files (see for e.g., the "corrupt_data()" function in t5319).
> 
> What I was thinking is the test helper would print (via some new mode,
> or bolted onto "list-commits") line-delimited output like the following:
> 
>    $COMMIT_OID $BITMAP_OFFSET $FLAGS $XOR_OFFSET
> 
> or similar. Then you could use the output of that to determine the
> location (replacing everything up to the actual "printf | dd
> of=$bitmap ...", which is the most fragile in my opinion).

Agreed, I would add a `test-tool bitmap dump-entries` helper which dumps
the output you suggest.

> I think the above scenario (writing a test that would have leaked memory
> otherwise behind a SANITIZE_LEAK prerequisite) is reasonable.

I will submit patch v3 with better structure and cover letter soon.

Thanks,
Lidong


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed
  2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
                     ` (2 preceding siblings ...)
  2025-05-20  9:23   ` [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test Lidong Yan via GitGitGadget
@ 2025-05-25  2:06   ` Lidong Yan via GitGitGadget
  2025-05-25  2:06     ` [PATCH v3 1/2] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Taylor Blau via GitGitGadget
                       ` (2 more replies)
  3 siblings, 3 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-25  2:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan

In pack-bitmap.c:load_bitmap_entries_v1, the function read_bitmap_1
allocates a bitmap and reads index data into it. However, if any of the
validation checks following the allocation fail, the allocated bitmap is not
freed, resulting in a memory leak. To avoid this, the validation checks
should be performed before the bitmap is allocated.

Lidong Yan (1):
  pack-bitmap: add load corrupt bitmap test

Taylor Blau (1):
  pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed

 pack-bitmap.c           | 94 +++++++++++++++++++++++++++++++----------
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++
 t/t5310-pack-bitmaps.sh | 27 ++++++++++++
 4 files changed, 107 insertions(+), 23 deletions(-)


base-commit: 845c48a16a7f7b2c44d8cb137b16a4a1f0140229
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v3
Pull-Request: https://github.com/git/git/pull/1962

Range-diff vs v2:

 1:  130c3dc5dcd < -:  ----------- pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
 2:  b515c278a8f = 1:  cf87aad7c99 pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
 3:  5be22d563af ! 2:  f5371d7daa9 pack-bitmap: add loading corrupt bitmap_index test
     @@ Metadata
      Author: Lidong Yan <502024330056@smail.nju.edu.cn>
      
       ## Commit message ##
     -    pack-bitmap: add loading corrupt bitmap_index test
     +    pack-bitmap: add load corrupt bitmap test
      
     -    This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh.
     +    This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
     +    a new test helper command `test-tool bitmap list-commits-offset`,
     +    and a `load corrupt bitmap` test case in t5310.
      
     -    This test case intentionally corrupt the "xor_offset" field of the first
     -    entry. To find position of first entry in *.bitmap, we need to skip 4
     -    ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()`
     -    to do this.
     +    The `load corrupt bitmap` test case intentionally corrupt the
     +    "xor_offset" field of the first entry. And the newly added helper
     +    can help to find position of "xor_offset" in bitmap file.
      
          Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
      
     - ## t/t5310-pack-bitmaps.sh ##
     -@@ t/t5310-pack-bitmaps.sh: has_any () {
     - 	grep -Ff "$1" "$2"
     + ## pack-bitmap.c ##
     +@@ pack-bitmap.c: struct stored_bitmap {
     + 	int flags;
     + };
     + 
     ++struct stored_bitmap_tag_pos {
     ++	struct stored_bitmap stored;
     ++	size_t map_pos;
     ++};
     ++
     + /*
     +  * The active bitmap index for a repository. By design, repositories only have
     +  * a single bitmap index available (the index for the biggest packfile in
     +@@ pack-bitmap.c: static int existing_bitmaps_hits_nr;
     + static int existing_bitmaps_misses_nr;
     + static int roots_with_bitmaps_nr;
     + static int roots_without_bitmaps_nr;
     ++static int tag_pos_on_bitmap;
     + 
     + static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
     + {
     +@@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
     + 					  struct ewah_bitmap *root,
     + 					  const struct object_id *oid,
     + 					  struct stored_bitmap *xor_with,
     +-					  int flags)
     ++					  int flags, size_t map_pos)
     + {
     + 	struct stored_bitmap *stored;
     ++	struct stored_bitmap_tag_pos *tagged;
     + 	khiter_t hash_pos;
     + 	int ret;
     + 
     +-	stored = xmalloc(sizeof(struct stored_bitmap));
     ++	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
     ++					     sizeof(struct stored_bitmap));
     ++	stored = &tagged->stored;
     ++	if (tag_pos_on_bitmap)
     ++		tagged->map_pos = map_pos;
     + 	stored->root = root;
     + 	stored->xor = xor_with;
     + 	stored->flags = flags;
     +@@ pack-bitmap.c: static int load_bitmap_entries_v1(struct bitmap_index *index)
     + 		struct stored_bitmap *xor_bitmap = NULL;
     + 		uint32_t commit_idx_pos;
     + 		struct object_id oid;
     ++		size_t entry_map_pos;
     + 
     + 		if (index->map_size - index->map_pos < 6)
     + 			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
     + 
     ++		entry_map_pos = index->map_pos;
     + 		commit_idx_pos = read_be32(index->map, &index->map_pos);
     + 		xor_offset = read_u8(index->map, &index->map_pos);
     + 		flags = read_u8(index->map, &index->map_pos);
     +@@ pack-bitmap.c: static int load_bitmap_entries_v1(struct bitmap_index *index)
     + 		if (!bitmap)
     + 			return -1;
     + 
     +-		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
     +-			index, bitmap, &oid, xor_bitmap, flags);
     ++		recent_bitmaps[i % MAX_XOR_OFFSET] =
     ++			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
     ++				     entry_map_pos);
     + 	}
     + 
     + 	return 0;
     +@@ pack-bitmap.c: static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
     + 	int xor_flags;
     + 	khiter_t hash_pos;
     + 	struct bitmap_lookup_table_xor_item *xor_item;
     ++	size_t entry_map_pos;
     + 
     + 	if (is_corrupt)
     + 		return NULL;
     +@@ pack-bitmap.c: static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
     + 			goto corrupt;
     + 		}
     + 
     ++		entry_map_pos = bitmap_git->map_pos;
     + 		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
     + 		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
     + 		bitmap = read_bitmap_1(bitmap_git);
     +@@ pack-bitmap.c: static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
     + 		if (!bitmap)
     + 			goto corrupt;
     + 
     +-		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
     ++		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
     ++					  xor_bitmap, xor_flags, entry_map_pos);
     + 		xor_items_nr--;
     + 	}
     + 
     +@@ pack-bitmap.c: static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
     + 	 * Instead, we can skip ahead and immediately read the flags and
     + 	 * ewah bitmap.
     + 	 */
     ++	entry_map_pos = bitmap_git->map_pos;
     + 	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
     + 	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
     + 	bitmap = read_bitmap_1(bitmap_git);
     +@@ pack-bitmap.c: static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
     + 	if (!bitmap)
     + 		goto corrupt;
     + 
     +-	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
     ++	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
     ++			    entry_map_pos);
     + 
     + corrupt:
     + 	free(xor_items);
     +@@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
     + 	return 0;
       }
       
     -+skip_ewah_bitmap() {
     -+	local bitmap="$1" &&
     -+	local offset="$2" &&
     -+	local size= &&
     ++int test_bitmap_commits_offset(struct repository *r)
     ++{
     ++	struct object_id oid;
     ++	struct stored_bitmap_tag_pos *tagged;
     ++	struct bitmap_index *bitmap_git;
     ++	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
     ++		ewah_bitmap_map_pos;
     ++
     ++	tag_pos_on_bitmap = 1;
     ++	bitmap_git = prepare_bitmap_git(r);
     ++	if (!bitmap_git)
     ++		die(_("failed to load bitmap indexes"));
      +
     -+	offset=$(($offset + 4)) &&
     -+	size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') &&
     -+	size=$(($size * 8)) &&
     -+	offset=$(($offset + 4 + $size + 4)) &&
     -+	echo $offset
     ++	/*
     ++	 * As this function is only used to print bitmap selected
     ++	 * commits, we don't have to read the commit table.
     ++	 */
     ++	if (bitmap_git->table_lookup) {
     ++		if (load_bitmap_entries_v1(bitmap_git) < 0)
     ++			die(_("failed to load bitmap indexes"));
     ++	}
     ++
     ++	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
     ++		commit_idx_pos_map_pos = tagged->map_pos;
     ++		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
     ++		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
     ++		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
     ++
     ++		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
     ++			  oid_to_hex(&oid),
     ++			  (uintmax_t)commit_idx_pos_map_pos,
     ++			  (uintmax_t)xor_offset_map_pos,
     ++			  (uintmax_t)flag_map_pos,
     ++			  (uintmax_t)ewah_bitmap_map_pos);
     ++	})
     ++		;
     ++
     ++	free_bitmap_index(bitmap_git);
     ++
     ++	return 0;
     ++}
     ++
     + int test_bitmap_hashes(struct repository *r)
     + {
     + 	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     +
     + ## pack-bitmap.h ##
     +@@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
     + 				 show_reachable_fn show_reachable);
     + void test_bitmap_walk(struct rev_info *revs);
     + int test_bitmap_commits(struct repository *r);
     ++int test_bitmap_commits_offset(struct repository *r);
     + int test_bitmap_hashes(struct repository *r);
     + int test_bitmap_pseudo_merges(struct repository *r);
     + int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
     +
     + ## t/helper/test-bitmap.c ##
     +@@ t/helper/test-bitmap.c: static int bitmap_list_commits(void)
     + 	return test_bitmap_commits(the_repository);
     + }
     + 
     ++static int bitmap_list_commits_offset(void)
     ++{
     ++	return test_bitmap_commits_offset(the_repository);
      +}
      +
     - # Since name-hash values are stored in the .bitmap files, add a test
     - # that checks that the name-hash calculations are stable across versions.
     - # Not exhaustive, but these hashing algorithms would be hard to change
     + static int bitmap_dump_hashes(void)
     + {
     + 	return test_bitmap_hashes(the_repository);
     +@@ t/helper/test-bitmap.c: int cmd__bitmap(int argc, const char **argv)
     + 
     + 	if (argc == 2 && !strcmp(argv[1], "list-commits"))
     + 		return bitmap_list_commits();
     ++	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
     ++		return bitmap_list_commits_offset();
     + 	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
     + 		return bitmap_dump_hashes();
     + 	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
     +@@ t/helper/test-bitmap.c: int cmd__bitmap(int argc, const char **argv)
     + 		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
     + 
     + 	usage("\ttest-tool bitmap list-commits\n"
     ++	      "\ttest-tool bitmap list-commits-offset\n"
     + 	      "\ttest-tool bitmap dump-hashes\n"
     + 	      "\ttest-tool bitmap dump-pseudo-merges\n"
     + 	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
     +
     + ## t/t5310-pack-bitmaps.sh ##
      @@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
       			grep "ignoring extra bitmap" trace2.txt
       		)
       	'
      +
     -+	# A `.bitmap` file has the following structure:
     -+	# | Header | Commits | Trees | Blobs | Tags | Entries... |
     -+	#
     -+	# - The header is 32 bytes long when using SHA-1.
     -+	# - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps.
     -+	#
     -+	# This test intentionally corrupts the `xor_offset` field of the first entry
     -+	# to verify robustness against malformed bitmap data.
      +	test_expect_success 'load corrupt bitmap' '
      +		rm -fr repo &&
      +		git init repo &&
     @@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
      +
      +			git repack -adb &&
      +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
     -+			chmod +w "$bitmap" &&
     -+
     -+			hdr_sz=$((12 + $(test_oid rawsz))) &&
     -+			offset=$(skip_ewah_bitmap $bitmap $hdr_sz) &&
     -+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
     -+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
     -+			offset=$(skip_ewah_bitmap $bitmap $offset) &&
     -+			offset=$((offset + 4)) &&
     ++			chmod +w $bitmap &&
      +
     ++			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
     ++				$(test-tool bitmap list-commits-offset | head -n 1)
     ++			EOF
      +			printf '\161' |
     -+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset &&
     ++				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
     ++
      +
      +			git rev-list --count HEAD > expect &&
      +			git rev-list --use-bitmap-index --count HEAD > actual &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3 1/2] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
  2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
@ 2025-05-25  2:06     ` Taylor Blau via GitGitGadget
  2025-05-25  2:06     ` [PATCH v3 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
  2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau via GitGitGadget @ 2025-05-25  2:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label.

So I think if you got part of the way through loading bitmap entries and
then failed, you would leak all of the previous entries that you were
able to load successfully.

The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ac6d62b980c5..fd19c2255163 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -630,41 +630,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
 
 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;
 
 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;
 
 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}
 
 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);
 
 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }
 
 static int open_pack_bitmap(struct repository *r,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  2025-05-25  2:06     ` [PATCH v3 1/2] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Taylor Blau via GitGitGadget
@ 2025-05-25  2:06     ` Lidong Yan via GitGitGadget
  2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2 siblings, 0 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-25  2:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
a new test helper command `test-tool bitmap list-commits-offset`,
and a `load corrupt bitmap` test case in t5310.

The `load corrupt bitmap` test case intentionally corrupt the
"xor_offset" field of the first entry. And the newly added helper
can help to find position of "xor_offset" in bitmap file.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c           | 73 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 +++++
 t/t5310-pack-bitmaps.sh | 27 +++++++++++++++
 4 files changed, 103 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fd19c2255163..39c1c1bc4ce1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -34,6 +34,11 @@ struct stored_bitmap {
 	int flags;
 };
 
+struct stored_bitmap_tag_pos {
+	struct stored_bitmap stored;
+	size_t map_pos;
+};
+
 /*
  * The active bitmap index for a repository. By design, repositories only have
  * a single bitmap index available (the index for the biggest packfile in
@@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
 static int existing_bitmaps_misses_nr;
 static int roots_with_bitmaps_nr;
 static int roots_without_bitmaps_nr;
+static int tag_pos_on_bitmap;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
 					  const struct object_id *oid,
 					  struct stored_bitmap *xor_with,
-					  int flags)
+					  int flags, size_t map_pos)
 {
 	struct stored_bitmap *stored;
+	struct stored_bitmap_tag_pos *tagged;
 	khiter_t hash_pos;
 	int ret;
 
-	stored = xmalloc(sizeof(struct stored_bitmap));
+	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
+					     sizeof(struct stored_bitmap));
+	stored = &tagged->stored;
+	if (tag_pos_on_bitmap)
+		tagged->map_pos = map_pos;
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
@@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		struct stored_bitmap *xor_bitmap = NULL;
 		uint32_t commit_idx_pos;
 		struct object_id oid;
+		size_t entry_map_pos;
 
 		if (index->map_size - index->map_pos < 6)
 			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
 
+		entry_map_pos = index->map_pos;
 		commit_idx_pos = read_be32(index->map, &index->map_pos);
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
@@ -402,8 +415,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		if (!bitmap)
 			return -1;
 
-		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
-			index, bitmap, &oid, xor_bitmap, flags);
+		recent_bitmaps[i % MAX_XOR_OFFSET] =
+			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
+				     entry_map_pos);
 	}
 
 	return 0;
@@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	int xor_flags;
 	khiter_t hash_pos;
 	struct bitmap_lookup_table_xor_item *xor_item;
+	size_t entry_map_pos;
 
 	if (is_corrupt)
 		return NULL;
@@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 			goto corrupt;
 		}
 
+		entry_map_pos = bitmap_git->map_pos;
 		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 		bitmap = read_bitmap_1(bitmap_git);
@@ -935,7 +951,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 		if (!bitmap)
 			goto corrupt;
 
-		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
+					  xor_bitmap, xor_flags, entry_map_pos);
 		xor_items_nr--;
 	}
 
@@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	 * Instead, we can skip ahead and immediately read the flags and
 	 * ewah bitmap.
 	 */
+	entry_map_pos = bitmap_git->map_pos;
 	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 	bitmap = read_bitmap_1(bitmap_git);
@@ -976,7 +994,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	if (!bitmap)
 		goto corrupt;
 
-	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
+			    entry_map_pos);
 
 corrupt:
 	free(xor_items);
@@ -2856,6 +2875,48 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_commits_offset(struct repository *r)
+{
+	struct object_id oid;
+	struct stored_bitmap_tag_pos *tagged;
+	struct bitmap_index *bitmap_git;
+	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
+		ewah_bitmap_map_pos;
+
+	tag_pos_on_bitmap = 1;
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git)
+		die(_("failed to load bitmap indexes"));
+
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
+	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
+		commit_idx_pos_map_pos = tagged->map_pos;
+		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
+		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
+		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
+
+		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
+			  oid_to_hex(&oid),
+			  (uintmax_t)commit_idx_pos_map_pos,
+			  (uintmax_t)xor_offset_map_pos,
+			  (uintmax_t)flag_map_pos,
+			  (uintmax_t)ewah_bitmap_map_pos);
+	})
+		;
+
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int test_bitmap_hashes(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 382d39499af2..96880ba3d72d 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_commits_offset(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 int test_bitmap_pseudo_merges(struct repository *r);
 int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 3f23f2107268..65a1ab29192b 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_list_commits_offset(void)
+{
+	return test_bitmap_commits_offset(the_repository);
+}
+
 static int bitmap_dump_hashes(void)
 {
 	return test_bitmap_hashes(the_repository);
@@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
+		return bitmap_list_commits_offset();
 	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
 	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
@@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
 		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
 	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap list-commits-offset\n"
 	      "\ttest-tool bitmap dump-hashes\n"
 	      "\ttest-tool bitmap dump-pseudo-merges\n"
 	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a62b463eaf09..ef4c5fbaae83 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -486,6 +486,33 @@ test_bitmap_cases () {
 			grep "ignoring extra bitmap" trace2.txt
 		)
 	'
+
+	test_expect_success 'load corrupt bitmap' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			chmod +w $bitmap &&
+
+			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
+				$(test-tool bitmap list-commits-offset | head -n 1)
+			EOF
+			printf '\161' |
+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
+
+
+			git rev-list --count HEAD > expect &&
+			git rev-list --use-bitmap-index --count HEAD > actual &&
+			test_cmp expect actual
+		)
+	'
 }
 
 test_bitmap_cases
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed
  2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
  2025-05-25  2:06     ` [PATCH v3 1/2] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Taylor Blau via GitGitGadget
  2025-05-25  2:06     ` [PATCH v3 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
@ 2025-05-25  2:43     ` Lidong Yan via GitGitGadget
  2025-05-25  2:43       ` [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
                         ` (2 more replies)
  2 siblings, 3 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-25  2:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan

This patch prevents pack-bitmap.c:load_bitmap() from nulling
bitmap_git->bitmap when loading failed thus eliminates memory leak. This
patch also add a test case in t5310 which use clang leak sanitizer to detect
whether leak happens when loading failed.

Lidong Yan (1):
  pack-bitmap: add load corrupt bitmap test

Taylor Blau (1):
  pack-bitmap: fix memory leak if load_bitmap() failed

 pack-bitmap.c           | 94 +++++++++++++++++++++++++++++++----------
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++
 t/t5310-pack-bitmaps.sh | 27 ++++++++++++
 4 files changed, 107 insertions(+), 23 deletions(-)


base-commit: 845c48a16a7f7b2c44d8cb137b16a4a1f0140229
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v4
Pull-Request: https://github.com/git/git/pull/1962

Range-diff vs v3:

 1:  cf87aad7c99 ! 1:  b6b3a83a224 pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
     @@ Metadata
      Author: Taylor Blau <me@ttaylorr.com>
      
       ## Commit message ##
     -    pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
     +    pack-bitmap: fix memory leak if load_bitmap() failed
      
          After going through the "failed" label, load_bitmap() will return -1,
          and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
 2:  f5371d7daa9 = 2:  7876d9a9014 pack-bitmap: add load corrupt bitmap test

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
@ 2025-05-25  2:43       ` Taylor Blau via GitGitGadget
  2025-05-29 15:33         ` Junio C Hamano
  2025-05-25  2:43       ` [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau via GitGitGadget @ 2025-05-25  2:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label.

So I think if you got part of the way through loading bitmap entries and
then failed, you would leak all of the previous entries that you were
able to load successfully.

The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ac6d62b980c5..fd19c2255163 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -630,41 +630,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
 
 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;
 
 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;
 
 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}
 
 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);
 
 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }
 
 static int open_pack_bitmap(struct repository *r,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2025-05-25  2:43       ` [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
@ 2025-05-25  2:43       ` Lidong Yan via GitGitGadget
  2025-05-29 15:45         ` Junio C Hamano
  2025-05-29 21:20         ` Taylor Blau
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2 siblings, 2 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-05-25  2:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
a new test helper command `test-tool bitmap list-commits-offset`,
and a `load corrupt bitmap` test case in t5310.

The `load corrupt bitmap` test case intentionally corrupt the
"xor_offset" field of the first entry. And the newly added helper
can help to find position of "xor_offset" in bitmap file.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c           | 73 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 +++++
 t/t5310-pack-bitmaps.sh | 27 +++++++++++++++
 4 files changed, 103 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fd19c2255163..39c1c1bc4ce1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -34,6 +34,11 @@ struct stored_bitmap {
 	int flags;
 };
 
+struct stored_bitmap_tag_pos {
+	struct stored_bitmap stored;
+	size_t map_pos;
+};
+
 /*
  * The active bitmap index for a repository. By design, repositories only have
  * a single bitmap index available (the index for the biggest packfile in
@@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
 static int existing_bitmaps_misses_nr;
 static int roots_with_bitmaps_nr;
 static int roots_without_bitmaps_nr;
+static int tag_pos_on_bitmap;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
 					  const struct object_id *oid,
 					  struct stored_bitmap *xor_with,
-					  int flags)
+					  int flags, size_t map_pos)
 {
 	struct stored_bitmap *stored;
+	struct stored_bitmap_tag_pos *tagged;
 	khiter_t hash_pos;
 	int ret;
 
-	stored = xmalloc(sizeof(struct stored_bitmap));
+	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
+					     sizeof(struct stored_bitmap));
+	stored = &tagged->stored;
+	if (tag_pos_on_bitmap)
+		tagged->map_pos = map_pos;
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
@@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		struct stored_bitmap *xor_bitmap = NULL;
 		uint32_t commit_idx_pos;
 		struct object_id oid;
+		size_t entry_map_pos;
 
 		if (index->map_size - index->map_pos < 6)
 			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
 
+		entry_map_pos = index->map_pos;
 		commit_idx_pos = read_be32(index->map, &index->map_pos);
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
@@ -402,8 +415,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		if (!bitmap)
 			return -1;
 
-		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
-			index, bitmap, &oid, xor_bitmap, flags);
+		recent_bitmaps[i % MAX_XOR_OFFSET] =
+			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
+				     entry_map_pos);
 	}
 
 	return 0;
@@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	int xor_flags;
 	khiter_t hash_pos;
 	struct bitmap_lookup_table_xor_item *xor_item;
+	size_t entry_map_pos;
 
 	if (is_corrupt)
 		return NULL;
@@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 			goto corrupt;
 		}
 
+		entry_map_pos = bitmap_git->map_pos;
 		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 		bitmap = read_bitmap_1(bitmap_git);
@@ -935,7 +951,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 		if (!bitmap)
 			goto corrupt;
 
-		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
+					  xor_bitmap, xor_flags, entry_map_pos);
 		xor_items_nr--;
 	}
 
@@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	 * Instead, we can skip ahead and immediately read the flags and
 	 * ewah bitmap.
 	 */
+	entry_map_pos = bitmap_git->map_pos;
 	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 	bitmap = read_bitmap_1(bitmap_git);
@@ -976,7 +994,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	if (!bitmap)
 		goto corrupt;
 
-	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
+			    entry_map_pos);
 
 corrupt:
 	free(xor_items);
@@ -2856,6 +2875,48 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_commits_offset(struct repository *r)
+{
+	struct object_id oid;
+	struct stored_bitmap_tag_pos *tagged;
+	struct bitmap_index *bitmap_git;
+	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
+		ewah_bitmap_map_pos;
+
+	tag_pos_on_bitmap = 1;
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git)
+		die(_("failed to load bitmap indexes"));
+
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
+	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
+		commit_idx_pos_map_pos = tagged->map_pos;
+		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
+		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
+		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
+
+		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
+			  oid_to_hex(&oid),
+			  (uintmax_t)commit_idx_pos_map_pos,
+			  (uintmax_t)xor_offset_map_pos,
+			  (uintmax_t)flag_map_pos,
+			  (uintmax_t)ewah_bitmap_map_pos);
+	})
+		;
+
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int test_bitmap_hashes(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 382d39499af2..96880ba3d72d 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_commits_offset(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 int test_bitmap_pseudo_merges(struct repository *r);
 int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 3f23f2107268..65a1ab29192b 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_list_commits_offset(void)
+{
+	return test_bitmap_commits_offset(the_repository);
+}
+
 static int bitmap_dump_hashes(void)
 {
 	return test_bitmap_hashes(the_repository);
@@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
+		return bitmap_list_commits_offset();
 	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
 	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
@@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
 		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
 	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap list-commits-offset\n"
 	      "\ttest-tool bitmap dump-hashes\n"
 	      "\ttest-tool bitmap dump-pseudo-merges\n"
 	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a62b463eaf09..ef4c5fbaae83 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -486,6 +486,33 @@ test_bitmap_cases () {
 			grep "ignoring extra bitmap" trace2.txt
 		)
 	'
+
+	test_expect_success 'load corrupt bitmap' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			chmod +w $bitmap &&
+
+			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
+				$(test-tool bitmap list-commits-offset | head -n 1)
+			EOF
+			printf '\161' |
+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
+
+
+			git rev-list --count HEAD > expect &&
+			git rev-list --use-bitmap-index --count HEAD > actual &&
+			test_cmp expect actual
+		)
+	'
 }
 
 test_bitmap_cases
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-05-25  2:43       ` [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
@ 2025-05-29 15:33         ` Junio C Hamano
  2025-05-29 19:57           ` Taylor Blau
  2025-05-30  3:50           ` lidongyan
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-05-29 15:33 UTC (permalink / raw)
  To: Taylor Blau via GitGitGadget; +Cc: git, Jeff King, Taylor Blau, Lidong Yan

"Taylor Blau via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Taylor Blau <me@ttaylorr.com>
>
> After going through the "failed" label, load_bitmap() will return -1,
> and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
> will then call free_bitmap_index().
> ...
> The solution is to remove the error handling code in load_bitmap(), because
> its caller will always call free_bitmap_index() in case of an error.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---

As this is Lidong relaying <aCOFqYdnPp1Lne4Y@nand.local> that Taylor
sent to the list, shouldn't Lidong's sign-off be after Taylor's?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-25  2:43       ` [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
@ 2025-05-29 15:45         ` Junio C Hamano
  2025-05-29 21:21           ` Taylor Blau
  2025-05-30  3:53           ` lidongyan
  2025-05-29 21:20         ` Taylor Blau
  1 sibling, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-05-29 15:45 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Jeff King, Taylor Blau, Lidong Yan

"Lidong Yan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>
> This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,

"pack-bitmap.c"?

> a new test helper command `test-tool bitmap list-commits-offset`,
> and a `load corrupt bitmap` test case in t5310.
>
> The `load corrupt bitmap` test case intentionally corrupt the
> "xor_offset" field of the first entry. And the newly added helper
> can help to find position of "xor_offset" in bitmap file.

[the structure of a log message]

The usual way to compose a log message of this project is to

 - Give an observation on how the current system works in the
   present tense (so no need to say "Currently X is Y", or
   "Previously X was Y" to describe the state before your change;
   just "X is Y" is enough), and discuss what you perceive as a
   problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in reader's minds).

 - Give commands to somebody editing the codebase to "make it so".

in this order.

The proposed log message lacks the motivation and only talks about
what the patch does.  We add a test-only code in a file, intermixed
with production code.  Let's explain why it is the best arrangement.

> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
> ---
>  pack-bitmap.c           | 73 +++++++++++++++++++++++++++++++++++++----
>  pack-bitmap.h           |  1 +
>  t/helper/test-bitmap.c  |  8 +++++
>  t/t5310-pack-bitmaps.sh | 27 +++++++++++++++
>  4 files changed, 103 insertions(+), 6 deletions(-)

After the second round of the series, no review comments seem to
have been sent to the list.  Is everybody happy with the latest
iteration?

Thanks.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index fd19c2255163..39c1c1bc4ce1 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -34,6 +34,11 @@ struct stored_bitmap {
>  	int flags;
>  };
>  
> +struct stored_bitmap_tag_pos {
> +	struct stored_bitmap stored;
> +	size_t map_pos;
> +};
> +
>  /*
>   * The active bitmap index for a repository. By design, repositories only have
>   * a single bitmap index available (the index for the biggest packfile in
> @@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
>  static int existing_bitmaps_misses_nr;
>  static int roots_with_bitmaps_nr;
>  static int roots_without_bitmaps_nr;
> +static int tag_pos_on_bitmap;
>  
>  static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
>  {
> @@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>  					  struct ewah_bitmap *root,
>  					  const struct object_id *oid,
>  					  struct stored_bitmap *xor_with,
> -					  int flags)
> +					  int flags, size_t map_pos)
>  {
>  	struct stored_bitmap *stored;
> +	struct stored_bitmap_tag_pos *tagged;
>  	khiter_t hash_pos;
>  	int ret;
>  
> -	stored = xmalloc(sizeof(struct stored_bitmap));
> +	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
> +					     sizeof(struct stored_bitmap));
> +	stored = &tagged->stored;
> +	if (tag_pos_on_bitmap)
> +		tagged->map_pos = map_pos;
>  	stored->root = root;
>  	stored->xor = xor_with;
>  	stored->flags = flags;
> @@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  		struct stored_bitmap *xor_bitmap = NULL;
>  		uint32_t commit_idx_pos;
>  		struct object_id oid;
> +		size_t entry_map_pos;
>  
>  		if (index->map_size - index->map_pos < 6)
>  			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
>  
> +		entry_map_pos = index->map_pos;
>  		commit_idx_pos = read_be32(index->map, &index->map_pos);
>  		xor_offset = read_u8(index->map, &index->map_pos);
>  		flags = read_u8(index->map, &index->map_pos);
> @@ -402,8 +415,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  		if (!bitmap)
>  			return -1;
>  
> -		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
> -			index, bitmap, &oid, xor_bitmap, flags);
> +		recent_bitmaps[i % MAX_XOR_OFFSET] =
> +			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
> +				     entry_map_pos);
>  	}
>  
>  	return 0;
> @@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  	int xor_flags;
>  	khiter_t hash_pos;
>  	struct bitmap_lookup_table_xor_item *xor_item;
> +	size_t entry_map_pos;
>  
>  	if (is_corrupt)
>  		return NULL;
> @@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  			goto corrupt;
>  		}
>  
> +		entry_map_pos = bitmap_git->map_pos;
>  		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
>  		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
>  		bitmap = read_bitmap_1(bitmap_git);
> @@ -935,7 +951,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  		if (!bitmap)
>  			goto corrupt;
>  
> -		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
> +		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
> +					  xor_bitmap, xor_flags, entry_map_pos);
>  		xor_items_nr--;
>  	}
>  
> @@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  	 * Instead, we can skip ahead and immediately read the flags and
>  	 * ewah bitmap.
>  	 */
> +	entry_map_pos = bitmap_git->map_pos;
>  	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
>  	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
>  	bitmap = read_bitmap_1(bitmap_git);
> @@ -976,7 +994,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  	if (!bitmap)
>  		goto corrupt;
>  
> -	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
> +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
> +			    entry_map_pos);
>  
>  corrupt:
>  	free(xor_items);
> @@ -2856,6 +2875,48 @@ int test_bitmap_commits(struct repository *r)
>  	return 0;
>  }
>  
> +int test_bitmap_commits_offset(struct repository *r)
> +{
> +	struct object_id oid;
> +	struct stored_bitmap_tag_pos *tagged;
> +	struct bitmap_index *bitmap_git;
> +	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
> +		ewah_bitmap_map_pos;
> +
> +	tag_pos_on_bitmap = 1;
> +	bitmap_git = prepare_bitmap_git(r);
> +	if (!bitmap_git)
> +		die(_("failed to load bitmap indexes"));
> +
> +	/*
> +	 * As this function is only used to print bitmap selected
> +	 * commits, we don't have to read the commit table.
> +	 */
> +	if (bitmap_git->table_lookup) {
> +		if (load_bitmap_entries_v1(bitmap_git) < 0)
> +			die(_("failed to load bitmap indexes"));
> +	}
> +
> +	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
> +		commit_idx_pos_map_pos = tagged->map_pos;
> +		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
> +		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
> +		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
> +
> +		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
> +			  oid_to_hex(&oid),
> +			  (uintmax_t)commit_idx_pos_map_pos,
> +			  (uintmax_t)xor_offset_map_pos,
> +			  (uintmax_t)flag_map_pos,
> +			  (uintmax_t)ewah_bitmap_map_pos);
> +	})
> +		;
> +
> +	free_bitmap_index(bitmap_git);
> +
> +	return 0;
> +}
> +
>  int test_bitmap_hashes(struct repository *r)
>  {
>  	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index 382d39499af2..96880ba3d72d 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
>  				 show_reachable_fn show_reachable);
>  void test_bitmap_walk(struct rev_info *revs);
>  int test_bitmap_commits(struct repository *r);
> +int test_bitmap_commits_offset(struct repository *r);
>  int test_bitmap_hashes(struct repository *r);
>  int test_bitmap_pseudo_merges(struct repository *r);
>  int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
> diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
> index 3f23f2107268..65a1ab29192b 100644
> --- a/t/helper/test-bitmap.c
> +++ b/t/helper/test-bitmap.c
> @@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
>  	return test_bitmap_commits(the_repository);
>  }
>  
> +static int bitmap_list_commits_offset(void)
> +{
> +	return test_bitmap_commits_offset(the_repository);
> +}
> +
>  static int bitmap_dump_hashes(void)
>  {
>  	return test_bitmap_hashes(the_repository);
> @@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
>  
>  	if (argc == 2 && !strcmp(argv[1], "list-commits"))
>  		return bitmap_list_commits();
> +	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
> +		return bitmap_list_commits_offset();
>  	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
>  		return bitmap_dump_hashes();
>  	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
> @@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
>  		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
>  
>  	usage("\ttest-tool bitmap list-commits\n"
> +	      "\ttest-tool bitmap list-commits-offset\n"
>  	      "\ttest-tool bitmap dump-hashes\n"
>  	      "\ttest-tool bitmap dump-pseudo-merges\n"
>  	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index a62b463eaf09..ef4c5fbaae83 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -486,6 +486,33 @@ test_bitmap_cases () {
>  			grep "ignoring extra bitmap" trace2.txt
>  		)
>  	'
> +
> +	test_expect_success 'load corrupt bitmap' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +
> +			test_commit base &&
> +
> +			git repack -adb &&
> +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
> +			chmod +w $bitmap &&
> +
> +			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
> +				$(test-tool bitmap list-commits-offset | head -n 1)
> +			EOF
> +			printf '\161' |
> +				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
> +
> +
> +			git rev-list --count HEAD > expect &&
> +			git rev-list --use-bitmap-index --count HEAD > actual &&
> +			test_cmp expect actual
> +		)
> +	'
>  }
>  
>  test_bitmap_cases

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-05-29 15:33         ` Junio C Hamano
@ 2025-05-29 19:57           ` Taylor Blau
  2025-05-29 22:04             ` Junio C Hamano
  2025-05-30  3:50           ` lidongyan
  1 sibling, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-05-29 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau via GitGitGadget, git, Jeff King, Lidong Yan

On Thu, May 29, 2025 at 08:33:29AM -0700, Junio C Hamano wrote:
> "Taylor Blau via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Taylor Blau <me@ttaylorr.com>
> >
> > After going through the "failed" label, load_bitmap() will return -1,
> > and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
> > will then call free_bitmap_index().
> > ...
> > The solution is to remove the error handling code in load_bitmap(), because
> > its caller will always call free_bitmap_index() in case of an error.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
>
> As this is Lidong relaying <aCOFqYdnPp1Lne4Y@nand.local> that Taylor
> sent to the list, shouldn't Lidong's sign-off be after Taylor's?

I've always assumed the answer here was "yes", but I don't know that our
documentation suggests the same.

In c11c3b5681 (Documentation/SubmittingPatches: What's Acked-by and
Tested-by?, 2008-02-03) you added:

    Notice that you can place your own Signed-off-by: line when
    forwarding somebody else's patch [...]. Indeed you are encouraged
    to do so.  [...]

and that text survives into the current version of SubmittingPatches.
So I think that while our documentation encourages people to add their
own S-o-b to others' patches sent on their behalf, it doesn't
explicitly require it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-25  2:43       ` [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
  2025-05-29 15:45         ` Junio C Hamano
@ 2025-05-29 21:20         ` Taylor Blau
  2025-05-30  4:03           ` lidongyan
  1 sibling, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-05-29 21:20 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Jeff King, Lidong Yan

On Sun, May 25, 2025 at 02:43:03AM +0000, Lidong Yan via GitGitGadget wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index fd19c2255163..39c1c1bc4ce1 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -34,6 +34,11 @@ struct stored_bitmap {
>  	int flags;
>  };
>
> +struct stored_bitmap_tag_pos {
> +	struct stored_bitmap stored;
> +	size_t map_pos;
> +};
> +

Hmm. I was expecting you to add a new member to the stored_bitmap
structure, not a new structure entirely. Let's read on...

>  /*
>   * The active bitmap index for a repository. By design, repositories only have
>   * a single bitmap index available (the index for the biggest packfile in
> @@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
>  static int existing_bitmaps_misses_nr;
>  static int roots_with_bitmaps_nr;
>  static int roots_without_bitmaps_nr;
> +static int tag_pos_on_bitmap;

Why are we only sometimes tagging bitmaps with their position?

>
>  static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
>  {
> @@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>  					  struct ewah_bitmap *root,
>  					  const struct object_id *oid,
>  					  struct stored_bitmap *xor_with,
> -					  int flags)
> +					  int flags, size_t map_pos)
>  {
>  	struct stored_bitmap *stored;
> +	struct stored_bitmap_tag_pos *tagged;

OK.

>  	khiter_t hash_pos;
>  	int ret;
>
> -	stored = xmalloc(sizeof(struct stored_bitmap));
> +	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
> +					     sizeof(struct stored_bitmap));
> +	stored = &tagged->stored;
> +	if (tag_pos_on_bitmap)
> +		tagged->map_pos = map_pos;

I am quite worried about this portion of the diff.

Here you allocate memory for "tagged" which is a stored_bitmap_tag_pos.
But the amount of bytes you allocate depends on whether the global
variable tag_pos_on_bitmap is set or not. If it isn't, then you don't
allocate enough memory here to hold an entire stored_bitmap_tag_pos
structure.

I think within this function you're OK, since you only write into that
field when tag_pos_on_bitmap is set. But this seems like a recipe for
disaster if you ever try to read or write into the tagged->map_pos field
when tag_pos_on_bitmap *isn't* set.

This happens to work because of where the pointer to the stored_bitmap
structure lives within the stored_bitmap_tag_pos structure. But this
seems *extremely* fragile to only save 4 bytes of allocated memory per
bitmap. Even on a repository with ~1,000 bitmaps (which is rare from my
experience), you're only saving ~3.91 KiB.

I would expect something more like the following (based on top of your
patch here):

--- 8< ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 39c1c1bc4c..4c3829dba9 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -31,12 +31,8 @@ struct stored_bitmap {
 	struct object_id oid;
 	struct ewah_bitmap *root;
 	struct stored_bitmap *xor;
-	int flags;
-};
-
-struct stored_bitmap_tag_pos {
-	struct stored_bitmap stored;
 	size_t map_pos;
+	int flags;
 };

 /*
@@ -153,7 +149,6 @@ static int existing_bitmaps_hits_nr;
 static int existing_bitmaps_misses_nr;
 static int roots_with_bitmaps_nr;
 static int roots_without_bitmaps_nr;
-static int tag_pos_on_bitmap;

 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -323,17 +318,13 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  int flags, size_t map_pos)
 {
 	struct stored_bitmap *stored;
-	struct stored_bitmap_tag_pos *tagged;
 	khiter_t hash_pos;
 	int ret;

-	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
-					     sizeof(struct stored_bitmap));
-	stored = &tagged->stored;
-	if (tag_pos_on_bitmap)
-		tagged->map_pos = map_pos;
+	stored = xmalloc(sizeof(struct stored_bitmap));
 	stored->root = root;
 	stored->xor = xor_with;
+	stored->map_pos = map_pos;
 	stored->flags = flags;
 	oidcpy(&stored->oid, oid);

@@ -2878,12 +2869,11 @@ int test_bitmap_commits(struct repository *r)
 int test_bitmap_commits_offset(struct repository *r)
 {
 	struct object_id oid;
-	struct stored_bitmap_tag_pos *tagged;
+	struct stored_bitmap *bitmap;
 	struct bitmap_index *bitmap_git;
 	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
 		ewah_bitmap_map_pos;

-	tag_pos_on_bitmap = 1;
 	bitmap_git = prepare_bitmap_git(r);
 	if (!bitmap_git)
 		die(_("failed to load bitmap indexes"));
@@ -2897,9 +2887,9 @@ int test_bitmap_commits_offset(struct repository *r)
 			die(_("failed to load bitmap indexes"));
 	}

-	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
-		commit_idx_pos_map_pos = tagged->map_pos;
-		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
+	kh_foreach (bitmap_git->bitmaps, oid, bitmap, {
+		commit_idx_pos_map_pos = bitmap->map_pos;
+		xor_offset_map_pos = bitmap->map_pos + sizeof(uint32_t);
 		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
 		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
--- >8 ---

>  	stored->root = root;
>  	stored->xor = xor_with;
>  	stored->flags = flags;
> @@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  		struct stored_bitmap *xor_bitmap = NULL;
>  		uint32_t commit_idx_pos;
>  		struct object_id oid;
> +		size_t entry_map_pos;
>
>  		if (index->map_size - index->map_pos < 6)
>  			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
>
> +		entry_map_pos = index->map_pos;

Good. This is important since the read_be32() and read_u8() calls below
both adjust the value of index->map_pos past the beginning of the bitmap.

> @@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  	int xor_flags;
>  	khiter_t hash_pos;
>  	struct bitmap_lookup_table_xor_item *xor_item;
> +	size_t entry_map_pos;
>
>  	if (is_corrupt)
>  		return NULL;
> @@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  			goto corrupt;
>  		}
>
> +		entry_map_pos = bitmap_git->map_pos;

Same here.

> @@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>  	 * Instead, we can skip ahead and immediately read the flags and
>  	 * ewah bitmap.
>  	 */
> +	entry_map_pos = bitmap_git->map_pos;

And here.

> +int test_bitmap_commits_offset(struct repository *r)
> +{
> +	struct object_id oid;
> +	struct stored_bitmap_tag_pos *tagged;
> +	struct bitmap_index *bitmap_git;
> +	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
> +		ewah_bitmap_map_pos;
> +
> +	tag_pos_on_bitmap = 1;
> +	bitmap_git = prepare_bitmap_git(r);
> +	if (!bitmap_git)
> +		die(_("failed to load bitmap indexes"));
> +

If we either forgot to set this variable here or did so after calling
prepare_bitmap_git(), then we wouldn't allocate enough memory to store
the map_pos field in the stored_bitmap_tag_pos structure. When we then
would try and read that field below, we'd read garbage heap data outside
of our structure.

> +	/*
> +	 * As this function is only used to print bitmap selected
> +	 * commits, we don't have to read the commit table.
> +	 */
> +	if (bitmap_git->table_lookup) {
> +		if (load_bitmap_entries_v1(bitmap_git) < 0)
> +			die(_("failed to load bitmap indexes"));
> +	}

This comment suggests that we can avoid reading the commit table
altogether. Indeed, calling load_bitmap_entries_v1() here does that,
since it is not called when loading a bitmap that has a lookup table.

So I think the behavior here is correct, but the comment is misleading.
I suspect that the confusion would be resolved by instead writing:

    /*
     * Since this function needs to know the position of each individual
     * bitmap, bypass the commit lookup table (if one exists) by forcing
     * the bitmap to eagerly load its entries.
     */

I think this is copy-and-paste from 28cd730680 (pack-bitmap: prepare to
read lookup table extension, 2022-08-14) via the 'test_bitmap_commits()'
function immediately above this one. I think both would benefit from
some clean-up, since this comment is equally misleading in that
function.

For your purposes, I would either:

 - remove or (preferably) reword the comment in your new function,
   leaving the one in test_bitmap_commits() as-is, or

 - reword the comment in test_bitmap_commits() to be more like the one
   above, via a preparatory commit, and then introduce the new function
   using the same wording.

Between the two, I think the latter is preferable.

As an aside, I think that for bitmaps that do have a commit lookup
table, you could go slightly faster here by walking over that portion of
the *.bitmap file, since it directly encodes the information you're
interested in here. But I would avoid doing that, since it too seems
brittle and I would like to avoid having two separate spots that each
implement reading the commit table format.

> +	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
> +		commit_idx_pos_map_pos = tagged->map_pos;

OK, and here's where we pull out the actual position of the selected
commit's bitmap.

> +		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
> +		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
> +		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
> +
> +		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
> +			  oid_to_hex(&oid),
> +			  (uintmax_t)commit_idx_pos_map_pos,
> +			  (uintmax_t)xor_offset_map_pos,
> +			  (uintmax_t)flag_map_pos,
> +			  (uintmax_t)ewah_bitmap_map_pos);

Hmm. We print more information here than just the map_pos. This is
brittle if the on-disk format changes (e.g., to store the XOR offsets in
some other part of the bitmap). But hopefully future updates to the
bitmap format will come with updates to this function as well ;-).

It feels somewhat unsatisfying to print output like:

    $COMMIT_OID <map_pos> <map_pos+4> <map_pos+5> <map_pos+6>

, but I think it makes sense here for a couple of reasons:

 - If we just print the <map_pos>, then the test is responsible for
   knowing the distance between that and the XOR offset, which extends
   the brittleness to the test code

 - likewise, if we print out just the map_pos and the positions of the
   XOR offset, it feels strange to omit the others.

> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index 382d39499af2..96880ba3d72d 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
>  				 show_reachable_fn show_reachable);
>  void test_bitmap_walk(struct rev_info *revs);
>  int test_bitmap_commits(struct repository *r);
> +int test_bitmap_commits_offset(struct repository *r);
>  int test_bitmap_hashes(struct repository *r);
>  int test_bitmap_pseudo_merges(struct repository *r);
>  int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
> diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
> index 3f23f2107268..65a1ab29192b 100644
> --- a/t/helper/test-bitmap.c
> +++ b/t/helper/test-bitmap.c
> @@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
>  	return test_bitmap_commits(the_repository);
>  }
>
> +static int bitmap_list_commits_offset(void)
> +{
> +	return test_bitmap_commits_offset(the_repository);
> +}
> +
>  static int bitmap_dump_hashes(void)
>  {
>  	return test_bitmap_hashes(the_repository);
> @@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
>
>  	if (argc == 2 && !strcmp(argv[1], "list-commits"))
>  		return bitmap_list_commits();
> +	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
> +		return bitmap_list_commits_offset();

All of the scaffolding here looks good.

This new mode reads a little awkwardly to me (but may not to others, in
which case I am happy to back away from the following suggestion). Can
we either call this 'list-commits-with-offset' or 'list-commit-offsets'?
I have a vague preference towards the former since the new mode has a
string prefix matching the existing mode.

> +	test_expect_success 'load corrupt bitmap' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +
> +			test_commit base &&
> +
> +			git repack -adb &&
> +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
> +			chmod +w $bitmap &&
> +
> +			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
> +				$(test-tool bitmap list-commits-offset | head -n 1)

We avoid putting 'git' or 'test-tool' on the left-hand side of a pipe,
since we want to avoid squelching any errors / segfaults from our code.

How about (assuming the rename above):

    test-tool bitmap list-commit-offsets >offsets &&
    xor_off="$(head -n1 offsets | awk '{print $3}')" &&
    ...

?

> +			printf '\161' |
> +				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
> +
> +
> +			git rev-list --count HEAD > expect &&
> +			git rev-list --use-bitmap-index --count HEAD > actual &&

Using --count can mask failures in the bitmap code since it only guards
you against getting the wrong number of objects, but doesn't guard you
against getting the right amount of objects in a permuted order. I think
we'd want just 'git rev-list --objects' here, but note that you have to
use '--no-object-names' on the non-bitmap side, since rev-list does not
print out paths when using bitmaps[^1].

How about:

    git rev-list --objects --no-object-names HEAD >expect.raw &&
    git rev-list --objects --use-bitmap-index --no-object-names HEAD \
      >actual.raw &&

    sort expect.raw >expect &&
    sort actual.raw >actual &&

    test_cmp expect actual

(Note that you also have to sort the output here since rev-list does not
output objects in a meaningful order when using bitmaps.)

Thanks,
Taylor

[^1]: Not that it matters here since we don't expect to load the bitmap
  anyway, but it's worth doing regardless.

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-29 15:45         ` Junio C Hamano
@ 2025-05-29 21:21           ` Taylor Blau
  2025-05-30  3:53           ` lidongyan
  1 sibling, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-05-29 21:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lidong Yan via GitGitGadget, git, Jeff King, Lidong Yan

On Thu, May 29, 2025 at 08:45:40AM -0700, Junio C Hamano wrote:
> > Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
> > ---
> >  pack-bitmap.c           | 73 +++++++++++++++++++++++++++++++++++++----
> >  pack-bitmap.h           |  1 +
> >  t/helper/test-bitmap.c  |  8 +++++
> >  t/t5310-pack-bitmaps.sh | 27 +++++++++++++++
> >  4 files changed, 103 insertions(+), 6 deletions(-)
>
> After the second round of the series, no review comments seem to
> have been sent to the list.  Is everybody happy with the latest
> iteration?

Sorry for missing this one from earlier this week. I left a few comments
on the latest round. I think we are getting there, but I do not feel
comfortable merging down the series just yet.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-05-29 19:57           ` Taylor Blau
@ 2025-05-29 22:04             ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-05-29 22:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Taylor Blau via GitGitGadget, git, Jeff King, Lidong Yan

Taylor Blau <me@ttaylorr.com> writes:

> In c11c3b5681 (Documentation/SubmittingPatches: What's Acked-by and
> Tested-by?, 2008-02-03) you added:
>
>     Notice that you can place your own Signed-off-by: line when
>     forwarding somebody else's patch [...]. Indeed you are encouraged
>     to do so.  [...]
>
> and that text survives into the current version of SubmittingPatches.
> So I think that while our documentation encourages people to add their
> own S-o-b to others' patches sent on their behalf, it doesn't
> explicitly require it.

It would not hurt that much if I pretended that I ignored what
Lidong forwarded and picked up your patch directly from the list,
only because what was forwarded was public.  If it were a privately
shared patch, sign-off is required, so we'd need to tighten the
language in the SubmittingPatches document, I think.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-05-29 15:33         ` Junio C Hamano
  2025-05-29 19:57           ` Taylor Blau
@ 2025-05-30  3:50           ` lidongyan
  1 sibling, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-30  3:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau via GitGitGadget, git, Jeff King, Taylor Blau

Got it, I will add my sign-off.

> 2025年5月29日 23:33,Junio C Hamano <gitster@pobox.com> 写道:
> 
> "Taylor Blau via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Taylor Blau <me@ttaylorr.com>
>> 
>> After going through the "failed" label, load_bitmap() will return -1,
>> and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
>> will then call free_bitmap_index().
>> ...
>> The solution is to remove the error handling code in load_bitmap(), because
>> its caller will always call free_bitmap_index() in case of an error.
>> 
>> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>> ---
> 
> As this is Lidong relaying <aCOFqYdnPp1Lne4Y@nand.local> that Taylor
> sent to the list, shouldn't Lidong's sign-off be after Taylor's?
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-29 15:45         ` Junio C Hamano
  2025-05-29 21:21           ` Taylor Blau
@ 2025-05-30  3:53           ` lidongyan
  1 sibling, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-30  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lidong Yan via GitGitGadget, git, Jeff King, Taylor Blau



> 2025年5月29日 23:45,Junio C Hamano <gitster@pobox.com> 写道:
> 
> "Lidong Yan via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>> 
>> This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
> 
> "pack-bitmap.c"?
> 
>> a new test helper command `test-tool bitmap list-commits-offset`,
>> and a `load corrupt bitmap` test case in t5310.
>> 
>> The `load corrupt bitmap` test case intentionally corrupt the
>> "xor_offset" field of the first entry. And the newly added helper
>> can help to find position of "xor_offset" in bitmap file.
> 
> [the structure of a log message]
> 
> The usual way to compose a log message of this project is to
> 
> - Give an observation on how the current system works in the
>   present tense (so no need to say "Currently X is Y", or
>   "Previously X was Y" to describe the state before your change;
>   just "X is Y" is enough), and discuss what you perceive as a
>   problem in it.
> 
> - Propose a solution (optional---often, problem description
>   trivially leads to an obvious solution in reader's minds).
> 
> - Give commands to somebody editing the codebase to "make it so".
> 
> in this order.
> 
> The proposed log message lacks the motivation and only talks about
> what the patch does.  We add a test-only code in a file, intermixed
> with production code.  Let's explain why it is the best arrangement.

I see. I am trying to validate these patch series and test further patch won’t leak
memory under the condition that bitmap is corrupted, Anyway I will pay attention
to motivation in my following log messages.

> 
>> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
>> ---
>> pack-bitmap.c           | 73 +++++++++++++++++++++++++++++++++++++----
>> pack-bitmap.h           |  1 +
>> t/helper/test-bitmap.c  |  8 +++++
>> t/t5310-pack-bitmaps.sh | 27 +++++++++++++++
>> 4 files changed, 103 insertions(+), 6 deletions(-)
> 
> After the second round of the series, no review comments seem to
> have been sent to the list.  Is everybody happy with the latest
> iteration?
> 
> Thanks.
> 
>> diff --git a/pack-bitmap.c b/pack-bitmap.c
>> index fd19c2255163..39c1c1bc4ce1 100644
>> --- a/pack-bitmap.c
>> +++ b/pack-bitmap.c
>> @@ -34,6 +34,11 @@ struct stored_bitmap {
>> int flags;
>> };
>> 
>> +struct stored_bitmap_tag_pos {
>> + struct stored_bitmap stored;
>> + size_t map_pos;
>> +};
>> +
>> /*
>>  * The active bitmap index for a repository. By design, repositories only have
>>  * a single bitmap index available (the index for the biggest packfile in
>> @@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
>> static int existing_bitmaps_misses_nr;
>> static int roots_with_bitmaps_nr;
>> static int roots_without_bitmaps_nr;
>> +static int tag_pos_on_bitmap;
>> 
>> static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
>> {
>> @@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>>  struct ewah_bitmap *root,
>>  const struct object_id *oid,
>>  struct stored_bitmap *xor_with,
>> -  int flags)
>> +  int flags, size_t map_pos)
>> {
>> struct stored_bitmap *stored;
>> + struct stored_bitmap_tag_pos *tagged;
>> khiter_t hash_pos;
>> int ret;
>> 
>> - stored = xmalloc(sizeof(struct stored_bitmap));
>> + tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
>> +     sizeof(struct stored_bitmap));
>> + stored = &tagged->stored;
>> + if (tag_pos_on_bitmap)
>> + tagged->map_pos = map_pos;
>> stored->root = root;
>> stored->xor = xor_with;
>> stored->flags = flags;
>> @@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>> struct stored_bitmap *xor_bitmap = NULL;
>> uint32_t commit_idx_pos;
>> struct object_id oid;
>> + size_t entry_map_pos;
>> 
>> if (index->map_size - index->map_pos < 6)
>> return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
>> 
>> + entry_map_pos = index->map_pos;
>> commit_idx_pos = read_be32(index->map, &index->map_pos);
>> xor_offset = read_u8(index->map, &index->map_pos);
>> flags = read_u8(index->map, &index->map_pos);
>> @@ -402,8 +415,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>> if (!bitmap)
>> return -1;
>> 
>> - recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
>> - index, bitmap, &oid, xor_bitmap, flags);
>> + recent_bitmaps[i % MAX_XOR_OFFSET] =
>> + store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
>> +     entry_map_pos);
>> }
>> 
>> return 0;
>> @@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> int xor_flags;
>> khiter_t hash_pos;
>> struct bitmap_lookup_table_xor_item *xor_item;
>> + size_t entry_map_pos;
>> 
>> if (is_corrupt)
>> return NULL;
>> @@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> goto corrupt;
>> }
>> 
>> + entry_map_pos = bitmap_git->map_pos;
>> bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
>> xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
>> bitmap = read_bitmap_1(bitmap_git);
>> @@ -935,7 +951,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> if (!bitmap)
>> goto corrupt;
>> 
>> - xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
>> + xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
>> +  xor_bitmap, xor_flags, entry_map_pos);
>> xor_items_nr--;
>> }
>> 
>> @@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> * Instead, we can skip ahead and immediately read the flags and
>> * ewah bitmap.
>> */
>> + entry_map_pos = bitmap_git->map_pos;
>> bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
>> flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
>> bitmap = read_bitmap_1(bitmap_git);
>> @@ -976,7 +994,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> if (!bitmap)
>> goto corrupt;
>> 
>> - return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
>> + return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
>> +    entry_map_pos);
>> 
>> corrupt:
>> free(xor_items);
>> @@ -2856,6 +2875,48 @@ int test_bitmap_commits(struct repository *r)
>> return 0;
>> }
>> 
>> +int test_bitmap_commits_offset(struct repository *r)
>> +{
>> + struct object_id oid;
>> + struct stored_bitmap_tag_pos *tagged;
>> + struct bitmap_index *bitmap_git;
>> + size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
>> + ewah_bitmap_map_pos;
>> +
>> + tag_pos_on_bitmap = 1;
>> + bitmap_git = prepare_bitmap_git(r);
>> + if (!bitmap_git)
>> + die(_("failed to load bitmap indexes"));
>> +
>> + /*
>> + * As this function is only used to print bitmap selected
>> + * commits, we don't have to read the commit table.
>> + */
>> + if (bitmap_git->table_lookup) {
>> + if (load_bitmap_entries_v1(bitmap_git) < 0)
>> + die(_("failed to load bitmap indexes"));
>> + }
>> +
>> + kh_foreach (bitmap_git->bitmaps, oid, tagged, {
>> + commit_idx_pos_map_pos = tagged->map_pos;
>> + xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
>> + flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
>> + ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
>> +
>> + printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
>> +  oid_to_hex(&oid),
>> +  (uintmax_t)commit_idx_pos_map_pos,
>> +  (uintmax_t)xor_offset_map_pos,
>> +  (uintmax_t)flag_map_pos,
>> +  (uintmax_t)ewah_bitmap_map_pos);
>> + })
>> + ;
>> +
>> + free_bitmap_index(bitmap_git);
>> +
>> + return 0;
>> +}
>> +
>> int test_bitmap_hashes(struct repository *r)
>> {
>> struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
>> diff --git a/pack-bitmap.h b/pack-bitmap.h
>> index 382d39499af2..96880ba3d72d 100644
>> --- a/pack-bitmap.h
>> +++ b/pack-bitmap.h
>> @@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
>> show_reachable_fn show_reachable);
>> void test_bitmap_walk(struct rev_info *revs);
>> int test_bitmap_commits(struct repository *r);
>> +int test_bitmap_commits_offset(struct repository *r);
>> int test_bitmap_hashes(struct repository *r);
>> int test_bitmap_pseudo_merges(struct repository *r);
>> int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
>> diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
>> index 3f23f2107268..65a1ab29192b 100644
>> --- a/t/helper/test-bitmap.c
>> +++ b/t/helper/test-bitmap.c
>> @@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
>> return test_bitmap_commits(the_repository);
>> }
>> 
>> +static int bitmap_list_commits_offset(void)
>> +{
>> + return test_bitmap_commits_offset(the_repository);
>> +}
>> +
>> static int bitmap_dump_hashes(void)
>> {
>> return test_bitmap_hashes(the_repository);
>> @@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
>> 
>> if (argc == 2 && !strcmp(argv[1], "list-commits"))
>> return bitmap_list_commits();
>> + if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
>> + return bitmap_list_commits_offset();
>> if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
>> return bitmap_dump_hashes();
>> if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
>> @@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
>> return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
>> 
>> usage("\ttest-tool bitmap list-commits\n"
>> +      "\ttest-tool bitmap list-commits-offset\n"
>>      "\ttest-tool bitmap dump-hashes\n"
>>      "\ttest-tool bitmap dump-pseudo-merges\n"
>>      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
>> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
>> index a62b463eaf09..ef4c5fbaae83 100755
>> --- a/t/t5310-pack-bitmaps.sh
>> +++ b/t/t5310-pack-bitmaps.sh
>> @@ -486,6 +486,33 @@ test_bitmap_cases () {
>> grep "ignoring extra bitmap" trace2.txt
>> )
>> '
>> +
>> + test_expect_success 'load corrupt bitmap' '
>> + rm -fr repo &&
>> + git init repo &&
>> + test_when_finished "rm -fr repo" &&
>> + (
>> + cd repo &&
>> + git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>> +
>> + test_commit base &&
>> +
>> + git repack -adb &&
>> + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
>> + chmod +w $bitmap &&
>> +
>> + read oid commit_off xor_off flag_off ewah_off <<-EOF &&
>> + $(test-tool bitmap list-commits-offset | head -n 1)
>> + EOF
>> + printf '\161' |
>> + dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
>> +
>> +
>> + git rev-list --count HEAD > expect &&
>> + git rev-list --use-bitmap-index --count HEAD > actual &&
>> + test_cmp expect actual
>> + )
>> + '
>> }
>> 
>> test_bitmap_cases
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test
  2025-05-29 21:20         ` Taylor Blau
@ 2025-05-30  4:03           ` lidongyan
  0 siblings, 0 replies; 45+ messages in thread
From: lidongyan @ 2025-05-30  4:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Lidong Yan via GitGitGadget, git, Jeff King

2025年5月30日 05:20,Taylor Blau <me@ttaylorr.com> 写道:
> 
> On Sun, May 25, 2025 at 02:43:03AM +0000, Lidong Yan via GitGitGadget wrote:
>> diff --git a/pack-bitmap.c b/pack-bitmap.c
>> index fd19c2255163..39c1c1bc4ce1 100644
>> --- a/pack-bitmap.c
>> +++ b/pack-bitmap.c
>> @@ -34,6 +34,11 @@ struct stored_bitmap {
>> int flags;
>> };
>> 
>> +struct stored_bitmap_tag_pos {
>> + struct stored_bitmap stored;
>> + size_t map_pos;
>> +};
>> +
> 
> Hmm. I was expecting you to add a new member to the stored_bitmap
> structure, not a new structure entirely. Let's read on...
> 
>> /*
>>  * The active bitmap index for a repository. By design, repositories only have
>>  * a single bitmap index available (the index for the biggest packfile in
>> @@ -148,6 +153,7 @@ static int existing_bitmaps_hits_nr;
>> static int existing_bitmaps_misses_nr;
>> static int roots_with_bitmaps_nr;
>> static int roots_without_bitmaps_nr;
>> +static int tag_pos_on_bitmap;
> 
> Why are we only sometimes tagging bitmaps with their position?
> 
>> 
>> static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
>> {
>> @@ -314,13 +320,18 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>>  struct ewah_bitmap *root,
>>  const struct object_id *oid,
>>  struct stored_bitmap *xor_with,
>> -  int flags)
>> +  int flags, size_t map_pos)
>> {
>> struct stored_bitmap *stored;
>> + struct stored_bitmap_tag_pos *tagged;
> 
> OK.
> 
>> khiter_t hash_pos;
>> int ret;
>> 
>> - stored = xmalloc(sizeof(struct stored_bitmap));
>> + tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
>> +     sizeof(struct stored_bitmap));
>> + stored = &tagged->stored;
>> + if (tag_pos_on_bitmap)
>> + tagged->map_pos = map_pos;
> 
> I am quite worried about this portion of the diff.
> 
> Here you allocate memory for "tagged" which is a stored_bitmap_tag_pos.
> But the amount of bytes you allocate depends on whether the global
> variable tag_pos_on_bitmap is set or not. If it isn't, then you don't
> allocate enough memory here to hold an entire stored_bitmap_tag_pos
> structure.
> 
> I think within this function you're OK, since you only write into that
> field when tag_pos_on_bitmap is set. But this seems like a recipe for
> disaster if you ever try to read or write into the tagged->map_pos field
> when tag_pos_on_bitmap *isn't* set.
> 
> This happens to work because of where the pointer to the stored_bitmap
> structure lives within the stored_bitmap_tag_pos structure. But this
> seems *extremely* fragile to only save 4 bytes of allocated memory per
> bitmap. Even on a repository with ~1,000 bitmaps (which is rare from my
> experience), you're only saving ~3.91 KiB.

Interesting calculation indeed — I hadn't thought about the actual memory
savings that way. I agree it’s not worth the added fragility just to save ~4 KiB
in such rare cases. I’ll go ahead and remove the struct stored_bitmap_tagged_pos.

> 
> I would expect something more like the following (based on top of your
> patch here):
> 
> --- 8< ---
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 39c1c1bc4c..4c3829dba9 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -31,12 +31,8 @@ struct stored_bitmap {
> struct object_id oid;
> struct ewah_bitmap *root;
> struct stored_bitmap *xor;
> - int flags;
> -};
> -
> -struct stored_bitmap_tag_pos {
> - struct stored_bitmap stored;
> size_t map_pos;
> + int flags;
> };
> 
> /*
> @@ -153,7 +149,6 @@ static int existing_bitmaps_hits_nr;
> static int existing_bitmaps_misses_nr;
> static int roots_with_bitmaps_nr;
> static int roots_without_bitmaps_nr;
> -static int tag_pos_on_bitmap;
> 
> static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
> {
> @@ -323,17 +318,13 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
>  int flags, size_t map_pos)
> {
> struct stored_bitmap *stored;
> - struct stored_bitmap_tag_pos *tagged;
> khiter_t hash_pos;
> int ret;
> 
> - tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
> -     sizeof(struct stored_bitmap));
> - stored = &tagged->stored;
> - if (tag_pos_on_bitmap)
> - tagged->map_pos = map_pos;
> + stored = xmalloc(sizeof(struct stored_bitmap));
> stored->root = root;
> stored->xor = xor_with;
> + stored->map_pos = map_pos;
> stored->flags = flags;
> oidcpy(&stored->oid, oid);
> 
> @@ -2878,12 +2869,11 @@ int test_bitmap_commits(struct repository *r)
> int test_bitmap_commits_offset(struct repository *r)
> {
> struct object_id oid;
> - struct stored_bitmap_tag_pos *tagged;
> + struct stored_bitmap *bitmap;
> struct bitmap_index *bitmap_git;
> size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
> ewah_bitmap_map_pos;
> 
> - tag_pos_on_bitmap = 1;
> bitmap_git = prepare_bitmap_git(r);
> if (!bitmap_git)
> die(_("failed to load bitmap indexes"));
> @@ -2897,9 +2887,9 @@ int test_bitmap_commits_offset(struct repository *r)
> die(_("failed to load bitmap indexes"));
> }
> 
> - kh_foreach (bitmap_git->bitmaps, oid, tagged, {
> - commit_idx_pos_map_pos = tagged->map_pos;
> - xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
> + kh_foreach (bitmap_git->bitmaps, oid, bitmap, {
> + commit_idx_pos_map_pos = bitmap->map_pos;
> + xor_offset_map_pos = bitmap->map_pos + sizeof(uint32_t);
> flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
> ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
> --- >8 ---
> 
>> stored->root = root;
>> stored->xor = xor_with;
>> stored->flags = flags;
>> @@ -376,10 +387,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>> struct stored_bitmap *xor_bitmap = NULL;
>> uint32_t commit_idx_pos;
>> struct object_id oid;
>> + size_t entry_map_pos;
>> 
>> if (index->map_size - index->map_pos < 6)
>> return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
>> 
>> + entry_map_pos = index->map_pos;
> 
> Good. This is important since the read_be32() and read_u8() calls below
> both adjust the value of index->map_pos past the beginning of the bitmap.
> 
>> @@ -869,6 +883,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> int xor_flags;
>> khiter_t hash_pos;
>> struct bitmap_lookup_table_xor_item *xor_item;
>> + size_t entry_map_pos;
>> 
>> if (is_corrupt)
>> return NULL;
>> @@ -928,6 +943,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> goto corrupt;
>> }
>> 
>> + entry_map_pos = bitmap_git->map_pos;
> 
> Same here.
> 
>> @@ -969,6 +986,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
>> * Instead, we can skip ahead and immediately read the flags and
>> * ewah bitmap.
>> */
>> + entry_map_pos = bitmap_git->map_pos;
> 
> And here.
> 
>> +int test_bitmap_commits_offset(struct repository *r)
>> +{
>> + struct object_id oid;
>> + struct stored_bitmap_tag_pos *tagged;
>> + struct bitmap_index *bitmap_git;
>> + size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
>> + ewah_bitmap_map_pos;
>> +
>> + tag_pos_on_bitmap = 1;
>> + bitmap_git = prepare_bitmap_git(r);
>> + if (!bitmap_git)
>> + die(_("failed to load bitmap indexes"));
>> +
> 
> If we either forgot to set this variable here or did so after calling
> prepare_bitmap_git(), then we wouldn't allocate enough memory to store
> the map_pos field in the stored_bitmap_tag_pos structure. When we then
> would try and read that field below, we'd read garbage heap data outside
> of our structure.
> 
>> + /*
>> + * As this function is only used to print bitmap selected
>> + * commits, we don't have to read the commit table.
>> + */
>> + if (bitmap_git->table_lookup) {
>> + if (load_bitmap_entries_v1(bitmap_git) < 0)
>> + die(_("failed to load bitmap indexes"));
>> + }
> 
> This comment suggests that we can avoid reading the commit table
> altogether. Indeed, calling load_bitmap_entries_v1() here does that,
> since it is not called when loading a bitmap that has a lookup table.
> 
> So I think the behavior here is correct, but the comment is misleading.
> I suspect that the confusion would be resolved by instead writing:
> 
>    /*
>     * Since this function needs to know the position of each individual
>     * bitmap, bypass the commit lookup table (if one exists) by forcing
>     * the bitmap to eagerly load its entries.
>     */
> 
> I think this is copy-and-paste from 28cd730680 (pack-bitmap: prepare to
> read lookup table extension, 2022-08-14) via the 'test_bitmap_commits()'
> function immediately above this one. I think both would benefit from
> some clean-up, since this comment is equally misleading in that
> function.
> 
> For your purposes, I would either:
> 
> - remove or (preferably) reword the comment in your new function,
>   leaving the one in test_bitmap_commits() as-is, or
> 
> - reword the comment in test_bitmap_commits() to be more like the one
>   above, via a preparatory commit, and then introduce the new function
>   using the same wording.
> 
> Between the two, I think the latter is preferable.

I will add a new commit reword the comment before the last addt-test-case commit.

> 
> As an aside, I think that for bitmaps that do have a commit lookup
> table, you could go slightly faster here by walking over that portion of
> the *.bitmap file, since it directly encodes the information you're
> interested in here. But I would avoid doing that, since it too seems
> brittle and I would like to avoid having two separate spots that each
> implement reading the commit table format.
> 
>> + kh_foreach (bitmap_git->bitmaps, oid, tagged, {
>> + commit_idx_pos_map_pos = tagged->map_pos;
> 
> OK, and here's where we pull out the actual position of the selected
> commit's bitmap.
> 
>> + xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
>> + flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
>> + ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
>> +
>> + printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
>> +  oid_to_hex(&oid),
>> +  (uintmax_t)commit_idx_pos_map_pos,
>> +  (uintmax_t)xor_offset_map_pos,
>> +  (uintmax_t)flag_map_pos,
>> +  (uintmax_t)ewah_bitmap_map_pos);
> 
> Hmm. We print more information here than just the map_pos. This is
> brittle if the on-disk format changes (e.g., to store the XOR offsets in
> some other part of the bitmap). But hopefully future updates to the
> bitmap format will come with updates to this function as well ;-).
> 
> It feels somewhat unsatisfying to print output like:
> 
>    $COMMIT_OID <map_pos> <map_pos+4> <map_pos+5> <map_pos+6>
> 
> , but I think it makes sense here for a couple of reasons:
> 
> - If we just print the <map_pos>, then the test is responsible for
>   knowing the distance between that and the XOR offset, which extends
>   the brittleness to the test code
> 
> - likewise, if we print out just the map_pos and the positions of the
>   XOR offset, it feels strange to omit the others.
> 
>> diff --git a/pack-bitmap.h b/pack-bitmap.h
>> index 382d39499af2..96880ba3d72d 100644
>> --- a/pack-bitmap.h
>> +++ b/pack-bitmap.h
>> @@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
>> show_reachable_fn show_reachable);
>> void test_bitmap_walk(struct rev_info *revs);
>> int test_bitmap_commits(struct repository *r);
>> +int test_bitmap_commits_offset(struct repository *r);
>> int test_bitmap_hashes(struct repository *r);
>> int test_bitmap_pseudo_merges(struct repository *r);
>> int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
>> diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
>> index 3f23f2107268..65a1ab29192b 100644
>> --- a/t/helper/test-bitmap.c
>> +++ b/t/helper/test-bitmap.c
>> @@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
>> return test_bitmap_commits(the_repository);
>> }
>> 
>> +static int bitmap_list_commits_offset(void)
>> +{
>> + return test_bitmap_commits_offset(the_repository);
>> +}
>> +
>> static int bitmap_dump_hashes(void)
>> {
>> return test_bitmap_hashes(the_repository);
>> @@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
>> 
>> if (argc == 2 && !strcmp(argv[1], "list-commits"))
>> return bitmap_list_commits();
>> + if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
>> + return bitmap_list_commits_offset();
> 
> All of the scaffolding here looks good.
> 
> This new mode reads a little awkwardly to me (but may not to others, in
> which case I am happy to back away from the following suggestion). Can
> we either call this 'list-commits-with-offset' or 'list-commit-offsets'?
> I have a vague preference towards the former since the new mode has a
> string prefix matching the existing mode.
> 
>> + test_expect_success 'load corrupt bitmap' '
>> + rm -fr repo &&
>> + git init repo &&
>> + test_when_finished "rm -fr repo" &&
>> + (
>> + cd repo &&
>> + git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>> +
>> + test_commit base &&
>> +
>> + git repack -adb &&
>> + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
>> + chmod +w $bitmap &&
>> +
>> + read oid commit_off xor_off flag_off ewah_off <<-EOF &&
>> + $(test-tool bitmap list-commits-offset | head -n 1)
> 
> We avoid putting 'git' or 'test-tool' on the left-hand side of a pipe,
> since we want to avoid squelching any errors / segfaults from our code.
> 
> How about (assuming the rename above):
> 
>    test-tool bitmap list-commit-offsets >offsets &&
>    xor_off="$(head -n1 offsets | awk '{print $3}')" &&
>    ...
> 
> ?
> 
>> + printf '\161' |
>> + dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
>> +
>> +
>> + git rev-list --count HEAD > expect &&
>> + git rev-list --use-bitmap-index --count HEAD > actual &&
> 
> Using --count can mask failures in the bitmap code since it only guards
> you against getting the wrong number of objects, but doesn't guard you
> against getting the right amount of objects in a permuted order. I think
> we'd want just 'git rev-list --objects' here, but note that you have to
> use '--no-object-names' on the non-bitmap side, since rev-list does not
> print out paths when using bitmaps[^1].
> 
> How about:
> 
>    git rev-list --objects --no-object-names HEAD >expect.raw &&
>    git rev-list --objects --use-bitmap-index --no-object-names HEAD \
>> actual.raw &&
> 
>    sort expect.raw >expect &&
>    sort actual.raw >actual &&
> 
>    test_cmp expect actual
> 
> (Note that you also have to sort the output here since rev-list does not
> output objects in a meaningful order when using bitmaps.)
> 
> Thanks,
> Taylor
> 
> [^1]: Not that it matters here since we don't expect to load the bitmap
>  anyway, but it's worth doing regardless.
> 

Good advice, Thanks
Lidong

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2025-05-25  2:43       ` [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
  2025-05-25  2:43       ` [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
@ 2025-06-03  3:14       ` Lidong Yan via GitGitGadget
  2025-06-03  3:14         ` [PATCH v5 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
                           ` (4 more replies)
  2 siblings, 5 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-06-03  3:14 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan

This patch prevents pack-bitmap.c:load_bitmap() from nulling
bitmap_git->bitmap when loading failed thus eliminates memory leak. This
patch also add a test case in t5310 which use clang leak sanitizer to detect
whether leak happens when loading failed.

Lidong Yan (2):
  pack-bitmap: reword comments in test_bitmap_commits()
  pack-bitmap: add load corrupt bitmap test

Taylor Blau (1):
  pack-bitmap: fix memory leak if load_bitmap() failed

 pack-bitmap.c           | 88 ++++++++++++++++++++++++++++++-----------
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++
 t/t5310-pack-bitmaps.sh | 30 ++++++++++++++
 4 files changed, 103 insertions(+), 24 deletions(-)


base-commit: 845c48a16a7f7b2c44d8cb137b16a4a1f0140229
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v5
Pull-Request: https://github.com/git/git/pull/1962

Range-diff vs v4:

 1:  b6b3a83a224 ! 1:  9ce2135df2a pack-bitmap: fix memory leak if load_bitmap() failed
     @@ Commit message
          its caller will always call free_bitmap_index() in case of an error.
      
          Signed-off-by: Taylor Blau <me@ttaylorr.com>
     +    Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
      
       ## pack-bitmap.c ##
      @@ pack-bitmap.c: static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 -:  ----------- > 2:  a75d0a3cc7f pack-bitmap: reword comments in test_bitmap_commits()
 2:  7876d9a9014 ! 3:  05140e2171d pack-bitmap: add load corrupt bitmap test
     @@ Commit message
      
       ## pack-bitmap.c ##
      @@ pack-bitmap.c: struct stored_bitmap {
     + 	struct object_id oid;
     + 	struct ewah_bitmap *root;
     + 	struct stored_bitmap *xor;
     ++	size_t map_pos;
       	int flags;
       };
       
     -+struct stored_bitmap_tag_pos {
     -+	struct stored_bitmap stored;
     -+	size_t map_pos;
     -+};
     -+
     - /*
     -  * The active bitmap index for a repository. By design, repositories only have
     -  * a single bitmap index available (the index for the biggest packfile in
     -@@ pack-bitmap.c: static int existing_bitmaps_hits_nr;
     - static int existing_bitmaps_misses_nr;
     - static int roots_with_bitmaps_nr;
     - static int roots_without_bitmaps_nr;
     -+static int tag_pos_on_bitmap;
     - 
     - static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
     - {
      @@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
       					  struct ewah_bitmap *root,
       					  const struct object_id *oid,
     @@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *in
      +					  int flags, size_t map_pos)
       {
       	struct stored_bitmap *stored;
     -+	struct stored_bitmap_tag_pos *tagged;
       	khiter_t hash_pos;
       	int ret;
       
     --	stored = xmalloc(sizeof(struct stored_bitmap));
     -+	tagged = xmalloc(tag_pos_on_bitmap ? sizeof(struct stored_bitmap_tag_pos) :
     -+					     sizeof(struct stored_bitmap));
     -+	stored = &tagged->stored;
     -+	if (tag_pos_on_bitmap)
     -+		tagged->map_pos = map_pos;
     + 	stored = xmalloc(sizeof(struct stored_bitmap));
     ++	stored->map_pos = map_pos;
       	stored->root = root;
       	stored->xor = xor_with;
       	stored->flags = flags;
     @@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
       	return 0;
       }
       
     -+int test_bitmap_commits_offset(struct repository *r)
     ++int test_bitmap_commits_with_offset(struct repository *r)
      +{
      +	struct object_id oid;
     -+	struct stored_bitmap_tag_pos *tagged;
     ++	struct stored_bitmap *stored;
      +	struct bitmap_index *bitmap_git;
      +	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
      +		ewah_bitmap_map_pos;
      +
     -+	tag_pos_on_bitmap = 1;
      +	bitmap_git = prepare_bitmap_git(r);
      +	if (!bitmap_git)
      +		die(_("failed to load bitmap indexes"));
      +
      +	/*
     -+	 * As this function is only used to print bitmap selected
     -+	 * commits, we don't have to read the commit table.
     ++	 * Since this function needs to know the position of each individual
     ++	 * bitmap, bypass the commit lookup table (if one exists) by forcing
     ++	 * the bitmap to eagerly load its entries.
      +	 */
      +	if (bitmap_git->table_lookup) {
      +		if (load_bitmap_entries_v1(bitmap_git) < 0)
      +			die(_("failed to load bitmap indexes"));
      +	}
      +
     -+	kh_foreach (bitmap_git->bitmaps, oid, tagged, {
     -+		commit_idx_pos_map_pos = tagged->map_pos;
     -+		xor_offset_map_pos = tagged->map_pos + sizeof(uint32_t);
     ++	kh_foreach (bitmap_git->bitmaps, oid, stored, {
     ++		commit_idx_pos_map_pos = stored->map_pos;
     ++		xor_offset_map_pos = stored->map_pos + sizeof(uint32_t);
      +		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
      +		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
      +
     @@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
       				 show_reachable_fn show_reachable);
       void test_bitmap_walk(struct rev_info *revs);
       int test_bitmap_commits(struct repository *r);
     -+int test_bitmap_commits_offset(struct repository *r);
     ++int test_bitmap_commits_with_offset(struct repository *r);
       int test_bitmap_hashes(struct repository *r);
       int test_bitmap_pseudo_merges(struct repository *r);
       int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
     @@ t/helper/test-bitmap.c: static int bitmap_list_commits(void)
       	return test_bitmap_commits(the_repository);
       }
       
     -+static int bitmap_list_commits_offset(void)
     ++static int bitmap_list_commits_with_offset(void)
      +{
     -+	return test_bitmap_commits_offset(the_repository);
     ++	return test_bitmap_commits_with_offset(the_repository);
      +}
      +
       static int bitmap_dump_hashes(void)
     @@ t/helper/test-bitmap.c: int cmd__bitmap(int argc, const char **argv)
       
       	if (argc == 2 && !strcmp(argv[1], "list-commits"))
       		return bitmap_list_commits();
     -+	if (argc == 2 && !strcmp(argv[1], "list-commits-offset"))
     -+		return bitmap_list_commits_offset();
     ++	if (argc == 2 && !strcmp(argv[1], "list-commits-with-offset"))
     ++		return bitmap_list_commits_with_offset();
       	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
       		return bitmap_dump_hashes();
       	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
     @@ t/helper/test-bitmap.c: int cmd__bitmap(int argc, const char **argv)
       		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
       
       	usage("\ttest-tool bitmap list-commits\n"
     -+	      "\ttest-tool bitmap list-commits-offset\n"
     ++	      "\ttest-tool bitmap list-commits-with-offset\n"
       	      "\ttest-tool bitmap dump-hashes\n"
       	      "\ttest-tool bitmap dump-pseudo-merges\n"
       	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
     @@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
      +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
      +			chmod +w $bitmap &&
      +
     -+			read oid commit_off xor_off flag_off ewah_off <<-EOF &&
     -+				$(test-tool bitmap list-commits-offset | head -n 1)
     -+			EOF
     ++			test-tool bitmap list-commits-with-offset >offsets &&
     ++			xor_off=$(head -n1 offsets | awk "{print \$3}") &&
      +			printf '\161' |
      +				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
      +
     ++			git rev-list --objects --no-object-names HEAD >expect.raw &&
     ++			git rev-list --objects --use-bitmap-index --no-object-names HEAD \
     ++				>actual.raw &&
     ++
     ++			sort expect.raw >expect &&
     ++			sort actual.raw >actual &&
      +
     -+			git rev-list --count HEAD > expect &&
     -+			git rev-list --use-bitmap-index --count HEAD > actual &&
     -+			test_cmp expect actual
     ++		    test_cmp expect actual
      +		)
      +	'
       }

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v5 1/3] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
@ 2025-06-03  3:14         ` Taylor Blau via GitGitGadget
  2025-06-03  3:14         ` [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau via GitGitGadget @ 2025-06-03  3:14 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label.

So I think if you got part of the way through loading bitmap entries and
then failed, you would leak all of the previous entries that you were
able to load successfully.

The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ac6d62b980c5..fd19c2255163 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -630,41 +630,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
 
 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;
 
 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;
 
 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}
 
 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);
 
 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }
 
 static int open_pack_bitmap(struct repository *r,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits()
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2025-06-03  3:14         ` [PATCH v5 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
@ 2025-06-03  3:14         ` Lidong Yan via GitGitGadget
  2025-06-03 22:13           ` Taylor Blau
  2025-06-03  3:14         ` [PATCH v5 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-06-03  3:14 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

In pack-bitmap.c:test_bitmap_commits(), it comments

    /*
     * As this function is only used to print bitmap selected
     * commits, we don't have to read the commit table.
     */

This suggests that we can avoid reading the commit table altogether.
However, this comment is misleading. The reason we load bitmap entries here
is because test_bitmap_commits() needs to print the commit IDs from the
bitmap, and we must read the bitmap entries to obtain those commit IDs.
So reword this comment.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fd19c2255163..e514c9da239b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2839,8 +2839,9 @@ int test_bitmap_commits(struct repository *r)
 		die(_("failed to load bitmap indexes"));
 
 	/*
-	 * As this function is only used to print bitmap selected
-	 * commits, we don't have to read the commit table.
+	 * Since this function needs to print bitmap selected
+	 * commits, bypass the commit lookup table (if one exists)
+	 * by forcing the bitmap to eagerly load its entries.
 	 */
 	if (bitmap_git->table_lookup) {
 		if (load_bitmap_entries_v1(bitmap_git) < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v5 3/3] pack-bitmap: add load corrupt bitmap test
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
  2025-06-03  3:14         ` [PATCH v5 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
  2025-06-03  3:14         ` [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
@ 2025-06-03  3:14         ` Lidong Yan via GitGitGadget
  2025-06-03 22:14         ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Taylor Blau
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
  4 siblings, 0 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-06-03  3:14 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
a new test helper command `test-tool bitmap list-commits-offset`,
and a `load corrupt bitmap` test case in t5310.

The `load corrupt bitmap` test case intentionally corrupt the
"xor_offset" field of the first entry. And the newly added helper
can help to find position of "xor_offset" in bitmap file.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c           | 62 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++++
 t/t5310-pack-bitmaps.sh | 30 ++++++++++++++++++++
 4 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e514c9da239b..0825129b58f3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -31,6 +31,7 @@ struct stored_bitmap {
 	struct object_id oid;
 	struct ewah_bitmap *root;
 	struct stored_bitmap *xor;
+	size_t map_pos;
 	int flags;
 };
 
@@ -314,13 +315,14 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
 					  const struct object_id *oid,
 					  struct stored_bitmap *xor_with,
-					  int flags)
+					  int flags, size_t map_pos)
 {
 	struct stored_bitmap *stored;
 	khiter_t hash_pos;
 	int ret;
 
 	stored = xmalloc(sizeof(struct stored_bitmap));
+	stored->map_pos = map_pos;
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
@@ -376,10 +378,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		struct stored_bitmap *xor_bitmap = NULL;
 		uint32_t commit_idx_pos;
 		struct object_id oid;
+		size_t entry_map_pos;
 
 		if (index->map_size - index->map_pos < 6)
 			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
 
+		entry_map_pos = index->map_pos;
 		commit_idx_pos = read_be32(index->map, &index->map_pos);
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
@@ -402,8 +406,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		if (!bitmap)
 			return -1;
 
-		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
-			index, bitmap, &oid, xor_bitmap, flags);
+		recent_bitmaps[i % MAX_XOR_OFFSET] =
+			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
+				     entry_map_pos);
 	}
 
 	return 0;
@@ -869,6 +874,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	int xor_flags;
 	khiter_t hash_pos;
 	struct bitmap_lookup_table_xor_item *xor_item;
+	size_t entry_map_pos;
 
 	if (is_corrupt)
 		return NULL;
@@ -928,6 +934,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 			goto corrupt;
 		}
 
+		entry_map_pos = bitmap_git->map_pos;
 		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 		bitmap = read_bitmap_1(bitmap_git);
@@ -935,7 +942,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 		if (!bitmap)
 			goto corrupt;
 
-		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
+					  xor_bitmap, xor_flags, entry_map_pos);
 		xor_items_nr--;
 	}
 
@@ -969,6 +977,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	 * Instead, we can skip ahead and immediately read the flags and
 	 * ewah bitmap.
 	 */
+	entry_map_pos = bitmap_git->map_pos;
 	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 	bitmap = read_bitmap_1(bitmap_git);
@@ -976,7 +985,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	if (!bitmap)
 		goto corrupt;
 
-	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
+			    entry_map_pos);
 
 corrupt:
 	free(xor_items);
@@ -2857,6 +2867,48 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_commits_with_offset(struct repository *r)
+{
+	struct object_id oid;
+	struct stored_bitmap *stored;
+	struct bitmap_index *bitmap_git;
+	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
+		ewah_bitmap_map_pos;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git)
+		die(_("failed to load bitmap indexes"));
+
+	/*
+	 * Since this function needs to know the position of each individual
+	 * bitmap, bypass the commit lookup table (if one exists) by forcing
+	 * the bitmap to eagerly load its entries.
+	 */
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
+	kh_foreach (bitmap_git->bitmaps, oid, stored, {
+		commit_idx_pos_map_pos = stored->map_pos;
+		xor_offset_map_pos = stored->map_pos + sizeof(uint32_t);
+		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
+		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
+
+		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
+			  oid_to_hex(&oid),
+			  (uintmax_t)commit_idx_pos_map_pos,
+			  (uintmax_t)xor_offset_map_pos,
+			  (uintmax_t)flag_map_pos,
+			  (uintmax_t)ewah_bitmap_map_pos);
+	})
+		;
+
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int test_bitmap_hashes(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 382d39499af2..1bd7a791e2a0 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_commits_with_offset(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 int test_bitmap_pseudo_merges(struct repository *r);
 int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 3f23f2107268..16a01669e414 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_list_commits_with_offset(void)
+{
+	return test_bitmap_commits_with_offset(the_repository);
+}
+
 static int bitmap_dump_hashes(void)
 {
 	return test_bitmap_hashes(the_repository);
@@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (argc == 2 && !strcmp(argv[1], "list-commits-with-offset"))
+		return bitmap_list_commits_with_offset();
 	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
 	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
@@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
 		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
 	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap list-commits-with-offset\n"
 	      "\ttest-tool bitmap dump-hashes\n"
 	      "\ttest-tool bitmap dump-pseudo-merges\n"
 	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index a62b463eaf09..df05d7419185 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -486,6 +486,36 @@ test_bitmap_cases () {
 			grep "ignoring extra bitmap" trace2.txt
 		)
 	'
+
+	test_expect_success 'load corrupt bitmap' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			chmod +w $bitmap &&
+
+			test-tool bitmap list-commits-with-offset >offsets &&
+			xor_off=$(head -n1 offsets | awk "{print \$3}") &&
+			printf '\161' |
+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
+
+			git rev-list --objects --no-object-names HEAD >expect.raw &&
+			git rev-list --objects --use-bitmap-index --no-object-names HEAD \
+				>actual.raw &&
+
+			sort expect.raw >expect &&
+			sort actual.raw >actual &&
+
+		    test_cmp expect actual
+		)
+	'
 }
 
 test_bitmap_cases
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits()
  2025-06-03  3:14         ` [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
@ 2025-06-03 22:13           ` Taylor Blau
  0 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-06-03 22:13 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Jeff King, Lidong Yan

On Tue, Jun 03, 2025 at 03:14:03AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <502024330056@smail.nju.edu.cn>
>
> In pack-bitmap.c:test_bitmap_commits(), it comments
>
>     /*
>      * As this function is only used to print bitmap selected
>      * commits, we don't have to read the commit table.
>      */
>

There is no need to include the original comment here, since it is clear
from the patch below what you're referring to.

I don't think this alone is worth rerolling the series, but others may
feel differently.

> This suggests that we can avoid reading the commit table altogether.
> However, this comment is misleading. The reason we load bitmap entries here
> is because test_bitmap_commits() needs to print the commit IDs from the
> bitmap, and we must read the bitmap entries to obtain those commit IDs.
> So reword this comment.
>
> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
> ---
>  pack-bitmap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index fd19c2255163..e514c9da239b 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2839,8 +2839,9 @@ int test_bitmap_commits(struct repository *r)
>  		die(_("failed to load bitmap indexes"));
>
>  	/*
> -	 * As this function is only used to print bitmap selected
> -	 * commits, we don't have to read the commit table.
> +	 * Since this function needs to print bitmap selected

The phrase "bitmap selected commits" is a little awkward. I might have
written either "the bitmapped commits", or "the set of commits which
have bitmaps".

> +	 * commits, bypass the commit lookup table (if one exists)
> +	 * by forcing the bitmap to eagerly load its entries.
>  	 */
>  	if (bitmap_git->table_lookup) {
>  		if (load_bitmap_entries_v1(bitmap_git) < 0)
> --
> gitgitgadget
>
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
                           ` (2 preceding siblings ...)
  2025-06-03  3:14         ` [PATCH v5 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
@ 2025-06-03 22:14         ` Taylor Blau
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
  4 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau @ 2025-06-03 22:14 UTC (permalink / raw)
  To: Lidong Yan via GitGitGadget; +Cc: git, Jeff King, Lidong Yan

On Tue, Jun 03, 2025 at 03:14:01AM +0000, Lidong Yan via GitGitGadget wrote:
> Lidong Yan (2):
>   pack-bitmap: reword comments in test_bitmap_commits()
>   pack-bitmap: add load corrupt bitmap test
>
> Taylor Blau (1):
>   pack-bitmap: fix memory leak if load_bitmap() failed

This version looks pretty good to me. There's a pair of minor
suggestions that I left on the second patch, but otherwise I think the
result is ready to start merging down.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
                           ` (3 preceding siblings ...)
  2025-06-03 22:14         ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Taylor Blau
@ 2025-07-01  5:32         ` Lidong Yan via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
                             ` (3 more replies)
  4 siblings, 4 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-07-01  5:32 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan

Since it seems this patch has been inactive for some time, I have revised
the comments according to Taylor's feedback and submitted a new version.

This patch prevents pack-bitmap.c:load_bitmap() from nulling
bitmap_git->bitmap when loading failed. Thus eliminates memory leak. This
patch also add a test case in t5310 which use clang leak sanitizer to detect
whether leak happens when loading failed.

Lidong Yan (2):
  pack-bitmap: reword comments in test_bitmap_commits()
  pack-bitmap: add load corrupt bitmap test

Taylor Blau (1):
  pack-bitmap: fix memory leak if load_bitmap() failed

 pack-bitmap.c           | 88 ++++++++++++++++++++++++++++++-----------
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++
 t/t5310-pack-bitmaps.sh | 30 ++++++++++++++
 4 files changed, 103 insertions(+), 24 deletions(-)


base-commit: f0135a9047ca37d4d117dcf21f7e3e89fad85d00
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1962%2Fbrandb97%2Ffix-pack-bitmap-leak-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1962/brandb97/fix-pack-bitmap-leak-v6
Pull-Request: https://github.com/git/git/pull/1962

Range-diff vs v5:

 1:  9ce2135df2a ! 1:  3d70e14e415 pack-bitmap: fix memory leak if load_bitmap() failed
     @@ Commit message
              });
      
          , but won't since load_bitmap() already called kh_destroy_oid_map() and
     -    NULL'd the "bitmaps" pointer from within its "failed" label.
     -
     -    So I think if you got part of the way through loading bitmap entries and
     -    then failed, you would leak all of the previous entries that you were
     -    able to load successfully.
     +    NULL'd the "bitmaps" pointer from within its "failed" label. Thus if you
     +    got part of the way through loading bitmap entries and then failed, you
     +    would leak all of the previous entries that you were able to load
     +    successfully.
      
          The solution is to remove the error handling code in load_bitmap(), because
          its caller will always call free_bitmap_index() in case of an error.
 2:  a75d0a3cc7f ! 2:  6a082930ea3 pack-bitmap: reword comments in test_bitmap_commits()
     @@ Metadata
       ## Commit message ##
          pack-bitmap: reword comments in test_bitmap_commits()
      
     -    In pack-bitmap.c:test_bitmap_commits(), it comments
     -
     -        /*
     -         * As this function is only used to print bitmap selected
     -         * commits, we don't have to read the commit table.
     -         */
     -
     -    This suggests that we can avoid reading the commit table altogether.
     -    However, this comment is misleading. The reason we load bitmap entries here
     -    is because test_bitmap_commits() needs to print the commit IDs from the
     +    The comment in pack-bitmap.c:test_bitmap_commits(), suggests that
     +    we can avoid reading the commit table altogether. However, this
     +    comment is misleading. The reason we load bitmap entries here is
     +    because test_bitmap_commits() needs to print the commit IDs from the
          bitmap, and we must read the bitmap entries to obtain those commit IDs.
          So reword this comment.
      
     @@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
       	/*
      -	 * As this function is only used to print bitmap selected
      -	 * commits, we don't have to read the commit table.
     -+	 * Since this function needs to print bitmap selected
     ++	 * Since this function needs to print the bitmapped
      +	 * commits, bypass the commit lookup table (if one exists)
      +	 * by forcing the bitmap to eagerly load its entries.
       	 */
 3:  05140e2171d ! 3:  c1b5d030133 pack-bitmap: add load corrupt bitmap test
     @@ Metadata
       ## Commit message ##
          pack-bitmap: add load corrupt bitmap test
      
     -    This patch add test_bitmap_list_commits_offset() in patch-bitmap.c,
     -    a new test helper command `test-tool bitmap list-commits-offset`,
     -    and a `load corrupt bitmap` test case in t5310.
     -
     -    The `load corrupt bitmap` test case intentionally corrupt the
     -    "xor_offset" field of the first entry. And the newly added helper
     -    can help to find position of "xor_offset" in bitmap file.
     +    t5310 lacks a test to ensure git works correctly when commit bitmap
     +    data is corrupted. So this patch add test helper in pack-bitmap.c to
     +    list each commit bitmap position in bitmap file and `load corrupt bitmap`
     +    test case in t/t5310 to corrupt a commit bitmap before loading it.
      
          Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
      

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v6 1/3] pack-bitmap: fix memory leak if load_bitmap() failed
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
@ 2025-07-01  5:32           ` Taylor Blau via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Taylor Blau via GitGitGadget @ 2025-07-01  5:32 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().

That function would have done:

    struct stored_bitmap *sb;
    kh_foreach_value(b->bitmaps, sb {
      ewah_pool_free(sb->root);
      free(sb);
    });

, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label. Thus if you
got part of the way through loading bitmap entries and then failed, you
would leak all of the previous entries that you were able to load
successfully.

The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8727f316de92..38588b4aec01 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -630,41 +630,28 @@ static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git,
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
 
 	if (load_reverse_index(r, bitmap_git))
-		goto failed;
+		return -1;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->trees = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->blobs = read_bitmap_1(bitmap_git)) ||
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
-		goto failed;
+		return -1;
 
 	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
-		goto failed;
+		return -1;
 
 	if (bitmap_git->base) {
 		if (!bitmap_is_midx(bitmap_git))
 			BUG("non-MIDX bitmap has non-NULL base bitmap index");
 		if (load_bitmap(r, bitmap_git->base, 1) < 0)
-			goto failed;
+			return -1;
 	}
 
 	if (!recursing)
 		load_all_type_bitmaps(bitmap_git);
 
 	return 0;
-
-failed:
-	munmap(bitmap_git->map, bitmap_git->map_size);
-	bitmap_git->map = NULL;
-	bitmap_git->map_size = 0;
-
-	kh_destroy_oid_map(bitmap_git->bitmaps);
-	bitmap_git->bitmaps = NULL;
-
-	kh_destroy_oid_pos(bitmap_git->ext_index.positions);
-	bitmap_git->ext_index.positions = NULL;
-
-	return -1;
 }
 
 static int open_pack_bitmap(struct repository *r,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v6 2/3] pack-bitmap: reword comments in test_bitmap_commits()
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
@ 2025-07-01  5:32           ` Lidong Yan via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
  2025-07-07 22:53           ` [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed Junio C Hamano
  3 siblings, 0 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-07-01  5:32 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

The comment in pack-bitmap.c:test_bitmap_commits(), suggests that
we can avoid reading the commit table altogether. However, this
comment is misleading. The reason we load bitmap entries here is
because test_bitmap_commits() needs to print the commit IDs from the
bitmap, and we must read the bitmap entries to obtain those commit IDs.
So reword this comment.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 38588b4aec01..330f07609835 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2839,8 +2839,9 @@ int test_bitmap_commits(struct repository *r)
 		die(_("failed to load bitmap indexes"));
 
 	/*
-	 * As this function is only used to print bitmap selected
-	 * commits, we don't have to read the commit table.
+	 * Since this function needs to print the bitmapped
+	 * commits, bypass the commit lookup table (if one exists)
+	 * by forcing the bitmap to eagerly load its entries.
 	 */
 	if (bitmap_git->table_lookup) {
 		if (load_bitmap_entries_v1(bitmap_git) < 0)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v6 3/3] pack-bitmap: add load corrupt bitmap test
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
  2025-07-01  5:32           ` [PATCH v6 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
@ 2025-07-01  5:32           ` Lidong Yan via GitGitGadget
  2025-07-07 22:53           ` [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed Junio C Hamano
  3 siblings, 0 replies; 45+ messages in thread
From: Lidong Yan via GitGitGadget @ 2025-07-01  5:32 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Taylor Blau, Lidong Yan, Lidong Yan

From: Lidong Yan <502024330056@smail.nju.edu.cn>

t5310 lacks a test to ensure git works correctly when commit bitmap
data is corrupted. So this patch add test helper in pack-bitmap.c to
list each commit bitmap position in bitmap file and `load corrupt bitmap`
test case in t/t5310 to corrupt a commit bitmap before loading it.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
---
 pack-bitmap.c           | 62 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h           |  1 +
 t/helper/test-bitmap.c  |  8 ++++++
 t/t5310-pack-bitmaps.sh | 30 ++++++++++++++++++++
 4 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 330f07609835..499d77a1d368 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -31,6 +31,7 @@ struct stored_bitmap {
 	struct object_id oid;
 	struct ewah_bitmap *root;
 	struct stored_bitmap *xor;
+	size_t map_pos;
 	int flags;
 };
 
@@ -314,13 +315,14 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 					  struct ewah_bitmap *root,
 					  const struct object_id *oid,
 					  struct stored_bitmap *xor_with,
-					  int flags)
+					  int flags, size_t map_pos)
 {
 	struct stored_bitmap *stored;
 	khiter_t hash_pos;
 	int ret;
 
 	stored = xmalloc(sizeof(struct stored_bitmap));
+	stored->map_pos = map_pos;
 	stored->root = root;
 	stored->xor = xor_with;
 	stored->flags = flags;
@@ -376,10 +378,12 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		struct stored_bitmap *xor_bitmap = NULL;
 		uint32_t commit_idx_pos;
 		struct object_id oid;
+		size_t entry_map_pos;
 
 		if (index->map_size - index->map_pos < 6)
 			return error(_("corrupt ewah bitmap: truncated header for entry %d"), i);
 
+		entry_map_pos = index->map_pos;
 		commit_idx_pos = read_be32(index->map, &index->map_pos);
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
@@ -402,8 +406,9 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		if (!bitmap)
 			return -1;
 
-		recent_bitmaps[i % MAX_XOR_OFFSET] = store_bitmap(
-			index, bitmap, &oid, xor_bitmap, flags);
+		recent_bitmaps[i % MAX_XOR_OFFSET] =
+			store_bitmap(index, bitmap, &oid, xor_bitmap, flags,
+				     entry_map_pos);
 	}
 
 	return 0;
@@ -869,6 +874,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	int xor_flags;
 	khiter_t hash_pos;
 	struct bitmap_lookup_table_xor_item *xor_item;
+	size_t entry_map_pos;
 
 	if (is_corrupt)
 		return NULL;
@@ -928,6 +934,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 			goto corrupt;
 		}
 
+		entry_map_pos = bitmap_git->map_pos;
 		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 		bitmap = read_bitmap_1(bitmap_git);
@@ -935,7 +942,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 		if (!bitmap)
 			goto corrupt;
 
-		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid,
+					  xor_bitmap, xor_flags, entry_map_pos);
 		xor_items_nr--;
 	}
 
@@ -969,6 +977,7 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	 * Instead, we can skip ahead and immediately read the flags and
 	 * ewah bitmap.
 	 */
+	entry_map_pos = bitmap_git->map_pos;
 	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
 	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
 	bitmap = read_bitmap_1(bitmap_git);
@@ -976,7 +985,8 @@ static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_
 	if (!bitmap)
 		goto corrupt;
 
-	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags,
+			    entry_map_pos);
 
 corrupt:
 	free(xor_items);
@@ -2857,6 +2867,48 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_commits_with_offset(struct repository *r)
+{
+	struct object_id oid;
+	struct stored_bitmap *stored;
+	struct bitmap_index *bitmap_git;
+	size_t commit_idx_pos_map_pos, xor_offset_map_pos, flag_map_pos,
+		ewah_bitmap_map_pos;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git)
+		die(_("failed to load bitmap indexes"));
+
+	/*
+	 * Since this function needs to know the position of each individual
+	 * bitmap, bypass the commit lookup table (if one exists) by forcing
+	 * the bitmap to eagerly load its entries.
+	 */
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
+	kh_foreach (bitmap_git->bitmaps, oid, stored, {
+		commit_idx_pos_map_pos = stored->map_pos;
+		xor_offset_map_pos = stored->map_pos + sizeof(uint32_t);
+		flag_map_pos = xor_offset_map_pos + sizeof(uint8_t);
+		ewah_bitmap_map_pos = flag_map_pos + sizeof(uint8_t);
+
+		printf_ln("%s %"PRIuMAX" %"PRIuMAX" %"PRIuMAX" %"PRIuMAX,
+			  oid_to_hex(&oid),
+			  (uintmax_t)commit_idx_pos_map_pos,
+			  (uintmax_t)xor_offset_map_pos,
+			  (uintmax_t)flag_map_pos,
+			  (uintmax_t)ewah_bitmap_map_pos);
+	})
+		;
+
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int test_bitmap_hashes(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 382d39499af2..1bd7a791e2a0 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -81,6 +81,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_commits_with_offset(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
 int test_bitmap_pseudo_merges(struct repository *r);
 int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 3f23f2107268..16a01669e414 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -10,6 +10,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_list_commits_with_offset(void)
+{
+	return test_bitmap_commits_with_offset(the_repository);
+}
+
 static int bitmap_dump_hashes(void)
 {
 	return test_bitmap_hashes(the_repository);
@@ -36,6 +41,8 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (argc == 2 && !strcmp(argv[1], "list-commits-with-offset"))
+		return bitmap_list_commits_with_offset();
 	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
 	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
@@ -46,6 +53,7 @@ int cmd__bitmap(int argc, const char **argv)
 		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
 	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap list-commits-with-offset\n"
 	      "\ttest-tool bitmap dump-hashes\n"
 	      "\ttest-tool bitmap dump-pseudo-merges\n"
 	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b6926f102708..6718fb98c057 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -495,6 +495,36 @@ test_bitmap_cases () {
 			grep "ignoring extra bitmap" trace2.txt
 		)
 	'
+
+	test_expect_success 'load corrupt bitmap' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			chmod +w $bitmap &&
+
+			test-tool bitmap list-commits-with-offset >offsets &&
+			xor_off=$(head -n1 offsets | awk "{print \$3}") &&
+			printf '\161' |
+				dd of=$bitmap count=1 bs=1 conv=notrunc seek=$xor_off &&
+
+			git rev-list --objects --no-object-names HEAD >expect.raw &&
+			git rev-list --objects --use-bitmap-index --no-object-names HEAD \
+				>actual.raw &&
+
+			sort expect.raw >expect &&
+			sort actual.raw >actual &&
+
+		    test_cmp expect actual
+		)
+	'
 }
 
 test_bitmap_cases
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
                             ` (2 preceding siblings ...)
  2025-07-01  5:32           ` [PATCH v6 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
@ 2025-07-07 22:53           ` Junio C Hamano
  2025-07-08 22:10             ` Taylor Blau
  3 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2025-07-07 22:53 UTC (permalink / raw)
  To: git; +Cc: Lidong Yan via GitGitGadget, Jeff King, Taylor Blau, Lidong Yan

"Lidong Yan via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Since it seems this patch has been inactive for some time, I have revised
> the comments according to Taylor's feedback and submitted a new version.
>
> This patch prevents pack-bitmap.c:load_bitmap() from nulling
> bitmap_git->bitmap when loading failed. Thus eliminates memory leak. This
> patch also add a test case in t5310 which use clang leak sanitizer to detect
> whether leak happens when loading failed.
>
> Lidong Yan (2):
>   pack-bitmap: reword comments in test_bitmap_commits()
>   pack-bitmap: add load corrupt bitmap test
>
> Taylor Blau (1):
>   pack-bitmap: fix memory leak if load_bitmap() failed

OK, now, how does this iteration look to folks?  We haven't heard
anybody say yet.  Is it ready to be marked for 'next' yet?

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-07-07 22:53           ` [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed Junio C Hamano
@ 2025-07-08 22:10             ` Taylor Blau
  2025-07-08 22:35               ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Taylor Blau @ 2025-07-08 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Lidong Yan via GitGitGadget, Jeff King, Lidong Yan

On Mon, Jul 07, 2025 at 03:53:10PM -0700, Junio C Hamano wrote:
> "Lidong Yan via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Since it seems this patch has been inactive for some time, I have revised
> > the comments according to Taylor's feedback and submitted a new version.
> >
> > This patch prevents pack-bitmap.c:load_bitmap() from nulling
> > bitmap_git->bitmap when loading failed. Thus eliminates memory leak. This
> > patch also add a test case in t5310 which use clang leak sanitizer to detect
> > whether leak happens when loading failed.
> >
> > Lidong Yan (2):
> >   pack-bitmap: reword comments in test_bitmap_commits()
> >   pack-bitmap: add load corrupt bitmap test
> >
> > Taylor Blau (1):
> >   pack-bitmap: fix memory leak if load_bitmap() failed
>
> OK, now, how does this iteration look to folks?  We haven't heard
> anybody say yet.  Is it ready to be marked for 'next' yet?

Oops, this fell off of my review queue. This version looks great to me.
Thanks, Lidong!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed
  2025-07-08 22:10             ` Taylor Blau
@ 2025-07-08 22:35               ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2025-07-08 22:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Lidong Yan via GitGitGadget, Jeff King, Lidong Yan

Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Jul 07, 2025 at 03:53:10PM -0700, Junio C Hamano wrote:
>> "Lidong Yan via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > Since it seems this patch has been inactive for some time, I have revised
>> > the comments according to Taylor's feedback and submitted a new version.
>> >
>> > This patch prevents pack-bitmap.c:load_bitmap() from nulling
>> > bitmap_git->bitmap when loading failed. Thus eliminates memory leak. This
>> > patch also add a test case in t5310 which use clang leak sanitizer to detect
>> > whether leak happens when loading failed.
>> >
>> > Lidong Yan (2):
>> >   pack-bitmap: reword comments in test_bitmap_commits()
>> >   pack-bitmap: add load corrupt bitmap test
>> >
>> > Taylor Blau (1):
>> >   pack-bitmap: fix memory leak if load_bitmap() failed
>>
>> OK, now, how does this iteration look to folks?  We haven't heard
>> anybody say yet.  Is it ready to be marked for 'next' yet?
>
> Oops, this fell off of my review queue. This version looks great to me.
> Thanks, Lidong!

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2025-07-08 22:35 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 12:22 [PATCH] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
2025-05-12 13:13 ` Jeff King
2025-05-13 17:47   ` Taylor Blau
2025-05-14 13:18     ` Junio C Hamano
2025-05-14 18:03     ` Jeff King
2025-05-15  1:37       ` lidongyan
2025-05-20  9:23 ` [PATCH v2 0/3] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
2025-05-20  9:23   ` [PATCH v2 1/3] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Lidong Yan via GitGitGadget
2025-05-20  9:23   ` [PATCH v2 2/3] " Taylor Blau via GitGitGadget
2025-05-21 23:54     ` Taylor Blau
2025-05-22 15:15       ` lidongyan
2025-05-22 21:22       ` Junio C Hamano
2025-05-20  9:23   ` [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test Lidong Yan via GitGitGadget
2025-05-22  0:08     ` Taylor Blau
2025-05-22 15:05       ` lidongyan
2025-05-23  0:31         ` Taylor Blau
2025-05-23  7:17           ` lidongyan
2025-05-25  2:06   ` [PATCH v3 0/2] pack-bitmap: fix memory leak if load_bitmap_entries_v1 failed Lidong Yan via GitGitGadget
2025-05-25  2:06     ` [PATCH v3 1/2] pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed Taylor Blau via GitGitGadget
2025-05-25  2:06     ` [PATCH v3 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
2025-05-25  2:43     ` [PATCH v4 0/2] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
2025-05-25  2:43       ` [PATCH v4 1/2] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
2025-05-29 15:33         ` Junio C Hamano
2025-05-29 19:57           ` Taylor Blau
2025-05-29 22:04             ` Junio C Hamano
2025-05-30  3:50           ` lidongyan
2025-05-25  2:43       ` [PATCH v4 2/2] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
2025-05-29 15:45         ` Junio C Hamano
2025-05-29 21:21           ` Taylor Blau
2025-05-30  3:53           ` lidongyan
2025-05-29 21:20         ` Taylor Blau
2025-05-30  4:03           ` lidongyan
2025-06-03  3:14       ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Lidong Yan via GitGitGadget
2025-06-03  3:14         ` [PATCH v5 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
2025-06-03  3:14         ` [PATCH v5 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
2025-06-03 22:13           ` Taylor Blau
2025-06-03  3:14         ` [PATCH v5 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
2025-06-03 22:14         ` [PATCH v5 0/3] pack-bitmap: fix memory leak if load_bitmap failed Taylor Blau
2025-07-01  5:32         ` [PATCH v6 " Lidong Yan via GitGitGadget
2025-07-01  5:32           ` [PATCH v6 1/3] pack-bitmap: fix memory leak if load_bitmap() failed Taylor Blau via GitGitGadget
2025-07-01  5:32           ` [PATCH v6 2/3] pack-bitmap: reword comments in test_bitmap_commits() Lidong Yan via GitGitGadget
2025-07-01  5:32           ` [PATCH v6 3/3] pack-bitmap: add load corrupt bitmap test Lidong Yan via GitGitGadget
2025-07-07 22:53           ` [PATCH v6 0/3] pack-bitmap: fix memory leak if load_bitmap failed Junio C Hamano
2025-07-08 22:10             ` Taylor Blau
2025-07-08 22:35               ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).