public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
Date: Wed, 25 Mar 2026 19:51:42 -0400	[thread overview]
Message-ID: <cover.1774482700.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1773959041.git.me@ttaylorr.com>

This is a small reroll of my series to fix an issue where MIDX bitmaps
fail to generate after a geometric repack in certain scenarios where the
set of MIDX'd objects is not closed under reachability.

The main changes since last time are:

 * Clarification in the first patch that the added `release_revisions()`
   call prevents a *potential* leak, not an actual one.

 * Cleanup in the second patch (where we convert the --stdin-packs
   handling to use a strmap) based on Patrick's review.

 * Dropped an unnecessary "if (p)" conditional in the fourth patch's
   `add_object_entry_from_pack()` callback that is unnecessary.

Otherwise, the series is unchanged from the original round. As usual, a
range-diff is included below for convenience.

Thanks again for your review!

Taylor Blau (5):
  pack-objects: plug leak in `read_stdin_packs()`
  pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
  t7704: demonstrate failure with once-cruft objects above the geometric
    split
  pack-objects: support excluded-open packs with --stdin-packs
  repack: mark non-MIDX packs above the split as excluded-open

 Documentation/git-pack-objects.adoc |  25 ++-
 builtin/pack-objects.c              | 254 +++++++++++++++++++---------
 builtin/repack.c                    |  19 ++-
 packfile.c                          |   3 +-
 packfile.h                          |   2 +
 t/t5331-pack-objects-stdin.sh       | 105 ++++++++++++
 t/t7704-repack-cruft.sh             |  22 +++
 7 files changed, 339 insertions(+), 91 deletions(-)

Range-diff against v1:
1:  1dac74f1e4a ! 1:  1fabd88f5e3 pack-objects: plug leak in `read_stdin_packs()`
    @@ Commit message
         The `read_stdin_packs()` function added originally via 339bce27f4f
         (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
         declares a `rev_info` struct but neglects to call `release_revisions()`
    -    on it before returning, creating a leak.
    +    on it before returning, creating the potential for a leak.
     
         The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
         '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
         did not address it.
     
         Ensure that we call `release_revisions()` appropriately to prevent a
    -    leak from this function.
    +    potential leak from this function. Note that in practice our `rev_info`
    +    here does not have a present leak, hence t5331 passes cleanly before
    +    this commit, even when built with SANITIZE=leak.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
2:  ea6fdbcc46f ! 2:  d5cb793f0eb pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
    @@ Commit message
         excluded.
     
         Extract the logic for sorting packs by mtime and adding their objects
    -    into a separate `stdin_packs_add_entries()` helper.
    +    into a separate `stdin_packs_add_pack_entries()` helper.
     
         While we could have used a `string_list`, we must handle the case where
         the same pack is specified more than once. With a `string_list` only, we
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +		if (!info->p)
     +			die(_("could not find pack '%s'"), entry->key);
     +
    -+		string_list_append(&keys, entry->key)->util = info->p;
    ++		string_list_append(&keys, entry->key)->util = info;
     +	}
     +
     +	/*
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
     +
     +	for_each_string_list_item(item, &keys) {
    -+		struct stdin_pack_info *info = strmap_get(packs, item->string);
    -+		if (!info->p)
    -+			die(_("could not find pack '%s'"), item->string);
    ++		struct stdin_pack_info *info = item->util;
     +
     +		if (info->kind & STDIN_PACK_INCLUDE)
     +			for_each_object_in_pack(info->p,
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +		struct stdin_pack_info *info;
     +		const char *key = buf.buf;
     +
    -+		if (!key || !*key)
    ++		if (!*key)
      			continue;
    - 
     +		if (*key == '^')
     +			key++;
     +
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			CALLOC_ARRAY(info, 1);
     +			strmap_put(&packs, key, info);
     +		}
    -+
    + 
      		if (*buf.buf == '^')
     -			string_list_append(&exclude_packs, buf.buf + 1);
     +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
3:  0ec9bba92ad = 3:  d8f0577077c t7704: demonstrate failure with once-cruft objects above the geometric split
4:  bd78919e19c ! 4:  e028dfbc9fb pack-objects: support excluded-open packs with --stdin-packs
    @@ Commit message
         In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
         check to *after* `add_pending_oid()`. This is necessary so that commits
         from excluded-open packs are added as traversal tips even though their
    -    objects won't appear in the output. The `include_check` and
    -    `include_check_obj` callbacks on `rev_info` are used to halt the walk at
    -    closed-excluded packs, since objects behind a '^' boundary are
    -    guaranteed to have closure and need not be rescued.
    +    objects won't appear in the output. As a consequence, the caller
    +    `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
    +    are able to drop the "if (p)" conditional.
    +
    +    The `include_check` and `include_check_obj` callbacks on `rev_info` are
    +    used to halt the walk at closed-excluded packs, since objects behind a
    +    '^' boundary are guaranteed to have closure and need not be rescued.
     
         The following commit will make use of this new functionality within the
         repack layer to resolve the test failure demonstrated in the previous
    @@ builtin/pack-objects.c: static int want_found_object(const struct object_id *oid
      				return 0;
      		} else {
     @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
    + 				      void *_data)
    + {
    + 	off_t ofs;
    ++	struct object_info oi = OBJECT_INFO_INIT;
    + 	enum object_type type = OBJ_NONE;
    + 
    + 	display_progress(progress_state, ++nr_seen);
    +@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
    + 	if (have_duplicate_entry(oid, 0))
      		return 0;
      
    ++	stdin_packs_found_nr++;
    ++
      	ofs = nth_packed_object_offset(p, pos);
    --	if (!want_object_in_pack(oid, 0, &p, &ofs))
    --		return 0;
    ++
    ++	oi.typep = &type;
    ++	if (packed_object_info(p, ofs, &oi) < 0) {
    ++		die(_("could not get type of object %s in pack %s"),
    ++		    oid_to_hex(oid), p->pack_name);
    ++	} else if (type == OBJ_COMMIT) {
    ++		struct rev_info *revs = _data;
    ++		/*
    ++		 * commits in included packs are used as starting points
    ++		 * for the subsequent revision walk
    ++		 *
    ++		 * Note that we do want to walk through commits that are
    ++		 * present in excluded-open ('!') packs to pick up any
    ++		 * objects reachable from them not present in the
    ++		 * excluded-closed ('^') packs.
    ++		 *
    ++		 * However, we'll only add those objects to the packing
    ++		 * list after checking `want_object_in_pack()` below.
    ++		 */
    ++		add_pending_oid(revs, NULL, oid, 0);
    ++	}
    ++
    + 	if (!want_object_in_pack(oid, 0, &p, &ofs))
    + 		return 0;
      
    - 	if (p) {
    - 		struct object_info oi = OBJECT_INFO_INIT;
    -@@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct object_id *oid,
    - 		} else if (type == OBJ_COMMIT) {
    - 			struct rev_info *revs = _data;
    - 			/*
    +-	if (p) {
    +-		struct object_info oi = OBJECT_INFO_INIT;
    +-
    +-		oi.typep = &type;
    +-		if (packed_object_info(p, ofs, &oi) < 0) {
    +-			die(_("could not get type of object %s in pack %s"),
    +-			    oid_to_hex(oid), p->pack_name);
    +-		} else if (type == OBJ_COMMIT) {
    +-			struct rev_info *revs = _data;
    +-			/*
     -			 * commits in included packs are used as starting points for the
     -			 * subsequent revision walk
    -+			 * commits in included packs are used as starting points
    -+			 * for the subsequent revision walk
    -+			 *
    -+			 * Note that we do want to walk through commits that are
    -+			 * present in excluded-open ('!') packs to pick up any
    -+			 * objects reachable from them not present in the
    -+			 * excluded-closed ('^') packs.
    -+			 *
    -+			 * However, we'll only add those objects to the packing
    -+			 * list after checking `want_object_in_pack()` below.
    - 			 */
    - 			add_pending_oid(revs, NULL, oid, 0);
    - 		}
    +-			 */
    +-			add_pending_oid(revs, NULL, oid, 0);
    +-		}
     -
     -		stdin_packs_found_nr++;
    - 	}
    - 
    -+	if (!want_object_in_pack(oid, 0, &p, &ofs))
    -+		return 0;
    -+
    -+	stdin_packs_found_nr++;
    -+
    +-	}
    +-
      	create_object_entry(oid, type, 0, 0, 0, p, ofs);
      
      	return 0;
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
      };
      
     @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *packs,
    - 		if (!info->p)
    - 			die(_("could not find pack '%s'"), item->string);
    + 	for_each_string_list_item(item, &keys) {
    + 		struct stdin_pack_info *info = item->util;
      
     -		if (info->kind & STDIN_PACK_INCLUDE)
     +		if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
    @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *
      	struct strbuf buf = STRBUF_INIT;
      	struct strmap packs = STRMAP_INIT;
     @@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
    - 		if (!key || !*key)
    + 
    + 		if (!*key)
      			continue;
    - 
     -		if (*key == '^')
     +		if (*key == '^' ||
     +		    (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
5:  c4fd52e751a = 5:  23cb9f33dba repack: mark non-MIDX packs above the split as excluded-open

base-commit: 7ff1e8dc1e1680510c96e69965b3fa81372c5037
-- 
2.53.0.614.g164f3b634ec

  parent reply	other threads:[~2026-03-25 23:51 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:13     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` Taylor Blau [this message]
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` [PATCH v3 " Taylor Blau
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1774482700.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox