public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
Date: Fri, 27 Mar 2026 16:06:40 -0400	[thread overview]
Message-ID: <cover.1774641999.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1773959041.git.me@ttaylorr.com>

This is another small reroll of my series to fix an issue where MIDX
bitmaps fail to generate after a geometric repack in certain scenarios
where the set of MIDX'd objects is not closed under reachability.

The main changes since last time are:

 * Named enum stdin_pack_info_kind.

 * Refactored how we handle reading incoming packs via stdin.

 * Fixed a nasty case where sorting the packs in order of mtime happened
   to work on some systems, but ASan detected a very legitimate bug.

As usual, a range-diff is included below for convenience.

Thanks again for your review!

Taylor Blau (5):
  pack-objects: plug leak in `read_stdin_packs()`
  pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
  t7704: demonstrate failure with once-cruft objects above the geometric
    split
  pack-objects: support excluded-open packs with --stdin-packs
  repack: mark non-MIDX packs above the split as excluded-open

 Documentation/git-pack-objects.adoc |  25 ++-
 builtin/pack-objects.c              | 301 +++++++++++++++++++---------
 builtin/repack.c                    |  19 +-
 packfile.c                          |   3 +-
 packfile.h                          |   2 +
 t/t5331-pack-objects-stdin.sh       | 105 ++++++++++
 t/t7704-repack-cruft.sh             |  22 ++
 7 files changed, 373 insertions(+), 104 deletions(-)

Range-diff against v2:
1:  1fabd88f5e3 = 1:  d6ff4e801ab pack-objects: plug leak in `read_stdin_packs()`
2:  d5cb793f0eb ! 2:  dd9ff1ede4a pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
    @@ builtin/pack-objects.c
      #include "list.h"
      #include "packfile.h"
      #include "object-file.h"
    -@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
    +@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
    + 
    + }
    + 
    ++/*
    ++ * stdin_pack_info_kind specifies how a pack specified over stdin
    ++ * should be treated when pack-objects is invoked with --stdin-packs.
    ++ *
    ++ *  - STDIN_PACK_INCLUDE: objects in any packs with this flag bit set
    ++ *    should be included in the output pack, unless they appear in an
    ++ *    excluded pack.
    ++ *
    ++ *  - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
    ++ *    bit set should be excluded from the output pack.
    ++ *
    ++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
    ++ * used as traversal tips when invoked with --stdin-packs=follow.
    ++ */
    ++enum stdin_pack_info_kind {
    ++	STDIN_PACK_INCLUDE = (1<<0),
    ++	STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    ++};
    ++
    ++struct stdin_pack_info {
    ++	struct packed_git *p;
    ++	enum stdin_pack_info_kind kind;
    ++};
    ++
    + static int pack_mtime_cmp(const void *_a, const void *_b)
    + {
    +-	struct packed_git *a = ((const struct string_list_item*)_a)->util;
    +-	struct packed_git *b = ((const struct string_list_item*)_b)->util;
    ++	struct stdin_pack_info *a = ((const struct string_list_item*)_a)->util;
    ++	struct stdin_pack_info *b = ((const struct string_list_item*)_b)->util;
    + 
    + 	/*
    + 	 * order packs by descending mtime so that objects are laid out
    + 	 * roughly as newest-to-oldest
    + 	 */
    +-	if (a->mtime < b->mtime)
    ++	if (a->p->mtime < b->p->mtime)
    + 		return 1;
    +-	else if (b->mtime < a->mtime)
    ++	else if (b->p->mtime < a->p->mtime)
    + 		return -1;
    + 	else
      		return 0;
      }
      
     -static void read_packs_list_from_stdin(struct rev_info *revs)
    -+struct stdin_pack_info {
    -+	struct packed_git *p;
    -+	enum {
    -+		STDIN_PACK_INCLUDE = (1<<0),
    -+		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    -+	} kind;
    -+};
    -+
     +static void stdin_packs_add_pack_entries(struct strmap *packs,
     +					 struct rev_info *revs)
    -+{
    + {
    +-	struct strbuf buf = STRBUF_INIT;
    +-	struct string_list include_packs = STRING_LIST_INIT_DUP;
    +-	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
    +-	struct string_list_item *item = NULL;
    +-	struct packed_git *p;
     +	struct string_list keys = STRING_LIST_INIT_NODUP;
     +	struct string_list_item *item;
     +	struct hashmap_iter iter;
     +	struct strmap_entry *entry;
    -+
    + 
    +-	while (strbuf_getline(&buf, stdin) != EOF) {
    +-		if (!buf.len)
    +-			continue;
     +	strmap_for_each_entry(packs, &iter, entry) {
     +		struct stdin_pack_info *info = entry->value;
     +		if (!info->p)
     +			die(_("could not find pack '%s'"), entry->key);
    -+
    + 
    +-		if (*buf.buf == '^')
    +-			string_list_append(&exclude_packs, buf.buf + 1);
    +-		else
    +-			string_list_append(&include_packs, buf.buf);
    +-
    +-		strbuf_reset(&buf);
    +-	}
    +-
    +-	string_list_sort_u(&include_packs, 0);
    +-	string_list_sort_u(&exclude_packs, 0);
    +-
    +-	repo_for_each_pack(the_repository, p) {
    +-		const char *pack_name = pack_basename(p);
    +-
    +-		if ((item = string_list_lookup(&include_packs, pack_name))) {
    +-			if (exclude_promisor_objects && p->pack_promisor)
    +-				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
    +-			item->util = p;
    +-		}
    +-		if ((item = string_list_lookup(&exclude_packs, pack_name)))
    +-			item->util = p;
    +-	}
    +-
    +-	/*
    +-	 * Arguments we got on stdin may not even be packs. First
    +-	 * check that to avoid segfaulting later on in
    +-	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
    +-	 *
    +-	 * Since we first parsed our STDIN and then sorted the input
    +-	 * lines the pack we error on will be whatever line happens to
    +-	 * sort first. This is lazy, it's enough that we report one
    +-	 * bad case here, we don't need to report the first/last one,
    +-	 * or all of them.
    +-	 */
    +-	for_each_string_list_item(item, &include_packs) {
    +-		struct packed_git *p = item->util;
    +-		if (!p)
    +-			die(_("could not find pack '%s'"), item->string);
    +-		if (!is_pack_valid(p))
    +-			die(_("packfile %s cannot be accessed"), p->pack_name);
    +-	}
    +-
    +-	/*
    +-	 * Then, handle all of the excluded packs, marking them as
    +-	 * kept in-core so that later calls to add_object_entry()
    +-	 * discards any objects that are also found in excluded packs.
    +-	 */
    +-	for_each_string_list_item(item, &exclude_packs) {
    +-		struct packed_git *p = item->util;
    +-		if (!p)
    +-			die(_("could not find pack '%s'"), item->string);
    +-		p->pack_keep_in_core = 1;
     +		string_list_append(&keys, entry->key)->util = info;
    -+	}
    -+
    -+	/*
    -+	 * Order packs by ascending mtime; use QSORT directly to access the
    -+	 * string_list_item's ->util pointer, which string_list_sort() does not
    -+	 * provide.
    -+	 */
    + 	}
    + 
    + 	/*
    +@@ builtin/pack-objects.c: static void read_packs_list_from_stdin(struct rev_info *revs)
    + 	 * string_list_item's ->util pointer, which string_list_sort() does not
    + 	 * provide.
    + 	 */
    +-	QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
    +-
    +-	for_each_string_list_item(item, &include_packs) {
    +-		struct packed_git *p = item->util;
    +-		for_each_object_in_pack(p,
    +-					add_object_entry_from_pack,
    +-					revs,
    +-					ODB_FOR_EACH_OBJECT_PACK_ORDER);
     +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
     +
     +	for_each_string_list_item(item, &keys) {
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +						add_object_entry_from_pack,
     +						revs,
     +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
    -+	}
    -+
    + 	}
    + 
     +	string_list_clear(&keys, 0);
     +}
     +
     +static void stdin_packs_read_input(struct rev_info *revs)
    - {
    - 	struct strbuf buf = STRBUF_INIT;
    --	struct string_list include_packs = STRING_LIST_INIT_DUP;
    --	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
    --	struct string_list_item *item = NULL;
    ++{
    ++	struct strbuf buf = STRBUF_INIT;
     +	struct strmap packs = STRMAP_INIT;
    - 	struct packed_git *p;
    - 
    - 	while (strbuf_getline(&buf, stdin) != EOF) {
    --		if (!buf.len)
    ++	struct packed_git *p;
    ++
    ++	while (strbuf_getline(&buf, stdin) != EOF) {
     +		struct stdin_pack_info *info;
    ++		enum stdin_pack_info_kind kind = STDIN_PACK_INCLUDE;
     +		const char *key = buf.buf;
     +
     +		if (!*key)
    - 			continue;
    -+		if (*key == '^')
    ++			continue;
    ++		else if (*key == '^')
    ++			kind = STDIN_PACK_EXCLUDE_CLOSED;
    ++
    ++		if (kind != STDIN_PACK_INCLUDE)
     +			key++;
     +
     +		info = strmap_get(&packs, key);
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			CALLOC_ARRAY(info, 1);
     +			strmap_put(&packs, key, info);
     +		}
    - 
    - 		if (*buf.buf == '^')
    --			string_list_append(&exclude_packs, buf.buf + 1);
    -+			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
    - 		else
    --			string_list_append(&include_packs, buf.buf);
    -+			info->kind |= STDIN_PACK_INCLUDE;
    - 
    - 		strbuf_reset(&buf);
    - 	}
    - 
    --	string_list_sort_u(&include_packs, 0);
    --	string_list_sort_u(&exclude_packs, 0);
    --
    - 	repo_for_each_pack(the_repository, p) {
    --		const char *pack_name = pack_basename(p);
    ++
    ++		info->kind |= kind;
    ++
    ++		strbuf_reset(&buf);
    ++	}
    ++
    ++	repo_for_each_pack(the_repository, p) {
     +		struct stdin_pack_info *info;
    - 
    --		if ((item = string_list_lookup(&include_packs, pack_name))) {
    ++
     +		info = strmap_get(&packs, pack_basename(p));
     +		if (!info)
     +			continue;
     +
     +		if (info->kind & STDIN_PACK_INCLUDE) {
    - 			if (exclude_promisor_objects && p->pack_promisor)
    - 				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
    --			item->util = p;
    ++			if (exclude_promisor_objects && p->pack_promisor)
    ++				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
     +
     +			/*
     +			 * Arguments we got on stdin may not even be
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			 */
     +			if (!is_pack_valid(p))
     +				die(_("packfile %s cannot be accessed"), p->pack_name);
    - 		}
    --		if ((item = string_list_lookup(&exclude_packs, pack_name)))
    --			item->util = p;
    --	}
    - 
    --	/*
    --	 * Arguments we got on stdin may not even be packs. First
    --	 * check that to avoid segfaulting later on in
    --	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
    --	 *
    --	 * Since we first parsed our STDIN and then sorted the input
    --	 * lines the pack we error on will be whatever line happens to
    --	 * sort first. This is lazy, it's enough that we report one
    --	 * bad case here, we don't need to report the first/last one,
    --	 * or all of them.
    --	 */
    --	for_each_string_list_item(item, &include_packs) {
    --		struct packed_git *p = item->util;
    --		if (!p)
    --			die(_("could not find pack '%s'"), item->string);
    --		if (!is_pack_valid(p))
    --			die(_("packfile %s cannot be accessed"), p->pack_name);
    --	}
    ++		}
    ++
     +		if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
     +			/*
     +			 * Marking excluded packs as kept in-core so
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			 */
     +			p->pack_keep_in_core = 1;
     +		}
    - 
    --	/*
    --	 * Then, handle all of the excluded packs, marking them as
    --	 * kept in-core so that later calls to add_object_entry()
    --	 * discards any objects that are also found in excluded packs.
    --	 */
    --	for_each_string_list_item(item, &exclude_packs) {
    --		struct packed_git *p = item->util;
    --		if (!p)
    --			die(_("could not find pack '%s'"), item->string);
    --		p->pack_keep_in_core = 1;
    ++
     +		info->p = p;
    - 	}
    - 
    --	/*
    --	 * Order packs by ascending mtime; use QSORT directly to access the
    --	 * string_list_item's ->util pointer, which string_list_sort() does not
    --	 * provide.
    --	 */
    --	QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
    --
    --	for_each_string_list_item(item, &include_packs) {
    --		struct packed_git *p = item->util;
    --		for_each_object_in_pack(p,
    --					add_object_entry_from_pack,
    --					revs,
    --					ODB_FOR_EACH_OBJECT_PACK_ORDER);
    --	}
    ++	}
    ++
     +	stdin_packs_add_pack_entries(&packs, revs);
    - 
    ++
      	strbuf_release(&buf);
     -	string_list_clear(&include_packs, 0);
     -	string_list_clear(&exclude_packs, 0);
3:  d8f0577077c = 3:  5a5090f8da2 t7704: demonstrate failure with once-cruft objects above the geometric split
4:  e028dfbc9fb ! 4:  efba1ab93d8 pack-objects: support excluded-open packs with --stdin-packs
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
      	create_object_entry(oid, type, 0, 0, 0, p, ofs);
      
      	return 0;
    +@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
    +  *  - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
    +  *    bit set should be excluded from the output pack.
    +  *
    +- * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
    +- * used as traversal tips when invoked with --stdin-packs=follow.
    ++ *  - STDIN_PACK_EXCLUDE_OPEN: objects in any packs with this flag
    ++ *    bit set should be excluded from the output pack, but are not
    ++ *    guaranteed to be closed under reachability.
    ++ *
    ++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE or
    ++ * STDIN_PACK_EXCLUDE_OPEN are used as traversal tips when invoked
    ++ * with --stdin-packs=follow.
    +  */
    + enum stdin_pack_info_kind {
    + 	STDIN_PACK_INCLUDE = (1<<0),
    + 	STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    ++	STDIN_PACK_EXCLUDE_OPEN = (1<<2),
    + };
    + 
    + struct stdin_pack_info {
     @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
      		return 0;
      }
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +	return stdin_packs_include_check_obj((struct object *)commit, data);
     +}
     +
    - struct stdin_pack_info {
    - 	struct packed_git *p;
    - 	enum {
    - 		STDIN_PACK_INCLUDE = (1<<0),
    - 		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    -+		STDIN_PACK_EXCLUDE_OPEN = (1<<2),
    - 	} kind;
    - };
    - 
    + static void stdin_packs_add_pack_entries(struct strmap *packs,
    + 					 struct rev_info *revs)
    + {
     @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *packs,
      	for_each_string_list_item(item, &keys) {
      		struct stdin_pack_info *info = item->util;
    @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *
      	struct strbuf buf = STRBUF_INIT;
      	struct strmap packs = STRMAP_INIT;
     @@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
    - 
    - 		if (!*key)
      			continue;
    --		if (*key == '^')
    -+		if (*key == '^' ||
    -+		    (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
    + 		else if (*key == '^')
    + 			kind = STDIN_PACK_EXCLUDE_CLOSED;
    ++		else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
    ++			kind = STDIN_PACK_EXCLUDE_OPEN;
    + 
    + 		if (kind != STDIN_PACK_INCLUDE)
      			key++;
    - 
    - 		info = strmap_get(&packs, key);
    -@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
    - 
    - 		if (*buf.buf == '^')
    - 			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
    -+		else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
    -+			info->kind |= STDIN_PACK_EXCLUDE_OPEN;
    - 		else
    - 			info->kind |= STDIN_PACK_INCLUDE;
    - 
     @@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
      			p->pack_keep_in_core = 1;
      		}
5:  23cb9f33dba = 5:  c9ad9a0c4ae repack: mark non-MIDX packs above the split as excluded-open

base-commit: 41688c1a2312f62f44435e1a6d03b4b904b5b0ec
-- 
2.53.0.724.gb20b077944a

  parent reply	other threads:[~2026-03-27 20:06 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:13     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` Taylor Blau [this message]
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1774641999.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox