All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow`
Date: Fri, 27 Mar 2026 16:06:40 -0400	[thread overview]
Message-ID: <cover.1774641999.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1773959041.git.me@ttaylorr.com>

This is another small reroll of my series to fix an issue where MIDX
bitmaps fail to generate after a geometric repack in certain scenarios
where the set of MIDX'd objects is not closed under reachability.

The main changes since last time are:

 * Named enum stdin_pack_info_kind.

 * Refactored how we handle reading incoming packs via stdin.

 * Fixed a nasty case where sorting the packs in order of mtime happened
   to work on some systems, but ASan detected a very legitimate bug.

As usual, a range-diff is included below for convenience.

Thanks again for your review!

Taylor Blau (5):
  pack-objects: plug leak in `read_stdin_packs()`
  pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
  t7704: demonstrate failure with once-cruft objects above the geometric
    split
  pack-objects: support excluded-open packs with --stdin-packs
  repack: mark non-MIDX packs above the split as excluded-open

 Documentation/git-pack-objects.adoc |  25 ++-
 builtin/pack-objects.c              | 301 +++++++++++++++++++---------
 builtin/repack.c                    |  19 +-
 packfile.c                          |   3 +-
 packfile.h                          |   2 +
 t/t5331-pack-objects-stdin.sh       | 105 ++++++++++
 t/t7704-repack-cruft.sh             |  22 ++
 7 files changed, 373 insertions(+), 104 deletions(-)

Range-diff against v2:
1:  1fabd88f5e3 = 1:  d6ff4e801ab pack-objects: plug leak in `read_stdin_packs()`
2:  d5cb793f0eb ! 2:  dd9ff1ede4a pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
    @@ builtin/pack-objects.c
      #include "list.h"
      #include "packfile.h"
      #include "object-file.h"
    -@@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
    +@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
    + 
    + }
    + 
    ++/*
    ++ * stdin_pack_info_kind specifies how a pack specified over stdin
    ++ * should be treated when pack-objects is invoked with --stdin-packs.
    ++ *
    ++ *  - STDIN_PACK_INCLUDE: objects in any packs with this flag bit set
    ++ *    should be included in the output pack, unless they appear in an
    ++ *    excluded pack.
    ++ *
    ++ *  - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
    ++ *    bit set should be excluded from the output pack.
    ++ *
    ++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
    ++ * used as traversal tips when invoked with --stdin-packs=follow.
    ++ */
    ++enum stdin_pack_info_kind {
    ++	STDIN_PACK_INCLUDE = (1<<0),
    ++	STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    ++};
    ++
    ++struct stdin_pack_info {
    ++	struct packed_git *p;
    ++	enum stdin_pack_info_kind kind;
    ++};
    ++
    + static int pack_mtime_cmp(const void *_a, const void *_b)
    + {
    +-	struct packed_git *a = ((const struct string_list_item*)_a)->util;
    +-	struct packed_git *b = ((const struct string_list_item*)_b)->util;
    ++	struct stdin_pack_info *a = ((const struct string_list_item*)_a)->util;
    ++	struct stdin_pack_info *b = ((const struct string_list_item*)_b)->util;
    + 
    + 	/*
    + 	 * order packs by descending mtime so that objects are laid out
    + 	 * roughly as newest-to-oldest
    + 	 */
    +-	if (a->mtime < b->mtime)
    ++	if (a->p->mtime < b->p->mtime)
    + 		return 1;
    +-	else if (b->mtime < a->mtime)
    ++	else if (b->p->mtime < a->p->mtime)
    + 		return -1;
    + 	else
      		return 0;
      }
      
     -static void read_packs_list_from_stdin(struct rev_info *revs)
    -+struct stdin_pack_info {
    -+	struct packed_git *p;
    -+	enum {
    -+		STDIN_PACK_INCLUDE = (1<<0),
    -+		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    -+	} kind;
    -+};
    -+
     +static void stdin_packs_add_pack_entries(struct strmap *packs,
     +					 struct rev_info *revs)
    -+{
    + {
    +-	struct strbuf buf = STRBUF_INIT;
    +-	struct string_list include_packs = STRING_LIST_INIT_DUP;
    +-	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
    +-	struct string_list_item *item = NULL;
    +-	struct packed_git *p;
     +	struct string_list keys = STRING_LIST_INIT_NODUP;
     +	struct string_list_item *item;
     +	struct hashmap_iter iter;
     +	struct strmap_entry *entry;
    -+
    + 
    +-	while (strbuf_getline(&buf, stdin) != EOF) {
    +-		if (!buf.len)
    +-			continue;
     +	strmap_for_each_entry(packs, &iter, entry) {
     +		struct stdin_pack_info *info = entry->value;
     +		if (!info->p)
     +			die(_("could not find pack '%s'"), entry->key);
    -+
    + 
    +-		if (*buf.buf == '^')
    +-			string_list_append(&exclude_packs, buf.buf + 1);
    +-		else
    +-			string_list_append(&include_packs, buf.buf);
    +-
    +-		strbuf_reset(&buf);
    +-	}
    +-
    +-	string_list_sort_u(&include_packs, 0);
    +-	string_list_sort_u(&exclude_packs, 0);
    +-
    +-	repo_for_each_pack(the_repository, p) {
    +-		const char *pack_name = pack_basename(p);
    +-
    +-		if ((item = string_list_lookup(&include_packs, pack_name))) {
    +-			if (exclude_promisor_objects && p->pack_promisor)
    +-				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
    +-			item->util = p;
    +-		}
    +-		if ((item = string_list_lookup(&exclude_packs, pack_name)))
    +-			item->util = p;
    +-	}
    +-
    +-	/*
    +-	 * Arguments we got on stdin may not even be packs. First
    +-	 * check that to avoid segfaulting later on in
    +-	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
    +-	 *
    +-	 * Since we first parsed our STDIN and then sorted the input
    +-	 * lines the pack we error on will be whatever line happens to
    +-	 * sort first. This is lazy, it's enough that we report one
    +-	 * bad case here, we don't need to report the first/last one,
    +-	 * or all of them.
    +-	 */
    +-	for_each_string_list_item(item, &include_packs) {
    +-		struct packed_git *p = item->util;
    +-		if (!p)
    +-			die(_("could not find pack '%s'"), item->string);
    +-		if (!is_pack_valid(p))
    +-			die(_("packfile %s cannot be accessed"), p->pack_name);
    +-	}
    +-
    +-	/*
    +-	 * Then, handle all of the excluded packs, marking them as
    +-	 * kept in-core so that later calls to add_object_entry()
    +-	 * discards any objects that are also found in excluded packs.
    +-	 */
    +-	for_each_string_list_item(item, &exclude_packs) {
    +-		struct packed_git *p = item->util;
    +-		if (!p)
    +-			die(_("could not find pack '%s'"), item->string);
    +-		p->pack_keep_in_core = 1;
     +		string_list_append(&keys, entry->key)->util = info;
    -+	}
    -+
    -+	/*
    -+	 * Order packs by ascending mtime; use QSORT directly to access the
    -+	 * string_list_item's ->util pointer, which string_list_sort() does not
    -+	 * provide.
    -+	 */
    + 	}
    + 
    + 	/*
    +@@ builtin/pack-objects.c: static void read_packs_list_from_stdin(struct rev_info *revs)
    + 	 * string_list_item's ->util pointer, which string_list_sort() does not
    + 	 * provide.
    + 	 */
    +-	QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
    +-
    +-	for_each_string_list_item(item, &include_packs) {
    +-		struct packed_git *p = item->util;
    +-		for_each_object_in_pack(p,
    +-					add_object_entry_from_pack,
    +-					revs,
    +-					ODB_FOR_EACH_OBJECT_PACK_ORDER);
     +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
     +
     +	for_each_string_list_item(item, &keys) {
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +						add_object_entry_from_pack,
     +						revs,
     +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
    -+	}
    -+
    + 	}
    + 
     +	string_list_clear(&keys, 0);
     +}
     +
     +static void stdin_packs_read_input(struct rev_info *revs)
    - {
    - 	struct strbuf buf = STRBUF_INIT;
    --	struct string_list include_packs = STRING_LIST_INIT_DUP;
    --	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
    --	struct string_list_item *item = NULL;
    ++{
    ++	struct strbuf buf = STRBUF_INIT;
     +	struct strmap packs = STRMAP_INIT;
    - 	struct packed_git *p;
    - 
    - 	while (strbuf_getline(&buf, stdin) != EOF) {
    --		if (!buf.len)
    ++	struct packed_git *p;
    ++
    ++	while (strbuf_getline(&buf, stdin) != EOF) {
     +		struct stdin_pack_info *info;
    ++		enum stdin_pack_info_kind kind = STDIN_PACK_INCLUDE;
     +		const char *key = buf.buf;
     +
     +		if (!*key)
    - 			continue;
    -+		if (*key == '^')
    ++			continue;
    ++		else if (*key == '^')
    ++			kind = STDIN_PACK_EXCLUDE_CLOSED;
    ++
    ++		if (kind != STDIN_PACK_INCLUDE)
     +			key++;
     +
     +		info = strmap_get(&packs, key);
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			CALLOC_ARRAY(info, 1);
     +			strmap_put(&packs, key, info);
     +		}
    - 
    - 		if (*buf.buf == '^')
    --			string_list_append(&exclude_packs, buf.buf + 1);
    -+			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
    - 		else
    --			string_list_append(&include_packs, buf.buf);
    -+			info->kind |= STDIN_PACK_INCLUDE;
    - 
    - 		strbuf_reset(&buf);
    - 	}
    - 
    --	string_list_sort_u(&include_packs, 0);
    --	string_list_sort_u(&exclude_packs, 0);
    --
    - 	repo_for_each_pack(the_repository, p) {
    --		const char *pack_name = pack_basename(p);
    ++
    ++		info->kind |= kind;
    ++
    ++		strbuf_reset(&buf);
    ++	}
    ++
    ++	repo_for_each_pack(the_repository, p) {
     +		struct stdin_pack_info *info;
    - 
    --		if ((item = string_list_lookup(&include_packs, pack_name))) {
    ++
     +		info = strmap_get(&packs, pack_basename(p));
     +		if (!info)
     +			continue;
     +
     +		if (info->kind & STDIN_PACK_INCLUDE) {
    - 			if (exclude_promisor_objects && p->pack_promisor)
    - 				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
    --			item->util = p;
    ++			if (exclude_promisor_objects && p->pack_promisor)
    ++				die(_("packfile %s is a promisor but --exclude-promisor-objects was given"), p->pack_name);
     +
     +			/*
     +			 * Arguments we got on stdin may not even be
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			 */
     +			if (!is_pack_valid(p))
     +				die(_("packfile %s cannot be accessed"), p->pack_name);
    - 		}
    --		if ((item = string_list_lookup(&exclude_packs, pack_name)))
    --			item->util = p;
    --	}
    - 
    --	/*
    --	 * Arguments we got on stdin may not even be packs. First
    --	 * check that to avoid segfaulting later on in
    --	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
    --	 *
    --	 * Since we first parsed our STDIN and then sorted the input
    --	 * lines the pack we error on will be whatever line happens to
    --	 * sort first. This is lazy, it's enough that we report one
    --	 * bad case here, we don't need to report the first/last one,
    --	 * or all of them.
    --	 */
    --	for_each_string_list_item(item, &include_packs) {
    --		struct packed_git *p = item->util;
    --		if (!p)
    --			die(_("could not find pack '%s'"), item->string);
    --		if (!is_pack_valid(p))
    --			die(_("packfile %s cannot be accessed"), p->pack_name);
    --	}
    ++		}
    ++
     +		if (info->kind & STDIN_PACK_EXCLUDE_CLOSED) {
     +			/*
     +			 * Marking excluded packs as kept in-core so
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +			 */
     +			p->pack_keep_in_core = 1;
     +		}
    - 
    --	/*
    --	 * Then, handle all of the excluded packs, marking them as
    --	 * kept in-core so that later calls to add_object_entry()
    --	 * discards any objects that are also found in excluded packs.
    --	 */
    --	for_each_string_list_item(item, &exclude_packs) {
    --		struct packed_git *p = item->util;
    --		if (!p)
    --			die(_("could not find pack '%s'"), item->string);
    --		p->pack_keep_in_core = 1;
    ++
     +		info->p = p;
    - 	}
    - 
    --	/*
    --	 * Order packs by ascending mtime; use QSORT directly to access the
    --	 * string_list_item's ->util pointer, which string_list_sort() does not
    --	 * provide.
    --	 */
    --	QSORT(include_packs.items, include_packs.nr, pack_mtime_cmp);
    --
    --	for_each_string_list_item(item, &include_packs) {
    --		struct packed_git *p = item->util;
    --		for_each_object_in_pack(p,
    --					add_object_entry_from_pack,
    --					revs,
    --					ODB_FOR_EACH_OBJECT_PACK_ORDER);
    --	}
    ++	}
    ++
     +	stdin_packs_add_pack_entries(&packs, revs);
    - 
    ++
      	strbuf_release(&buf);
     -	string_list_clear(&include_packs, 0);
     -	string_list_clear(&exclude_packs, 0);
3:  d8f0577077c = 3:  5a5090f8da2 t7704: demonstrate failure with once-cruft objects above the geometric split
4:  e028dfbc9fb ! 4:  efba1ab93d8 pack-objects: support excluded-open packs with --stdin-packs
    @@ builtin/pack-objects.c: static int add_object_entry_from_pack(const struct objec
      	create_object_entry(oid, type, 0, 0, 0, p, ofs);
      
      	return 0;
    +@@ builtin/pack-objects.c: static void show_commit_pack_hint(struct commit *commit, void *data)
    +  *  - STDIN_PACK_EXCLUDE_CLOSED: objects in any packs with this flag
    +  *    bit set should be excluded from the output pack.
    +  *
    +- * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE are
    +- * used as traversal tips when invoked with --stdin-packs=follow.
    ++ *  - STDIN_PACK_EXCLUDE_OPEN: objects in any packs with this flag
    ++ *    bit set should be excluded from the output pack, but are not
    ++ *    guaranteed to be closed under reachability.
    ++ *
    ++ * Objects in packs whose 'kind' bits include STDIN_PACK_INCLUDE or
    ++ * STDIN_PACK_EXCLUDE_OPEN are used as traversal tips when invoked
    ++ * with --stdin-packs=follow.
    +  */
    + enum stdin_pack_info_kind {
    + 	STDIN_PACK_INCLUDE = (1<<0),
    + 	STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    ++	STDIN_PACK_EXCLUDE_OPEN = (1<<2),
    + };
    + 
    + struct stdin_pack_info {
     @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b)
      		return 0;
      }
    @@ builtin/pack-objects.c: static int pack_mtime_cmp(const void *_a, const void *_b
     +	return stdin_packs_include_check_obj((struct object *)commit, data);
     +}
     +
    - struct stdin_pack_info {
    - 	struct packed_git *p;
    - 	enum {
    - 		STDIN_PACK_INCLUDE = (1<<0),
    - 		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
    -+		STDIN_PACK_EXCLUDE_OPEN = (1<<2),
    - 	} kind;
    - };
    - 
    + static void stdin_packs_add_pack_entries(struct strmap *packs,
    + 					 struct rev_info *revs)
    + {
     @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *packs,
      	for_each_string_list_item(item, &keys) {
      		struct stdin_pack_info *info = item->util;
    @@ builtin/pack-objects.c: static void stdin_packs_add_pack_entries(struct strmap *
      	struct strbuf buf = STRBUF_INIT;
      	struct strmap packs = STRMAP_INIT;
     @@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
    - 
    - 		if (!*key)
      			continue;
    --		if (*key == '^')
    -+		if (*key == '^' ||
    -+		    (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
    + 		else if (*key == '^')
    + 			kind = STDIN_PACK_EXCLUDE_CLOSED;
    ++		else if (*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
    ++			kind = STDIN_PACK_EXCLUDE_OPEN;
    + 
    + 		if (kind != STDIN_PACK_INCLUDE)
      			key++;
    - 
    - 		info = strmap_get(&packs, key);
    -@@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
    - 
    - 		if (*buf.buf == '^')
    - 			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
    -+		else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
    -+			info->kind |= STDIN_PACK_EXCLUDE_OPEN;
    - 		else
    - 			info->kind |= STDIN_PACK_INCLUDE;
    - 
     @@ builtin/pack-objects.c: static void stdin_packs_read_input(struct rev_info *revs)
      			p->pack_keep_in_core = 1;
      		}
5:  23cb9f33dba = 5:  c9ad9a0c4ae repack: mark non-MIDX packs above the split as excluded-open

base-commit: 41688c1a2312f62f44435e1a6d03b4b904b5b0ec
-- 
2.53.0.724.gb20b077944a

  parent reply	other threads:[~2026-03-27 20:06 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 22:24 [PATCH 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-19 22:24 ` [PATCH 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:03     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-24  7:39   ` Patrick Steinhardt
2026-03-25 23:13     ` Taylor Blau
2026-03-19 22:24 ` [PATCH 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-19 22:24 ` [PATCH 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-21 16:57   ` Jeff King
2026-03-22 18:09     ` Taylor Blau
2026-03-25 23:19       ` Taylor Blau
2026-03-19 22:24 ` [PATCH 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-25 23:51 ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-26 20:40     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 22:11         ` Junio C Hamano
2026-03-26 22:32           ` Taylor Blau
2026-03-27  0:29             ` Derrick Stolee
2026-03-27 17:51               ` Taylor Blau
2026-03-27 18:34                 ` Derrick Stolee
2026-03-27 15:52             ` Junio C Hamano
2026-03-26 22:37     ` Taylor Blau
2026-03-25 23:51   ` [PATCH v2 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-25 23:51   ` [PATCH v2 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-26 20:48     ` Derrick Stolee
2026-03-25 23:51   ` [PATCH v2 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-26 20:49     ` Derrick Stolee
2026-03-26 21:44       ` Taylor Blau
2026-03-26 20:51   ` [PATCH v2 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-26 21:46     ` Taylor Blau
2026-03-27 20:06 ` Taylor Blau [this message]
2026-03-27 20:06   ` [PATCH v3 1/5] pack-objects: plug leak in `read_stdin_packs()` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` Taylor Blau
2026-03-27 20:06   ` [PATCH v3 3/5] t7704: demonstrate failure with once-cruft objects above the geometric split Taylor Blau
2026-03-27 20:06   ` [PATCH v3 4/5] pack-objects: support excluded-open packs with --stdin-packs Taylor Blau
2026-03-27 20:06   ` [PATCH v3 5/5] repack: mark non-MIDX packs above the split as excluded-open Taylor Blau
2026-03-27 20:16   ` [PATCH v3 0/5] pack-objects: handle excluded-but-open packs via `--stdin-packs=follow` Derrick Stolee
2026-03-27 20:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1774641999.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.