* [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack'
@ 2025-04-23 17:40 Derrick Stolee via GitGitGadget
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
` (3 more replies)
0 siblings, 4 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-23 17:40 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Derrick Stolee
When fetching content from a remote, 'git index-pack' processes the packfile
content, storing a packfile appropriate for on-disk storage and a pack-index
helping to perform random-access into that packfile. To help with
compression, the packfile sent over the wire can use REF_DELTAs in addition
to OFS_DELTAs to refer to objects that are already known to exist in the
client's repository. REF_DELTAs can also refer to objects within the
packfile, though this is not typically done.
Because this inter-pack REF_DELTA is not a typical data shape, a latent bug
has been waiting that causes 'git index-pack' to die() even on legitimate
packfile content that it could resolve.
This series resolves this problem while also creating a test helper for
constructing packfiles with specific objects represented in specific types
of deltas and in a given order. This should make it easier to create test
cases like this in the future instead of updating t/lib-pack.sh through
other means.
Thanks, -Stolee
Derrick Stolee (3):
test-tool: add pack-deltas helper
t5309: create failing test for 'git index-pack'
index-pack: allow revisiting REF_DELTA chains
Makefile | 1 +
builtin/index-pack.c | 58 ++++++++-------
t/helper/meson.build | 1 +
t/helper/test-pack-deltas.c | 140 +++++++++++++++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t5309-pack-delta-cycles.sh | 36 ++++++++-
7 files changed, 210 insertions(+), 28 deletions(-)
create mode 100644 t/helper/test-pack-deltas.c
base-commit: 4bbb303af69990ccd05fe3a2eb58a1ce036f8220
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1906%2Fderrickstolee%2Findex-pack-ref-deltas-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1906/derrickstolee/index-pack-ref-deltas-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1906
--
gitgitgadget
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-23 17:40 [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
@ 2025-04-23 17:40 ` Derrick Stolee via GitGitGadget
2025-04-23 19:26 ` Junio C Hamano
` (2 more replies)
2025-04-23 17:40 ` [PATCH 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
3 siblings, 3 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-23 17:40 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
When trying to demonstrate certain behavior in tests, it can be helpful
to create packfiles that have specific delta structures. 'git
pack-objects' uses various algorithms to select deltas based on their
compression rates, but that does not always demonstrate all possible
packfile shapes. This becomes especially important when wanting to test
'git index-pack' and its ability to parse certain pack shapes.
We have prior art in t/lib-pack.sh, where certain delta structures are
produced by manually writing certain opaque pack contents. However,
producing these script updates is cumbersome and difficult to do as a
contributor.
Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
list of instructions for which objects to include in a packfile and how
those objects should be written in delta form.
At the moment, this only supports REF_DELTAs as those are the kinds of
deltas needed to exercise a bug in 'git index-pack'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Makefile | 1 +
t/helper/meson.build | 1 +
t/helper/test-pack-deltas.c | 140 ++++++++++++++++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
5 files changed, 144 insertions(+)
create mode 100644 t/helper/test-pack-deltas.c
diff --git a/Makefile b/Makefile
index 13f9062a056..c4d21ccd3d1 100644
--- a/Makefile
+++ b/Makefile
@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
TEST_BUILTINS_OBJS += test-name-hash.o
TEST_BUILTINS_OBJS += test-online-cpus.o
+TEST_BUILTINS_OBJS += test-pack-deltas.o
TEST_BUILTINS_OBJS += test-pack-mtimes.o
TEST_BUILTINS_OBJS += test-parse-options.o
TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
diff --git a/t/helper/meson.build b/t/helper/meson.build
index d2cabaa2bcf..d4e8b26df8d 100644
--- a/t/helper/meson.build
+++ b/t/helper/meson.build
@@ -36,6 +36,7 @@ test_tool_sources = [
'test-mktemp.c',
'test-name-hash.c',
'test-online-cpus.c',
+ 'test-pack-deltas.c',
'test-pack-mtimes.c',
'test-parse-options.c',
'test-parse-pathspec-file.c',
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
new file mode 100644
index 00000000000..db7d1c3cd1f
--- /dev/null
+++ b/t/helper/test-pack-deltas.c
@@ -0,0 +1,140 @@
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "delta.h"
+#include "git-zlib.h"
+#include "hash.h"
+#include "hex.h"
+#include "pack.h"
+#include "pack-objects.h"
+#include "setup.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static const char usage_str[] = "test-tool pack-deltas <n>";
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
+ git_zstream stream;
+ void *in, *out;
+ unsigned long maxsize;
+
+ git_deflate_init(&stream, 1);
+ maxsize = git_deflate_bound(&stream, size);
+
+ in = *pptr;
+ out = xmalloc(maxsize);
+ *pptr = out;
+
+ stream.next_in = in;
+ stream.avail_in = size;
+ stream.next_out = out;
+ stream.avail_out = maxsize;
+ while (git_deflate(&stream, Z_FINISH) == Z_OK)
+ ; /* nothing */
+ git_deflate_end(&stream);
+
+ free(in);
+ return stream.total_out;
+}
+
+static void write_ref_delta(struct hashfile *f,
+ struct object_id *oid,
+ struct object_id *base)
+{
+ unsigned char header[MAX_PACK_OBJECT_HEADER];
+ unsigned long size, base_size, delta_size, compressed_size, hdrlen;
+ enum object_type type;
+ void *base_buf, *delta_buf;
+ void *buf = repo_read_object_file(the_repository,
+ oid, &type,
+ &size);
+
+ if (!buf)
+ die("unable to read %s", oid_to_hex(oid));
+
+ base_buf = repo_read_object_file(the_repository,
+ base, &type,
+ &base_size);
+
+ if (!base_buf)
+ die("unable to read %s", oid_to_hex(base));
+
+ delta_buf = diff_delta(base_buf, base_size,
+ buf, size, &delta_size, 0);
+
+ compressed_size = do_compress(&delta_buf, delta_size);
+
+ hdrlen = encode_in_pack_object_header(header, sizeof(header),
+ OBJ_REF_DELTA, delta_size);
+ hashwrite(f, header, hdrlen);
+ hashwrite(f, base->hash, the_repository->hash_algo->rawsz);
+ hashwrite(f, delta_buf, compressed_size);
+
+ free(buf);
+ free(base_buf);
+ free(delta_buf);
+}
+
+int cmd__pack_deltas(int argc, const char **argv)
+{
+ int N;
+ struct hashfile *f;
+ struct strbuf line = STRBUF_INIT;
+
+ if (argc != 2) {
+ usage(usage_str);
+ return -1;
+ }
+
+ N = atoi(argv[1]);
+
+ setup_git_directory();
+
+ f = hashfd(the_repository->hash_algo, 1, "<stdout>");
+ write_pack_header(f, N);
+
+ /* Read each line from stdin into 'line' */
+ while (strbuf_getline_lf(&line, stdin) != EOF) {
+ const char *type_str, *content_oid_str, *base_oid_str = NULL;
+ struct object_id content_oid, base_oid;
+ struct string_list items = STRING_LIST_INIT_NODUP;
+ /*
+ * Tokenize into two or three parts:
+ * 1. REF_DELTA, OFS_DELTA, or FULL.
+ * 2. The object ID for the content object.
+ * 3. The object ID for the base object (optional).
+ */
+ if (string_list_split_in_place(&items, line.buf, " ", 3) < 0)
+ die("invalid input format: %s", line.buf);
+
+ if (items.nr < 2)
+ die("invalid input format: %s", line.buf);
+
+ type_str = items.items[0].string;
+ content_oid_str = items.items[1].string;
+
+ if (get_oid_hex(content_oid_str, &content_oid))
+ die("invalid object: %s", content_oid_str);
+ if (items.nr >= 3) {
+ base_oid_str = items.items[2].string;
+ if (get_oid_hex(base_oid_str, &base_oid))
+ die("invalid object: %s", base_oid_str);
+ }
+
+ if (!strcmp(type_str, "REF_DELTA"))
+ write_ref_delta(f, &content_oid, &base_oid);
+ else if (!strcmp(type_str, "OFS_DELTA"))
+ die("OFS_DELTA not implemented");
+ else if (!strcmp(type_str, "FULL"))
+ die("FULL not implemented");
+ else
+ die("unknown pack type: %s", type_str);
+ }
+
+ finalize_hashfile(f, NULL, FSYNC_COMPONENT_PACK,
+ CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
+ strbuf_release(&line);
+ return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 50dc4dac4ed..74812ed86d3 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
{ "mktemp", cmd__mktemp },
{ "name-hash", cmd__name_hash },
{ "online-cpus", cmd__online_cpus },
+ { "pack-deltas", cmd__pack_deltas },
{ "pack-mtimes", cmd__pack_mtimes },
{ "parse-options", cmd__parse_options },
{ "parse-options-flags", cmd__parse_options_flags },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 6d62a5b53d9..2571a3ccfe8 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__mergesort(int argc, const char **argv);
int cmd__mktemp(int argc, const char **argv);
int cmd__name_hash(int argc, const char **argv);
int cmd__online_cpus(int argc, const char **argv);
+int cmd__pack_deltas(int argc, const char **argv);
int cmd__pack_mtimes(int argc, const char **argv);
int cmd__parse_options(int argc, const char **argv);
int cmd__parse_options_flags(int argc, const char **argv);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] t5309: create failing test for 'git index-pack'
2025-04-23 17:40 [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
@ 2025-04-23 17:40 ` Derrick Stolee via GitGitGadget
2025-04-23 19:37 ` Junio C Hamano
2025-04-23 17:40 ` [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-23 17:40 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This new test demonstrates some behavior where a valid packfile is being
rejected by the Git client due to the order in which it is resolving
REF_DELTAs.
The thin packfile has a REF_DELTA chain A->B->C where C is not included
in the packfile. However, the client repository contains both C and B
already. Thus, 'git index-pack' is able to resolve A before resolving B.
When resolving B, it then attempts to resolve any other REF_DELTAs that
are pointing to B as a base. This "revisits" A and complains as if there
is a cycle, but it did not actually detect a cycle.
A fix will arrive in the next change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5309-pack-delta-cycles.sh | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 60fc710bacb..9029f8a0dda 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -75,4 +75,30 @@ test_expect_success 'failover to a duplicate object in the same pack' '
test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
'
+test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
+ git init server &&
+ (
+ cd server &&
+ test_commit_bulk 4
+ ) &&
+
+ A=$(git -C server rev-parse HEAD^{tree}) &&
+ B=$(git -C server rev-parse HEAD~1^{tree}) &&
+ C=$(git -C server rev-parse HEAD~2^{tree}) &&
+ git -C server reset --hard HEAD~1 &&
+
+ cat >in <<-EOF &&
+ REF_DELTA $A $B
+ REF_DELTA $B $C
+ EOF
+
+ test-tool -C server pack-deltas 2 <in >thin.pack &&
+
+ git clone "file://$(pwd)/server" client &&
+ (
+ cd client &&
+ git index-pack --fix-thin --stdin <../thin.pack
+ )
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains
2025-04-23 17:40 [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-23 17:40 ` [PATCH 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
@ 2025-04-23 17:40 ` Derrick Stolee via GitGitGadget
2025-04-24 21:41 ` Junio C Hamano
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
3 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-23 17:40 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.
This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.
The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.
Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.
However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.
This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.
This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().
The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.
The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/index-pack.c | 58 ++++++++++++++++++++----------------
t/t5309-pack-delta-cycles.sh | 12 ++++++--
2 files changed, 41 insertions(+), 29 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index de127c0ff13..dbe79701fb8 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1109,8 +1109,8 @@ static void *threaded_second_pass(void *data)
set_thread_data(data);
for (;;) {
struct base_data *parent = NULL;
- struct object_entry *child_obj;
- struct base_data *child;
+ struct object_entry *child_obj = NULL;
+ struct base_data *child = NULL;
counter_lock();
display_progress(progress, nr_resolved_deltas);
@@ -1137,15 +1137,18 @@ static void *threaded_second_pass(void *data)
parent = list_first_entry(&work_head, struct base_data,
list);
- if (parent->ref_first <= parent->ref_last) {
+ while (parent->ref_first <= parent->ref_last) {
int offset = ref_deltas[parent->ref_first++].obj_no;
child_obj = objects + offset;
- if (child_obj->real_type != OBJ_REF_DELTA)
- die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
- (uintmax_t) child_obj->idx.offset,
- oid_to_hex(&parent->obj->idx.oid));
+ if (child_obj->real_type != OBJ_REF_DELTA) {
+ child_obj = NULL;
+ continue;
+ }
child_obj->real_type = parent->obj->real_type;
- } else {
+ break;
+ }
+
+ if (!child_obj && parent->ofs_first <= parent->ofs_last) {
child_obj = objects +
ofs_deltas[parent->ofs_first++].obj_no;
assert(child_obj->real_type == OBJ_OFS_DELTA);
@@ -1178,29 +1181,32 @@ static void *threaded_second_pass(void *data)
}
work_unlock();
- if (parent) {
- child = resolve_delta(child_obj, parent);
- if (!child->children_remaining)
- FREE_AND_NULL(child->data);
- } else {
- child = make_base(child_obj, NULL);
- if (child->children_remaining) {
- /*
- * Since this child has its own delta children,
- * we will need this data in the future.
- * Inflate now so that future iterations will
- * have access to this object's data while
- * outside the work mutex.
- */
- child->data = get_data_from_pack(child_obj);
- child->size = child_obj->size;
+ if (child_obj) {
+ if (parent) {
+ child = resolve_delta(child_obj, parent);
+ if (!child->children_remaining)
+ FREE_AND_NULL(child->data);
+ } else{
+ child = make_base(child_obj, NULL);
+ if (child->children_remaining) {
+ /*
+ * Since this child has its own delta children,
+ * we will need this data in the future.
+ * Inflate now so that future iterations will
+ * have access to this object's data while
+ * outside the work mutex.
+ */
+ child->data = get_data_from_pack(child_obj);
+ child->size = child_obj->size;
+ }
}
}
work_lock();
if (parent)
parent->retain_data--;
- if (child->data) {
+
+ if (child && child->data) {
/*
* This child has its own children, so add it to
* work_head.
@@ -1209,7 +1215,7 @@ static void *threaded_second_pass(void *data)
base_cache_used += child->size;
prune_base_data(NULL);
free_base_data(child);
- } else {
+ } else if (child) {
/*
* This child does not have its own children. It may be
* the last descendant of its ancestors; free those
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 9029f8a0dda..fc3594aaae9 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -60,7 +60,10 @@ test_expect_success 'index-pack detects REF_DELTA cycles' '
test_expect_success 'failover to an object in another pack' '
clear_packs &&
git index-pack --stdin <ab.pack &&
- test_must_fail git index-pack --stdin --fix-thin <cycle.pack
+
+ # This cycle does not fail since the existence of A & B in
+ # the repo allows us to resolve the cycle.
+ git index-pack --stdin --fix-thin <cycle.pack
'
test_expect_success 'failover to a duplicate object in the same pack' '
@@ -72,10 +75,13 @@ test_expect_success 'failover to a duplicate object in the same pack' '
pack_obj $A
} >recoverable.pack &&
pack_trailer recoverable.pack &&
- test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
+
+ # This cycle does not fail since the existence of a full copy
+ # of A in the pack allows us to resolve the cycle.
+ git index-pack --fix-thin --stdin <recoverable.pack
'
-test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
+test_expect_success 'index-pack works with thin pack A->B->C with B on disk' '
git init server &&
(
cd server &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
@ 2025-04-23 19:26 ` Junio C Hamano
2025-04-23 19:32 ` Derrick Stolee
2025-04-24 19:41 ` Junio C Hamano
2025-04-25 4:34 ` Patrick Steinhardt
2 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-23 19:26 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, peff, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> When trying to demonstrate certain behavior in tests, it can be helpful
> to create packfiles that have specific delta structures. 'git
> pack-objects' uses various algorithms to select deltas based on their
> compression rates, but that does not always demonstrate all possible
> packfile shapes. This becomes especially important when wanting to test
> 'git index-pack' and its ability to parse certain pack shapes.
>
> We have prior art in t/lib-pack.sh, where certain delta structures are
> produced by manually writing certain opaque pack contents. However,
> producing these script updates is cumbersome and difficult to do as a
> contributor.
>
> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
> list of instructions for which objects to include in a packfile and how
> those objects should be written in delta form.
>
> At the moment, this only supports REF_DELTAs as those are the kinds of
> deltas needed to exercise a bug in 'git index-pack'.
Wonderful writing. I agree with the destination where this effort
wants to go, including the decision that starting with ref-delta
only is a good enough first step.
As to the implementation, I was a tiny little bit bummed to see
that, even though it does share the code with the real pack-objects
code paths to compute delta data by calling diff_delta(), and to
write per-object header by calling encode_in_pack_object_header(),
it has its own compression loop that does not even do an error
checking after calling into zlib deflate machinery.
Perhaps that is unavoidable due to the code structure of the
production code.
> +static const char usage_str[] = "test-tool pack-deltas <n>";
> ...
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> + int N;
> + struct hashfile *f;
> + struct strbuf line = STRBUF_INIT;
> +
> + if (argc != 2) {
> + usage(usage_str);
> + return -1;
> + }
> +
> + N = atoi(argv[1]);
It somewhat looks strange to see an uppercase N used as a variable
name. Together with the usage string, how about renaming "N" and
"n" after "number of objects", e.g.
test-tool pack-deltas <num-objects>
int num_objects;
or something?
Other than that, looking very good.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-23 19:26 ` Junio C Hamano
@ 2025-04-23 19:32 ` Derrick Stolee
0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2025-04-23 19:32 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, peff
On 4/23/2025 3:26 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> When trying to demonstrate certain behavior in tests, it can be helpful
>> to create packfiles that have specific delta structures. 'git
>> pack-objects' uses various algorithms to select deltas based on their
>> compression rates, but that does not always demonstrate all possible
>> packfile shapes. This becomes especially important when wanting to test
>> 'git index-pack' and its ability to parse certain pack shapes.
>>
>> We have prior art in t/lib-pack.sh, where certain delta structures are
>> produced by manually writing certain opaque pack contents. However,
>> producing these script updates is cumbersome and difficult to do as a
>> contributor.
>>
>> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
>> list of instructions for which objects to include in a packfile and how
>> those objects should be written in delta form.
>>
>> At the moment, this only supports REF_DELTAs as those are the kinds of
>> deltas needed to exercise a bug in 'git index-pack'.
>
> Wonderful writing. I agree with the destination where this effort
> wants to go, including the decision that starting with ref-delta
> only is a good enough first step.
>
> As to the implementation, I was a tiny little bit bummed to see
> that, even though it does share the code with the real pack-objects
> code paths to compute delta data by calling diff_delta(), and to
> write per-object header by calling encode_in_pack_object_header(),
> it has its own compression loop that does not even do an error
> checking after calling into zlib deflate machinery.
I could strengthen these options to help folks more quickly understand
potential failures as being part of the pack write instead of them
failing during the later pack read.
> Perhaps that is unavoidable due to the code structure of the
> production code.
I briefly considered extracting some code out of builtin/pack-objects.c
but it relies heavily on globals and context that I won't have in this
helper. I'm open to suggestions for how I can safely share more code,
but my initial attempt required too much refactoring to be worth it.
I am grateful for the amount of code from pack-write.c that I _was_
able to reuse.
>> +static const char usage_str[] = "test-tool pack-deltas <n>";
>> ...
>> +int cmd__pack_deltas(int argc, const char **argv)
>> +{
>> + int N;
>> + struct hashfile *f;
>> + struct strbuf line = STRBUF_INIT;
>> +
>> + if (argc != 2) {
>> + usage(usage_str);
>> + return -1;
>> + }
>> +
>> + N = atoi(argv[1]);
>
> It somewhat looks strange to see an uppercase N used as a variable
> name. Together with the usage string, how about renaming "N" and
> "n" after "number of objects", e.g.
>
> test-tool pack-deltas <num-objects>
> int num_objects;
>
> or something?
I definitely should have used a better name here. Thanks.
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] t5309: create failing test for 'git index-pack'
2025-04-23 17:40 ` [PATCH 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
@ 2025-04-23 19:37 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2025-04-23 19:37 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, peff, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> This new test demonstrates some behavior where a valid packfile is being
> rejected by the Git client due to the order in which it is resolving
> REF_DELTAs.
>
> The thin packfile has a REF_DELTA chain A->B->C where C is not included
> in the packfile. However, the client repository contains both C and B
> already. Thus, 'git index-pack' is able to resolve A before resolving B.
In order to reconstitute A, B is needed (which recipient has), and
in order to reconstitute B, C is needed (which recipient also has).
The index-pack sees delta based on B to recreate A; it should be
able to reconstitute A using B that already exists.
OK.
> When resolving B, it then attempts to resolve any other REF_DELTAs that
> are pointing to B as a base. This "revisits" A and complains as if there
> is a cycle, but it did not actually detect a cycle.
That's interesting.
> +test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
> + git init server &&
> + (
> + cd server &&
> + test_commit_bulk 4
> + ) &&
> +
> + A=$(git -C server rev-parse HEAD^{tree}) &&
> + B=$(git -C server rev-parse HEAD~1^{tree}) &&
> + C=$(git -C server rev-parse HEAD~2^{tree}) &&
> + git -C server reset --hard HEAD~1 &&
> +
> + cat >in <<-EOF &&
> + REF_DELTA $A $B
> + REF_DELTA $B $C
> + EOF
> +
> + test-tool -C server pack-deltas 2 <in >thin.pack &&
This is minor, but I somehow find it easier to follow without the
temporary file, i.e.
test-tool -C server pack-deltas 2 >thin.pack <<-EOF &&
REF_DELTA $A $B
REF_DELTA $B $C
EOF
> + git clone "file://$(pwd)/server" client &&
This truly loses A from the resulting "client" repository, but the
history still can reach B and C.
> + (
> + cd client &&
> + git index-pack --fix-thin --stdin <../thin.pack
> + )
> +'
Makes sense.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-23 19:26 ` Junio C Hamano
@ 2025-04-24 19:41 ` Junio C Hamano
2025-04-24 20:06 ` Derrick Stolee
2025-04-25 4:34 ` Patrick Steinhardt
2 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-24 19:41 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, peff, Derrick Stolee
I needed this to make
$ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
$ cd t && sh t5309-pack-delta-cycles.sh
pass.
--- >8 ------ >8 ------ >8 ---
Subject: [PATCH] fixup! test-tool: add pack-deltas helper
t/helper/test-pack-deltas.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index db7d1c3cd1..c8e837ea06 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
if (get_oid_hex(base_oid_str, &base_oid))
die("invalid object: %s", base_oid_str);
}
+ string_list_clear(&items, 0);
if (!strcmp(type_str, "REF_DELTA"))
write_ref_delta(f, &content_oid, &base_oid);
--
2.49.0-555-g43235db9c8
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-24 19:41 ` Junio C Hamano
@ 2025-04-24 20:06 ` Derrick Stolee
2025-04-24 20:56 ` Junio C Hamano
0 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2025-04-24 20:06 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, peff
On 4/24/2025 3:41 PM, Junio C Hamano wrote:
> I needed this to make
>
> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
> $ cd t && sh t5309-pack-delta-cycles.sh
>
> pass.
> --- >8 ------ >8 ------ >8 ---
> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
>
> t/helper/test-pack-deltas.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> index db7d1c3cd1..c8e837ea06 100644
> --- a/t/helper/test-pack-deltas.c
> +++ b/t/helper/test-pack-deltas.c
> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
> if (get_oid_hex(base_oid_str, &base_oid))
> die("invalid object: %s", base_oid_str);
> }
> + string_list_clear(&items, 0);
Thanks. I'll make sure to apply it. My GGG PR validation was broken
top-to-bottom due to other environmental issues so I had not seen
this failure myself.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-24 20:06 ` Derrick Stolee
@ 2025-04-24 20:56 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2025-04-24 20:56 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, peff
Derrick Stolee <stolee@gmail.com> writes:
> On 4/24/2025 3:41 PM, Junio C Hamano wrote:
>> I needed this to make
>>
>> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
>> $ cd t && sh t5309-pack-delta-cycles.sh
>>
>> pass.
>> --- >8 ------ >8 ------ >8 ---
>> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
>>
>> t/helper/test-pack-deltas.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
>> index db7d1c3cd1..c8e837ea06 100644
>> --- a/t/helper/test-pack-deltas.c
>> +++ b/t/helper/test-pack-deltas.c
>> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
>> if (get_oid_hex(base_oid_str, &base_oid))
>> die("invalid object: %s", base_oid_str);
>> }
>> + string_list_clear(&items, 0);
>
> Thanks. I'll make sure to apply it. My GGG PR validation was broken
> top-to-bottom due to other environmental issues so I had not seen
> this failure myself.
I squashed this in so unless there are other things you need to
change, this alone does not make it necessary to reroll the series.
Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains
2025-04-23 17:40 ` [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
@ 2025-04-24 21:41 ` Junio C Hamano
2025-04-25 3:49 ` Derrick Stolee
0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-24 21:41 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, peff, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
> logic within 'git index-pack' to analyze an incoming thin packfile with
> REF_DELTAs is suspect. The algorithm is overly cautious around delta
> cycles, and that leads in fact to failing even when there is no cycle.
>
> This change adjusts the algorithm to no longer fail in these cases. In
> fact, these cycle cases will no longer fail but more importantly the
> valid cases will no longer fail, either. The resulting packfile from the
> --fix-thin operation will not have cycles either since REF_DELTAs are
> forbidden from the on-disk format and OFS_DELTAs are impossible to write
> as a cycle.
Loosening cycle avoidance indeed is worrysome. How do we guarantee
"since REF_DELTAs are forbidden from the on-disk format" (it is
obvious with OFS_DELTA only you cannot form a cycle)? By code
inspection and in-code comment for the code path of "--fix-thin"?
I think the most interesting question would be how, with the
loosening of the cycle check in this patch, we would still protect
against a malicious on-the-wire packdata that has a cycle in it
already, where deltas cannot be resolved.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains
2025-04-24 21:41 ` Junio C Hamano
@ 2025-04-25 3:49 ` Derrick Stolee
0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2025-04-25 3:49 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, peff
On 4/24/2025 5:41 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
>> logic within 'git index-pack' to analyze an incoming thin packfile with
>> REF_DELTAs is suspect. The algorithm is overly cautious around delta
>> cycles, and that leads in fact to failing even when there is no cycle.
>>
>> This change adjusts the algorithm to no longer fail in these cases. In
>> fact, these cycle cases will no longer fail but more importantly the
>> valid cases will no longer fail, either. The resulting packfile from the
>> --fix-thin operation will not have cycles either since REF_DELTAs are
>> forbidden from the on-disk format and OFS_DELTAs are impossible to write
>> as a cycle.
>
> Loosening cycle avoidance indeed is worrysome. How do we guarantee
> "since REF_DELTAs are forbidden from the on-disk format" (it is
> obvious with OFS_DELTA only you cannot form a cycle)? By code
> inspection and in-code comment for the code path of "--fix-thin"?
>
> I think the most interesting question would be how, with the
> loosening of the cycle check in this patch, we would still protect
> against a malicious on-the-wire packdata that has a cycle in it
> already, where deltas cannot be resolved.
The good news is two-fold:
1. We have a test in t5309 (index-pack detects REF_DELTA cycles) that
checks this case and did not change behavior with this patch.
2. If we don't already have any of the objects that exist in the cycle,
then we can't _start_ expanding the objects to their full contents
as we don't have their bases. So what is left is a list of REF_DELTAs
that have not been processed but also a list of base object IDs that
we can't load. We'll fail without writing those objects to disk.
Does that satisfy your concerns in this space?
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-23 19:26 ` Junio C Hamano
2025-04-24 19:41 ` Junio C Hamano
@ 2025-04-25 4:34 ` Patrick Steinhardt
2025-04-25 9:34 ` Johannes Schindelin
2 siblings, 1 reply; 29+ messages in thread
From: Patrick Steinhardt @ 2025-04-25 4:34 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, peff, Derrick Stolee
On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> new file mode 100644
> index 00000000000..db7d1c3cd1f
> --- /dev/null
> +++ b/t/helper/test-pack-deltas.c
> @@ -0,0 +1,140 @@
[snip]
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> + int N;
> + struct hashfile *f;
> + struct strbuf line = STRBUF_INIT;
> +
> + if (argc != 2) {
> + usage(usage_str);
> + return -1;
> + }
> +
> + N = atoi(argv[1]);
Is there a reason why we don't use `parse_options()` here? It might make
this tool easier to use and extend going forward, and we wouldn't have
to care about invalid arguments. Right now, we silently accept a
non-integer argument and do the wrong thing.
Patrick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-25 4:34 ` Patrick Steinhardt
@ 2025-04-25 9:34 ` Johannes Schindelin
2025-04-25 9:45 ` Patrick Steinhardt
0 siblings, 1 reply; 29+ messages in thread
From: Johannes Schindelin @ 2025-04-25 9:34 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Derrick Stolee via GitGitGadget, git, gitster, peff,
Derrick Stolee
Hi Patrick,
On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > new file mode 100644
> > index 00000000000..db7d1c3cd1f
> > --- /dev/null
> > +++ b/t/helper/test-pack-deltas.c
> > @@ -0,0 +1,140 @@
> [snip]
> > +int cmd__pack_deltas(int argc, const char **argv)
> > +{
> > + int N;
> > + struct hashfile *f;
> > + struct strbuf line = STRBUF_INIT;
> > +
> > + if (argc != 2) {
> > + usage(usage_str);
> > + return -1;
> > + }
> > +
> > + N = atoi(argv[1]);
>
> Is there a reason why we don't use `parse_options()` here? It might make
> this tool easier to use and extend going forward, and we wouldn't have
> to care about invalid arguments. Right now, we silently accept a
> non-integer argument and do the wrong thing.
I think that `parse_options()` would be overkill here because:
- This is a _mandatory_ argument, not an optional one.
- The required data type is `uint32_t`, and `parse_options()` has no
support for that.
But you do have a good point in that we may want to validate the data type
(even if technically, this is not a user-facing program, it's a test
helper that is used under tight control by Git's own test suite).
Consequently, I would suggest this fixup instead:
-- snipsnap --
Subject: [PATCH] fixup! test-tool: add pack-deltas helper
Let's make the command-line parsing a bit more stringent. We _could_
use `parse_options()`, but that would be overkill for a single,
non-optional argument. Besides, it would not bring any benefit, as the
parsed value needs to fit in the `uint32_t` type, and `parse_options()`
has no provision to ensure that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
t/helper/test-pack-deltas.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index 4af69bdc05d3..f95d8ee16768 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -8,11 +8,12 @@
#include "hex.h"
#include "pack.h"
#include "pack-objects.h"
+#include "parse.h"
#include "setup.h"
#include "strbuf.h"
#include "string-list.h"
-static const char usage_str[] = "test-tool pack-deltas <n>";
+static const char usage_str[] = "test-tool pack-deltas <nr_entries>";
static unsigned long do_compress(void **pptr, unsigned long size)
{
@@ -79,7 +80,7 @@ static void write_ref_delta(struct hashfile *f,
int cmd__pack_deltas(int argc, const char **argv)
{
- int N;
+ unsigned long n;
struct hashfile *f;
struct strbuf line = STRBUF_INIT;
@@ -88,12 +89,13 @@ int cmd__pack_deltas(int argc, const char **argv)
return -1;
}
- N = atoi(argv[1]);
+ if (!git_parse_ulong(argv[1], &n) || n != (uint32_t)n)
+ die("invalid number of objects: %s", argv[1]);
setup_git_directory();
f = hashfd(1, "<stdout>");
- write_pack_header(f, N);
+ write_pack_header(f, n);
/* Read each line from stdin into 'line' */
while (strbuf_getline_lf(&line, stdin) != EOF) {
--
2.49.0.windows.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-25 9:34 ` Johannes Schindelin
@ 2025-04-25 9:45 ` Patrick Steinhardt
2025-04-25 9:51 ` Johannes Schindelin
2025-04-25 16:27 ` Junio C Hamano
0 siblings, 2 replies; 29+ messages in thread
From: Patrick Steinhardt @ 2025-04-25 9:45 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Derrick Stolee via GitGitGadget, git, gitster, peff,
Derrick Stolee
On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> Hi Patrick,
>
> On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
>
> > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > new file mode 100644
> > > index 00000000000..db7d1c3cd1f
> > > --- /dev/null
> > > +++ b/t/helper/test-pack-deltas.c
> > > @@ -0,0 +1,140 @@
> > [snip]
> > > +int cmd__pack_deltas(int argc, const char **argv)
> > > +{
> > > + int N;
> > > + struct hashfile *f;
> > > + struct strbuf line = STRBUF_INIT;
> > > +
> > > + if (argc != 2) {
> > > + usage(usage_str);
> > > + return -1;
> > > + }
> > > +
> > > + N = atoi(argv[1]);
> >
> > Is there a reason why we don't use `parse_options()` here? It might make
> > this tool easier to use and extend going forward, and we wouldn't have
> > to care about invalid arguments. Right now, we silently accept a
> > non-integer argument and do the wrong thing.
>
> I think that `parse_options()` would be overkill here because:
>
> - This is a _mandatory_ argument, not an optional one.
>
> - The required data type is `uint32_t`, and `parse_options()` has no
> support for that.
Support for that has been merged just this week via 2bc5414c411 (Merge
branch 'ps/parse-options-integers', 2025-04-24).
> But you do have a good point in that we may want to validate the data type
> (even if technically, this is not a user-facing program, it's a test
> helper that is used under tight control by Git's own test suite).
>
> Consequently, I would suggest this fixup instead:
But in any case, I'd be equally fine with your suggestion.
Patrick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-25 9:45 ` Patrick Steinhardt
@ 2025-04-25 9:51 ` Johannes Schindelin
2025-04-25 16:27 ` Junio C Hamano
1 sibling, 0 replies; 29+ messages in thread
From: Johannes Schindelin @ 2025-04-25 9:51 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Derrick Stolee via GitGitGadget, git, gitster, peff,
Derrick Stolee
Hi Patrick,
On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> >
> > On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> >
> > > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > > new file mode 100644
> > > > index 00000000000..db7d1c3cd1f
> > > > --- /dev/null
> > > > +++ b/t/helper/test-pack-deltas.c
> > > > @@ -0,0 +1,140 @@
> > > [snip]
> > > > +int cmd__pack_deltas(int argc, const char **argv)
> > > > +{
> > > > + int N;
> > > > + struct hashfile *f;
> > > > + struct strbuf line = STRBUF_INIT;
> > > > +
> > > > + if (argc != 2) {
> > > > + usage(usage_str);
> > > > + return -1;
> > > > + }
> > > > +
> > > > + N = atoi(argv[1]);
> > >
> > > Is there a reason why we don't use `parse_options()` here? It might make
> > > this tool easier to use and extend going forward, and we wouldn't have
> > > to care about invalid arguments. Right now, we silently accept a
> > > non-integer argument and do the wrong thing.
> >
> > I think that `parse_options()` would be overkill here because:
> >
> > - This is a _mandatory_ argument, not an optional one.
> >
> > - The required data type is `uint32_t`, and `parse_options()` has no
> > support for that.
>
> Support for that has been merged just this week via 2bc5414c411 (Merge
> branch 'ps/parse-options-integers', 2025-04-24).
That's too new for me to be useful, as I have to work on top of v2.49.0
(see https://github.com/microsoft/git/pull/745).
> > But you do have a good point in that we may want to validate the data type
> > (even if technically, this is not a user-facing program, it's a test
> > helper that is used under tight control by Git's own test suite).
> >
> > Consequently, I would suggest this fixup instead:
>
> But in any case, I'd be equally fine with your suggestion.
Thanks!
Johannes
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-25 9:45 ` Patrick Steinhardt
2025-04-25 9:51 ` Johannes Schindelin
@ 2025-04-25 16:27 ` Junio C Hamano
2025-04-28 15:22 ` Derrick Stolee
1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-25 16:27 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Johannes Schindelin, Derrick Stolee via GitGitGadget, git, peff,
Derrick Stolee
Patrick Steinhardt <ps@pks.im> writes:
>> > Is there a reason why we don't use `parse_options()` here? It might make
>> > this tool easier to use and extend going forward, and we wouldn't have
>> > to care about invalid arguments. Right now, we silently accept a
>> > non-integer argument and do the wrong thing.
>>
>> I think that `parse_options()` would be overkill here because:
>>
>> - This is a _mandatory_ argument, not an optional one.
>>
>> - The required data type is `uint32_t`, and `parse_options()` has no
>> support for that.
>
> Support for that has been merged just this week via 2bc5414c411 (Merge
> branch 'ps/parse-options-integers', 2025-04-24).
>
>> But you do have a good point in that we may want to validate the data type
>> (even if technically, this is not a user-facing program, it's a test
>> helper that is used under tight control by Git's own test suite).
>>
>> Consequently, I would suggest this fixup instead:
>
> But in any case, I'd be equally fine with your suggestion.
Yeah, I think we clearly showed our "it's just test helper, whose
callers are supposed to know what they are doing" attitude, but with
proper helpers, it is not too much additional effort to do the right
thing.
Thanks, both.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-25 16:27 ` Junio C Hamano
@ 2025-04-28 15:22 ` Derrick Stolee
2025-04-28 16:37 ` Junio C Hamano
0 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2025-04-28 15:22 UTC (permalink / raw)
To: Junio C Hamano, Patrick Steinhardt
Cc: Johannes Schindelin, Derrick Stolee via GitGitGadget, git, peff
On 4/25/2025 12:27 PM, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
>>>> Is there a reason why we don't use `parse_options()` here? It might make
>>>> this tool easier to use and extend going forward, and we wouldn't have
>>>> to care about invalid arguments. Right now, we silently accept a
>>>> non-integer argument and do the wrong thing.
>>>
>>> I think that `parse_options()` would be overkill here because:
>>>
>>> - This is a _mandatory_ argument, not an optional one.
>>>
>>> - The required data type is `uint32_t`, and `parse_options()` has no
>>> support for that.
>>
>> Support for that has been merged just this week via 2bc5414c411 (Merge
>> branch 'ps/parse-options-integers', 2025-04-24).
The thing that confused me even with those changes is that this is a
_positional_ argument and we don't have a way to say "parse the 1st
positional argument into an integer".
>>> But you do have a good point in that we may want to validate the data type
>>> (even if technically, this is not a user-facing program, it's a test
>>> helper that is used under tight control by Git's own test suite).
>>>
>>> Consequently, I would suggest this fixup instead:
>>
>> But in any case, I'd be equally fine with your suggestion.
>
> Yeah, I think we clearly showed our "it's just test helper, whose
> callers are supposed to know what they are doing" attitude, but with
> proper helpers, it is not too much additional effort to do the right
> thing.
But with this philosophy in mind I can change the CLI to be of the form
"--num-objects <n>" to use the parse-options feature. This should make
things more extensible in the future.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-28 15:22 ` Derrick Stolee
@ 2025-04-28 16:37 ` Junio C Hamano
2025-04-28 18:59 ` Derrick Stolee
0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-28 16:37 UTC (permalink / raw)
To: Derrick Stolee
Cc: Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee via GitGitGadget, git, peff
Derrick Stolee <stolee@gmail.com> writes:
>> Yeah, I think we clearly showed our "it's just test helper, whose
>> callers are supposed to know what they are doing" attitude, but with
>> proper helpers, it is not too much additional effort to do the right
>> thing.
>
> But with this philosophy in mind I can change the CLI to be of the form
> "--num-objects <n>" to use the parse-options feature. This should make
> things more extensible in the future.
True. If we are aiming to deliver this to end-user's hands in some
future, I agree that we want to make it extensible, make it dtrt
without being told, and make it harder to give wrong input. If we
are going in that direction [*], I suspect this should not be a
separate and independent input---rather, shouldn't the tool already
_know_ what objects it placed in the resulting output stream, and
should be able to _count_ that number by itself? One thing it lets
us do to have this as a separate number is to create an invalid pack
stream where the header gives a wrong number, and as a test tool,
that may trump the convenience of not having to give the number
explicitly.
Another thing we may want to add to the tool is to give it a mode
that either (1) refuses to place the same object in a single pack
stream more than once, or (2) warn when it happens. The latter
would be useful to create an invalid pack stream for testing.
Thanks.
[Footnote]
* ... which I would welcome as the project manager, even though
personally I find that 'it is just test helper' attitude
attractive. Not everybody in this project is experienced enough
to understand the 'it is just test helper, let's not spend too
much effort on it' attitude and elements in the code there should
not be blindly copied and pasted to production part of the system.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-28 16:37 ` Junio C Hamano
@ 2025-04-28 18:59 ` Derrick Stolee
2025-04-28 20:35 ` Junio C Hamano
0 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2025-04-28 18:59 UTC (permalink / raw)
To: Junio C Hamano
Cc: Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee via GitGitGadget, git, peff
On 4/28/2025 12:37 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>> Yeah, I think we clearly showed our "it's just test helper, whose
>>> callers are supposed to know what they are doing" attitude, but with
>>> proper helpers, it is not too much additional effort to do the right
>>> thing.
>>
>> But with this philosophy in mind I can change the CLI to be of the form
>> "--num-objects <n>" to use the parse-options feature. This should make
>> things more extensible in the future.
>
> True. If we are aiming to deliver this to end-user's hands in some
> future, I agree that we want to make it extensible, make it dtrt
> without being told, and make it harder to give wrong input.
I'm not focused on usability, but instead on faster development cycles
if more advanced options are required in the future. I'll happily take
some extra time now to help those who come after.
> If we
> are going in that direction [*], I suspect this should not be a
> separate and independent input---rather, shouldn't the tool already
> _know_ what objects it placed in the resulting output stream, and
> should be able to _count_ that number by itself?
This makes sense as a feature, except that we need to write the number
of objects present in a packfile in its header. If we wanted to avoid
the argument, then we'd need to load the data into a list before
starting to write the packfile. By taking the count in advance, the
implementation is simpler.
> One thing it lets
> us do to have this as a separate number is to create an invalid pack
> stream where the header gives a wrong number, and as a test tool,
> that may trump the convenience of not having to give the number
> explicitly.
But also, this allows generating bad packfiles which is a bonus.
A very good point.
> Another thing we may want to add to the tool is to give it a mode
> that either (1) refuses to place the same object in a single pack
> stream more than once, or (2) warn when it happens. The latter
> would be useful to create an invalid pack stream for testing.
Noted for potential future expansion.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack'
2025-04-23 17:40 [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2025-04-23 17:40 ` [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
@ 2025-04-28 20:24 ` Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
` (3 more replies)
3 siblings, 4 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-28 20:24 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee
When fetching content from a remote, 'git index-pack' processes the packfile
content, storing a packfile appropriate for on-disk storage and a pack-index
helping to perform random-access into that packfile. To help with
compression, the packfile sent over the wire can use REF_DELTAs in addition
to OFS_DELTAs to refer to objects that are already known to exist in the
client's repository. REF_DELTAs can also refer to objects within the
packfile, though this is not typically done.
Because this inter-pack REF_DELTA is not a typical data shape, a latent bug
has been waiting that causes 'git index-pack' to die() even on legitimate
packfile content that it could resolve.
This series resolves this problem while also creating a test helper for
constructing packfiles with specific objects represented in specific types
of deltas and in a given order. This should make it easier to create test
cases like this in the future instead of updating t/lib-pack.sh through
other means.
Updates in V2
=============
* Fixed a memory leak in the test helper.
* The test helper has a better CLI that makes use of the parse-options
library.
* The test script skips the in file and instead feeds the input directly to
the test helper.
Thanks, -Stolee
Derrick Stolee (3):
test-tool: add pack-deltas helper
t5309: create failing test for 'git index-pack'
index-pack: allow revisiting REF_DELTA chains
Makefile | 1 +
builtin/index-pack.c | 58 ++++++++------
t/helper/meson.build | 1 +
t/helper/test-pack-deltas.c | 148 +++++++++++++++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
t/t5309-pack-delta-cycles.sh | 34 +++++++-
7 files changed, 216 insertions(+), 28 deletions(-)
create mode 100644 t/helper/test-pack-deltas.c
base-commit: 4bbb303af69990ccd05fe3a2eb58a1ce036f8220
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1906%2Fderrickstolee%2Findex-pack-ref-deltas-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1906/derrickstolee/index-pack-ref-deltas-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1906
Range-diff vs v1:
1: 5d4beb202d6 ! 1: 41aac8e782f test-tool: add pack-deltas helper
@@ t/helper/test-pack-deltas.c (new)
+#include "hex.h"
+#include "pack.h"
+#include "pack-objects.h"
++#include "parse-options.h"
+#include "setup.h"
+#include "strbuf.h"
+#include "string-list.h"
+
-+static const char usage_str[] = "test-tool pack-deltas <n>";
++static const char *usage_str[] = {
++ "test-tool pack-deltas --num-objects <num-objects>",
++ NULL
++};
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
@@ t/helper/test-pack-deltas.c (new)
+
+int cmd__pack_deltas(int argc, const char **argv)
+{
-+ int N;
++ int num_objects = -1;
+ struct hashfile *f;
+ struct strbuf line = STRBUF_INIT;
++ struct option options[] = {
++ OPT_INTEGER('n', "num-objects", &num_objects, N_("the number of objects to write")),
++ OPT_END()
++ };
+
-+ if (argc != 2) {
-+ usage(usage_str);
-+ return -1;
-+ }
++ argc = parse_options(argc, argv, NULL,
++ options, usage_str, 0);
+
-+ N = atoi(argv[1]);
++ if (argc || num_objects < 0)
++ usage_with_options(usage_str, options);
+
+ setup_git_directory();
+
+ f = hashfd(the_repository->hash_algo, 1, "<stdout>");
-+ write_pack_header(f, N);
++ write_pack_header(f, num_objects);
+
+ /* Read each line from stdin into 'line' */
+ while (strbuf_getline_lf(&line, stdin) != EOF) {
@@ t/helper/test-pack-deltas.c (new)
+ if (get_oid_hex(base_oid_str, &base_oid))
+ die("invalid object: %s", base_oid_str);
+ }
++ string_list_clear(&items, 0);
+
+ if (!strcmp(type_str, "REF_DELTA"))
+ write_ref_delta(f, &content_oid, &base_oid);
2: a9430447641 ! 2: 53a990e69ea t5309: create failing test for 'git index-pack'
@@ t/t5309-pack-delta-cycles.sh: test_expect_success 'failover to a duplicate objec
+ C=$(git -C server rev-parse HEAD~2^{tree}) &&
+ git -C server reset --hard HEAD~1 &&
+
-+ cat >in <<-EOF &&
++ test-tool -C server pack-deltas --num-objects=2 >thin.pack <<-EOF &&
+ REF_DELTA $A $B
+ REF_DELTA $B $C
+ EOF
+
-+ test-tool -C server pack-deltas 2 <in >thin.pack &&
-+
+ git clone "file://$(pwd)/server" client &&
+ (
+ cd client &&
3: 27d36402fe9 = 3: 1358039b2f3 index-pack: allow revisiting REF_DELTA chains
--
gitgitgadget
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 1/3] test-tool: add pack-deltas helper
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
@ 2025-04-28 20:24 ` Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
3 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-28 20:24 UTC (permalink / raw)
To: git
Cc: gitster, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
When trying to demonstrate certain behavior in tests, it can be helpful
to create packfiles that have specific delta structures. 'git
pack-objects' uses various algorithms to select deltas based on their
compression rates, but that does not always demonstrate all possible
packfile shapes. This becomes especially important when wanting to test
'git index-pack' and its ability to parse certain pack shapes.
We have prior art in t/lib-pack.sh, where certain delta structures are
produced by manually writing certain opaque pack contents. However,
producing these script updates is cumbersome and difficult to do as a
contributor.
Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
list of instructions for which objects to include in a packfile and how
those objects should be written in delta form.
At the moment, this only supports REF_DELTAs as those are the kinds of
deltas needed to exercise a bug in 'git index-pack'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Makefile | 1 +
t/helper/meson.build | 1 +
t/helper/test-pack-deltas.c | 148 ++++++++++++++++++++++++++++++++++++
t/helper/test-tool.c | 1 +
t/helper/test-tool.h | 1 +
5 files changed, 152 insertions(+)
create mode 100644 t/helper/test-pack-deltas.c
diff --git a/Makefile b/Makefile
index 13f9062a056..c4d21ccd3d1 100644
--- a/Makefile
+++ b/Makefile
@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
TEST_BUILTINS_OBJS += test-name-hash.o
TEST_BUILTINS_OBJS += test-online-cpus.o
+TEST_BUILTINS_OBJS += test-pack-deltas.o
TEST_BUILTINS_OBJS += test-pack-mtimes.o
TEST_BUILTINS_OBJS += test-parse-options.o
TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
diff --git a/t/helper/meson.build b/t/helper/meson.build
index d2cabaa2bcf..d4e8b26df8d 100644
--- a/t/helper/meson.build
+++ b/t/helper/meson.build
@@ -36,6 +36,7 @@ test_tool_sources = [
'test-mktemp.c',
'test-name-hash.c',
'test-online-cpus.c',
+ 'test-pack-deltas.c',
'test-pack-mtimes.c',
'test-parse-options.c',
'test-parse-pathspec-file.c',
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
new file mode 100644
index 00000000000..4caa024b1eb
--- /dev/null
+++ b/t/helper/test-pack-deltas.c
@@ -0,0 +1,148 @@
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "delta.h"
+#include "git-zlib.h"
+#include "hash.h"
+#include "hex.h"
+#include "pack.h"
+#include "pack-objects.h"
+#include "parse-options.h"
+#include "setup.h"
+#include "strbuf.h"
+#include "string-list.h"
+
+static const char *usage_str[] = {
+ "test-tool pack-deltas --num-objects <num-objects>",
+ NULL
+};
+
+static unsigned long do_compress(void **pptr, unsigned long size)
+{
+ git_zstream stream;
+ void *in, *out;
+ unsigned long maxsize;
+
+ git_deflate_init(&stream, 1);
+ maxsize = git_deflate_bound(&stream, size);
+
+ in = *pptr;
+ out = xmalloc(maxsize);
+ *pptr = out;
+
+ stream.next_in = in;
+ stream.avail_in = size;
+ stream.next_out = out;
+ stream.avail_out = maxsize;
+ while (git_deflate(&stream, Z_FINISH) == Z_OK)
+ ; /* nothing */
+ git_deflate_end(&stream);
+
+ free(in);
+ return stream.total_out;
+}
+
+static void write_ref_delta(struct hashfile *f,
+ struct object_id *oid,
+ struct object_id *base)
+{
+ unsigned char header[MAX_PACK_OBJECT_HEADER];
+ unsigned long size, base_size, delta_size, compressed_size, hdrlen;
+ enum object_type type;
+ void *base_buf, *delta_buf;
+ void *buf = repo_read_object_file(the_repository,
+ oid, &type,
+ &size);
+
+ if (!buf)
+ die("unable to read %s", oid_to_hex(oid));
+
+ base_buf = repo_read_object_file(the_repository,
+ base, &type,
+ &base_size);
+
+ if (!base_buf)
+ die("unable to read %s", oid_to_hex(base));
+
+ delta_buf = diff_delta(base_buf, base_size,
+ buf, size, &delta_size, 0);
+
+ compressed_size = do_compress(&delta_buf, delta_size);
+
+ hdrlen = encode_in_pack_object_header(header, sizeof(header),
+ OBJ_REF_DELTA, delta_size);
+ hashwrite(f, header, hdrlen);
+ hashwrite(f, base->hash, the_repository->hash_algo->rawsz);
+ hashwrite(f, delta_buf, compressed_size);
+
+ free(buf);
+ free(base_buf);
+ free(delta_buf);
+}
+
+int cmd__pack_deltas(int argc, const char **argv)
+{
+ int num_objects = -1;
+ struct hashfile *f;
+ struct strbuf line = STRBUF_INIT;
+ struct option options[] = {
+ OPT_INTEGER('n', "num-objects", &num_objects, N_("the number of objects to write")),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, NULL,
+ options, usage_str, 0);
+
+ if (argc || num_objects < 0)
+ usage_with_options(usage_str, options);
+
+ setup_git_directory();
+
+ f = hashfd(the_repository->hash_algo, 1, "<stdout>");
+ write_pack_header(f, num_objects);
+
+ /* Read each line from stdin into 'line' */
+ while (strbuf_getline_lf(&line, stdin) != EOF) {
+ const char *type_str, *content_oid_str, *base_oid_str = NULL;
+ struct object_id content_oid, base_oid;
+ struct string_list items = STRING_LIST_INIT_NODUP;
+ /*
+ * Tokenize into two or three parts:
+ * 1. REF_DELTA, OFS_DELTA, or FULL.
+ * 2. The object ID for the content object.
+ * 3. The object ID for the base object (optional).
+ */
+ if (string_list_split_in_place(&items, line.buf, " ", 3) < 0)
+ die("invalid input format: %s", line.buf);
+
+ if (items.nr < 2)
+ die("invalid input format: %s", line.buf);
+
+ type_str = items.items[0].string;
+ content_oid_str = items.items[1].string;
+
+ if (get_oid_hex(content_oid_str, &content_oid))
+ die("invalid object: %s", content_oid_str);
+ if (items.nr >= 3) {
+ base_oid_str = items.items[2].string;
+ if (get_oid_hex(base_oid_str, &base_oid))
+ die("invalid object: %s", base_oid_str);
+ }
+ string_list_clear(&items, 0);
+
+ if (!strcmp(type_str, "REF_DELTA"))
+ write_ref_delta(f, &content_oid, &base_oid);
+ else if (!strcmp(type_str, "OFS_DELTA"))
+ die("OFS_DELTA not implemented");
+ else if (!strcmp(type_str, "FULL"))
+ die("FULL not implemented");
+ else
+ die("unknown pack type: %s", type_str);
+ }
+
+ finalize_hashfile(f, NULL, FSYNC_COMPONENT_PACK,
+ CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
+ strbuf_release(&line);
+ return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 50dc4dac4ed..74812ed86d3 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -46,6 +46,7 @@ static struct test_cmd cmds[] = {
{ "mktemp", cmd__mktemp },
{ "name-hash", cmd__name_hash },
{ "online-cpus", cmd__online_cpus },
+ { "pack-deltas", cmd__pack_deltas },
{ "pack-mtimes", cmd__pack_mtimes },
{ "parse-options", cmd__parse_options },
{ "parse-options-flags", cmd__parse_options_flags },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 6d62a5b53d9..2571a3ccfe8 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -39,6 +39,7 @@ int cmd__mergesort(int argc, const char **argv);
int cmd__mktemp(int argc, const char **argv);
int cmd__name_hash(int argc, const char **argv);
int cmd__online_cpus(int argc, const char **argv);
+int cmd__pack_deltas(int argc, const char **argv);
int cmd__pack_mtimes(int argc, const char **argv);
int cmd__parse_options(int argc, const char **argv);
int cmd__parse_options_flags(int argc, const char **argv);
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 2/3] t5309: create failing test for 'git index-pack'
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
@ 2025-04-28 20:24 ` Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
2025-04-28 22:40 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Junio C Hamano
3 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-28 20:24 UTC (permalink / raw)
To: git
Cc: gitster, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
This new test demonstrates some behavior where a valid packfile is being
rejected by the Git client due to the order in which it is resolving
REF_DELTAs.
The thin packfile has a REF_DELTA chain A->B->C where C is not included
in the packfile. However, the client repository contains both C and B
already. Thus, 'git index-pack' is able to resolve A before resolving B.
When resolving B, it then attempts to resolve any other REF_DELTAs that
are pointing to B as a base. This "revisits" A and complains as if there
is a cycle, but it did not actually detect a cycle.
A fix will arrive in the next change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5309-pack-delta-cycles.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 60fc710bacb..6a936763302 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -75,4 +75,28 @@ test_expect_success 'failover to a duplicate object in the same pack' '
test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
'
+test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
+ git init server &&
+ (
+ cd server &&
+ test_commit_bulk 4
+ ) &&
+
+ A=$(git -C server rev-parse HEAD^{tree}) &&
+ B=$(git -C server rev-parse HEAD~1^{tree}) &&
+ C=$(git -C server rev-parse HEAD~2^{tree}) &&
+ git -C server reset --hard HEAD~1 &&
+
+ test-tool -C server pack-deltas --num-objects=2 >thin.pack <<-EOF &&
+ REF_DELTA $A $B
+ REF_DELTA $B $C
+ EOF
+
+ git clone "file://$(pwd)/server" client &&
+ (
+ cd client &&
+ git index-pack --fix-thin --stdin <../thin.pack
+ )
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
@ 2025-04-28 20:24 ` Derrick Stolee via GitGitGadget
2025-05-07 2:08 ` Taylor Blau
2025-04-28 22:40 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Junio C Hamano
3 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-04-28 20:24 UTC (permalink / raw)
To: git
Cc: gitster, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.
This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.
The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.
Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.
However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.
This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.
This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().
The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.
The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/index-pack.c | 58 ++++++++++++++++++++----------------
t/t5309-pack-delta-cycles.sh | 12 ++++++--
2 files changed, 41 insertions(+), 29 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index de127c0ff13..dbe79701fb8 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1109,8 +1109,8 @@ static void *threaded_second_pass(void *data)
set_thread_data(data);
for (;;) {
struct base_data *parent = NULL;
- struct object_entry *child_obj;
- struct base_data *child;
+ struct object_entry *child_obj = NULL;
+ struct base_data *child = NULL;
counter_lock();
display_progress(progress, nr_resolved_deltas);
@@ -1137,15 +1137,18 @@ static void *threaded_second_pass(void *data)
parent = list_first_entry(&work_head, struct base_data,
list);
- if (parent->ref_first <= parent->ref_last) {
+ while (parent->ref_first <= parent->ref_last) {
int offset = ref_deltas[parent->ref_first++].obj_no;
child_obj = objects + offset;
- if (child_obj->real_type != OBJ_REF_DELTA)
- die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)",
- (uintmax_t) child_obj->idx.offset,
- oid_to_hex(&parent->obj->idx.oid));
+ if (child_obj->real_type != OBJ_REF_DELTA) {
+ child_obj = NULL;
+ continue;
+ }
child_obj->real_type = parent->obj->real_type;
- } else {
+ break;
+ }
+
+ if (!child_obj && parent->ofs_first <= parent->ofs_last) {
child_obj = objects +
ofs_deltas[parent->ofs_first++].obj_no;
assert(child_obj->real_type == OBJ_OFS_DELTA);
@@ -1178,29 +1181,32 @@ static void *threaded_second_pass(void *data)
}
work_unlock();
- if (parent) {
- child = resolve_delta(child_obj, parent);
- if (!child->children_remaining)
- FREE_AND_NULL(child->data);
- } else {
- child = make_base(child_obj, NULL);
- if (child->children_remaining) {
- /*
- * Since this child has its own delta children,
- * we will need this data in the future.
- * Inflate now so that future iterations will
- * have access to this object's data while
- * outside the work mutex.
- */
- child->data = get_data_from_pack(child_obj);
- child->size = child_obj->size;
+ if (child_obj) {
+ if (parent) {
+ child = resolve_delta(child_obj, parent);
+ if (!child->children_remaining)
+ FREE_AND_NULL(child->data);
+ } else{
+ child = make_base(child_obj, NULL);
+ if (child->children_remaining) {
+ /*
+ * Since this child has its own delta children,
+ * we will need this data in the future.
+ * Inflate now so that future iterations will
+ * have access to this object's data while
+ * outside the work mutex.
+ */
+ child->data = get_data_from_pack(child_obj);
+ child->size = child_obj->size;
+ }
}
}
work_lock();
if (parent)
parent->retain_data--;
- if (child->data) {
+
+ if (child && child->data) {
/*
* This child has its own children, so add it to
* work_head.
@@ -1209,7 +1215,7 @@ static void *threaded_second_pass(void *data)
base_cache_used += child->size;
prune_base_data(NULL);
free_base_data(child);
- } else {
+ } else if (child) {
/*
* This child does not have its own children. It may be
* the last descendant of its ancestors; free those
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 6a936763302..6b03675d91b 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -60,7 +60,10 @@ test_expect_success 'index-pack detects REF_DELTA cycles' '
test_expect_success 'failover to an object in another pack' '
clear_packs &&
git index-pack --stdin <ab.pack &&
- test_must_fail git index-pack --stdin --fix-thin <cycle.pack
+
+ # This cycle does not fail since the existence of A & B in
+ # the repo allows us to resolve the cycle.
+ git index-pack --stdin --fix-thin <cycle.pack
'
test_expect_success 'failover to a duplicate object in the same pack' '
@@ -72,10 +75,13 @@ test_expect_success 'failover to a duplicate object in the same pack' '
pack_obj $A
} >recoverable.pack &&
pack_trailer recoverable.pack &&
- test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
+
+ # This cycle does not fail since the existence of a full copy
+ # of A in the pack allows us to resolve the cycle.
+ git index-pack --fix-thin --stdin <recoverable.pack
'
-test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
+test_expect_success 'index-pack works with thin pack A->B->C with B on disk' '
git init server &&
(
cd server &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] test-tool: add pack-deltas helper
2025-04-28 18:59 ` Derrick Stolee
@ 2025-04-28 20:35 ` Junio C Hamano
0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2025-04-28 20:35 UTC (permalink / raw)
To: Derrick Stolee
Cc: Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee via GitGitGadget, git, peff
Derrick Stolee <stolee@gmail.com> writes:
> This makes sense as a feature, except that we need to write the number
> of objects present in a packfile in its header. If we wanted to avoid
> the argument, then we'd need to load the data into a list before
> starting to write the packfile. By taking the count in advance, the
> implementation is simpler.
Perhaps, but in the production code we already seek back and update
the header after writing a packfile with fixup_pack_header_footer(),
so which shoulnd't be too much work conceptually. You earlier said
something about faster development cycles; not having to figure out
how to seek back and fix the header does help faster development of
this helper ;-).
>> One thing it lets
>> us do to have this as a separate number is to create an invalid pack
>> stream where the header gives a wrong number, and as a test tool,
>> that may trump the convenience of not having to give the number
>> explicitly.
> But also, this allows generating bad packfiles which is a bonus.
> A very good point.
If this were an end-user facing program, we would most likely let it
count but let the count be overridden by the command line argument.
But until then, as I said, I am fine to just declare that it is just
a test helper ;-)
Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack'
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2025-04-28 20:24 ` [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
@ 2025-04-28 22:40 ` Junio C Hamano
2025-04-29 5:33 ` Patrick Steinhardt
3 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2025-04-28 22:40 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Updates in V2
> =============
>
> * Fixed a memory leak in the test helper.
> * The test helper has a better CLI that makes use of the parse-options
> library.
> * The test script skips the in file and instead feeds the input directly to
> the test helper.
Everything in the changes relative to the previous iteration looked
quite sane. Will replace. Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack'
2025-04-28 22:40 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Junio C Hamano
@ 2025-04-29 5:33 ` Patrick Steinhardt
0 siblings, 0 replies; 29+ messages in thread
From: Patrick Steinhardt @ 2025-04-29 5:33 UTC (permalink / raw)
To: Junio C Hamano
Cc: Derrick Stolee via GitGitGadget, git, peff, Johannes Schindelin,
Derrick Stolee
On Mon, Apr 28, 2025 at 03:40:26PM -0700, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Updates in V2
> > =============
> >
> > * Fixed a memory leak in the test helper.
> > * The test helper has a better CLI that makes use of the parse-options
> > library.
> > * The test script skips the in file and instead feeds the input directly to
> > the test helper.
>
> Everything in the changes relative to the previous iteration looked
> quite sane. Will replace. Thanks.
Seconded, the range-diff looks as expected and addresses my comments.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains
2025-04-28 20:24 ` [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
@ 2025-05-07 2:08 ` Taylor Blau
2025-05-07 13:47 ` Derrick Stolee
0 siblings, 1 reply; 29+ messages in thread
From: Taylor Blau @ 2025-05-07 2:08 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, gitster, peff, Patrick Steinhardt, Johannes Schindelin,
Derrick Stolee
On Mon, Apr 28, 2025 at 08:24:45PM +0000, Derrick Stolee via GitGitGadget wrote:
> The crux of the matter is how the algorithm works when the REF_DELTAs
> point to base objects that exist in the local repository.
Hmm. I'm having trouble squaring this with these next two paragraphs:
> Consider the case where the packfile has two REF_DELTA objects, A and B,
> and the delta chain looks like "A depends on B" and "B depends on C" for
> some third object C, where C is already in the current repository. The
> algorithm _should_ start with all objects that depend on C, finding B,
> and then moving on to all objects depending on B, finding A.
>
> However, if the repository also already has object B, then the delta
> chain can be analyzed in a different order. The deltas with base B can
> be analyzed first, finding A, and then the deltas with base C are
> analyzed, finding B. The algorithm currently continues to look for
> objects that depend on B, finding A again. This fails due to A's
> 'real_type' member already being overwritten from OBJ_REF_DELTA to the
> correct object type.
ISTM that this A->B->C chain is a problem because (in the above example)
the server sent B as a REF_DELTA base for A but also had its own
pre-existing copy of B.
But the first quoted sentence suggests that the issue is with REF_DELTAs
that point to base objects that exist in the local repository. Does
"point to" mean that the REF_DELTA's base is the local object, or that
the local object itself was sent as a REF_DELTA against some other base?
I haven't fully wrapped my head around the implications of this all yet,
but I think that it's the former, though admittedly even typing this I
am not quite sure of myself. I *think* that doing this is OK if the only
path from base objects to their corresponding deltas is unique, and/or
there were no such paths at all.
I'm trying to think through the implications of this against my
series[1] from a while ago that converts OFS_DELTAs that weren't usable
as part of verbatim pack-reuse into REF_DELTAs. There are two cases
there that I was considering:
- For some (delta, base) pair in a pack, there was an additional copy
of 'base' in some other pack, and the MIDX chose the copy from that
pack. That forms what I'm calling a "cross-pack" delta. This isn't
currently reusable as an OFS_DELTA for a variety of reasons, but is
OK as a REF_DELTA provided we know that the client either already
has the base object or we are sending it as part of the pack anyway.
- The other case is that we the client wants the delta-half of a
delta/base-pair, but not the base object. In this case, we can't
currently reuse the OFS_DELTA verbatim, but could if we converted it
to a REF_DELTA based on the knowledge that the client already has
the base object.
The latter is doable based on the information in the wants/haves bitmap.
The process there looks like: determine that the client doesn't want the
base object, realize that its bit is set in the 'haves' bitmap, and then
convert the delta object from a OFS_DELTA to a REF_DELTA.
But I think that all breaks for older clients that don't meet the unique
paths condition above. Does that sound right to you?
I think the cross-pack case is fine, provided we know ahead of time that
the client doesn't have the (converted-to-REF_DELTA) delta object in its
local copy.
Unfortunately, I think this means that [1] is a bit of a dead-end for
serves that have older clients (running a version that does not include
this patch). At least, I think that's true if we can construct a
situation where the server sends a REF_DELTA that it thinks the client
doesn't have but actually does. I'm not immediately sure what such a
situation would look like beyond cases like: "the client verbatim asked
for an object it already has, but isn't reachable from the set of
provided 'haves'".
Thanks,
Taylor
[1]: https://lore.kernel.org/git/cover.1728505840.git.me@ttaylorr.com/
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains
2025-05-07 2:08 ` Taylor Blau
@ 2025-05-07 13:47 ` Derrick Stolee
0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2025-05-07 13:47 UTC (permalink / raw)
To: Taylor Blau, Derrick Stolee via GitGitGadget
Cc: git, gitster, peff, Patrick Steinhardt, Johannes Schindelin
On 5/6/2025 10:08 PM, Taylor Blau wrote:
> On Mon, Apr 28, 2025 at 08:24:45PM +0000, Derrick Stolee via GitGitGadget wrote:
>> The crux of the matter is how the algorithm works when the REF_DELTAs
>> point to base objects that exist in the local repository.
>
> Hmm. I'm having trouble squaring this with these next two paragraphs:
>
>> Consider the case where the packfile has two REF_DELTA objects, A and B,
>> and the delta chain looks like "A depends on B" and "B depends on C" for
>> some third object C, where C is already in the current repository. The
>> algorithm _should_ start with all objects that depend on C, finding B,
>> and then moving on to all objects depending on B, finding A.
>>
>> However, if the repository also already has object B, then the delta
>> chain can be analyzed in a different order. The deltas with base B can
>> be analyzed first, finding A, and then the deltas with base C are
>> analyzed, finding B. The algorithm currently continues to look for
>> objects that depend on B, finding A again. This fails due to A's
>> 'real_type' member already being overwritten from OBJ_REF_DELTA to the
>> correct object type.
>
> ISTM that this A->B->C chain is a problem because (in the above example)
> the server sent B as a REF_DELTA base for A but also had its own
> pre-existing copy of B.
>
> But the first quoted sentence suggests that the issue is with REF_DELTAs
> that point to base objects that exist in the local repository. Does
> "point to" mean that the REF_DELTA's base is the local object, or that
> the local object itself was sent as a REF_DELTA against some other base?
The issue is that based on the negotiation, the server knows the client
wants A and has C but doesn't think the client has B despite the client
having B.
If the server sends these two objects as a delta chain A->B->C using
REF_DELTAs (and not the typical OFS_DELTA from A to B) then the client
may inflate A first and B second (and then try to inflate A again).
> I haven't fully wrapped my head around the implications of this all yet,
> but I think that it's the former, though admittedly even typing this I
> am not quite sure of myself. I *think* that doing this is OK if the only
> path from base objects to their corresponding deltas is unique, and/or
> there were no such paths at all.
>
> I'm trying to think through the implications of this against my
> series[1] from a while ago that converts OFS_DELTAs that weren't usable
> as part of verbatim pack-reuse into REF_DELTAs. There are two cases
> there that I was considering:
>
> - For some (delta, base) pair in a pack, there was an additional copy
> of 'base' in some other pack, and the MIDX chose the copy from that
> pack. That forms what I'm calling a "cross-pack" delta. This isn't
> currently reusable as an OFS_DELTA for a variety of reasons, but is
> OK as a REF_DELTA provided we know that the client either already
> has the base object or we are sending it as part of the pack anyway.
These are the kinds of deltas that could hit this problem.
> - The other case is that we the client wants the delta-half of a
> delta/base-pair, but not the base object. In this case, we can't
> currently reuse the OFS_DELTA verbatim, but could if we converted it
> to a REF_DELTA based on the knowledge that the client already has
> the base object.
If we know the client has the base object, then not including the base in
the pack and referencing it as a REF_DELTA will always be OK.
> The latter is doable based on the information in the wants/haves bitmap.
> The process there looks like: determine that the client doesn't want the
> base object, realize that its bit is set in the 'haves' bitmap, and then
> convert the delta object from a OFS_DELTA to a REF_DELTA.
>
> But I think that all breaks for older clients that don't meet the unique
> paths condition above. Does that sound right to you?
>
> I think the cross-pack case is fine, provided we know ahead of time that
> the client doesn't have the (converted-to-REF_DELTA) delta object in its
> local copy.
>
> Unfortunately, I think this means that [1] is a bit of a dead-end for
> serves that have older clients (running a version that does not include
> this patch). At least, I think that's true if we can construct a
> situation where the server sends a REF_DELTA that it thinks the client
> doesn't have but actually does. I'm not immediately sure what such a
> situation would look like beyond cases like: "the client verbatim asked
> for an object it already has, but isn't reachable from the set of
> provided 'haves'".
I agree that [1] is going to present more opportunities for this bug to
be hit by older clients, though I will admit that the frequency of the
problem seems to have odd frequencies. It relies upon the fact that the
fetch negotiation fails to identify "B" as a common object. This sort of
thing can happen more frequently when using the commit boundary between
the haves and wants (instead of walking from all haves like a bitmap walk
would do). If the client fails to advertise all of its haves, then that
could also lead to this problem even if the server is walking all objects
reachable from the haves.
The other thing that you'll have going for you is that cross-pack deltas
should be rare. The case that helped me discover this issue was where the
server did not send any OFS_DELTAs at all but used REF_DELTAs for all
delta relationships. (This is required if the client doesn't advertise
the "ofs-delta" capability, but we'd need to go _way_ back for a client
to not include that capability.)
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2025-05-07 13:47 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-23 17:40 [PATCH 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-23 17:40 ` [PATCH 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-23 19:26 ` Junio C Hamano
2025-04-23 19:32 ` Derrick Stolee
2025-04-24 19:41 ` Junio C Hamano
2025-04-24 20:06 ` Derrick Stolee
2025-04-24 20:56 ` Junio C Hamano
2025-04-25 4:34 ` Patrick Steinhardt
2025-04-25 9:34 ` Johannes Schindelin
2025-04-25 9:45 ` Patrick Steinhardt
2025-04-25 9:51 ` Johannes Schindelin
2025-04-25 16:27 ` Junio C Hamano
2025-04-28 15:22 ` Derrick Stolee
2025-04-28 16:37 ` Junio C Hamano
2025-04-28 18:59 ` Derrick Stolee
2025-04-28 20:35 ` Junio C Hamano
2025-04-23 17:40 ` [PATCH 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-23 19:37 ` Junio C Hamano
2025-04-23 17:40 ` [PATCH 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
2025-04-24 21:41 ` Junio C Hamano
2025-04-25 3:49 ` Derrick Stolee
2025-04-28 20:24 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 1/3] test-tool: add pack-deltas helper Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 2/3] t5309: create failing test for 'git index-pack' Derrick Stolee via GitGitGadget
2025-04-28 20:24 ` [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains Derrick Stolee via GitGitGadget
2025-05-07 2:08 ` Taylor Blau
2025-05-07 13:47 ` Derrick Stolee
2025-04-28 22:40 ` [PATCH v2 0/3] Fix REF_DELTA chain bug in 'git index-pack' Junio C Hamano
2025-04-29 5:33 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).