Git development

Git development
 help / color / mirror / Atom feed

* [PATCH 11/11] reftable/table: fix OOB read on truncated table
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

When opening a table we compute the size of its data section by
subtracting the footer size from the file size. We do not verify that
the file is actually large enough to contain both the header and the
footer though. With a truncated table the subtraction can thus
underflow, causing us to read the footer out of bounds:

  SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/pks/Development/git/build/t/unit-tests+0x2479a4) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7ccff6e0de80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
    0x7ccff6e0df00: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
    0x7ccff6e0df80: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x7ccff6e0e000: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
    0x7ccff6e0e080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  =>0x7ccff6e0e100: fa fa fa fa fa[fa]00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e180: 00 00 00 00 00 00 00 04 fa fa fa fa fa fa fa fa
    0x7ccff6e0e200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e280: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==1500371==ABORTING

Verify that the file is large enough to contain both the header and the
footer before computing the table size.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/table.c                |  5 +++++
 t/unit-tests/u-reftable-table.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/reftable/table.c b/reftable/table.c
index f4bc86a29d..b4d3f9e211 100644
--- a/reftable/table.c
+++ b/reftable/table.c
@@ -562,6 +562,11 @@ int reftable_table_new(struct reftable_table **out,
 		goto done;
 	}
 
+	if (file_size < header_size(t->version) + footer_size(t->version)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
 	t->size = file_size - footer_size(t->version);
 	t->source = *source;
 	t->name = reftable_strdup(name);
diff --git a/t/unit-tests/u-reftable-table.c b/t/unit-tests/u-reftable-table.c
index c7dca45e70..28b0ef5258 100644
--- a/t/unit-tests/u-reftable-table.c
+++ b/t/unit-tests/u-reftable-table.c
@@ -262,3 +262,31 @@ void test_reftable_table__seek_invalid_log_offset(void)
 	reftable_table_decref(table);
 	reftable_buf_release(&buf);
 }
+
+void test_reftable_table__new_with_truncated_table(void)
+{
+	struct reftable_ref_record refs[] = {
+		{
+			.refname = (char *) "refs/heads/main",
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { 42 },
+		},
+	};
+	struct reftable_block_source source = { 0 };
+	struct reftable_table *table;
+	struct reftable_buf buf = REFTABLE_BUF_INIT;
+
+	cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs), NULL, 0, NULL);
+
+	/*
+	 * Truncate the table so that it is large enough to read the header, but
+	 * too small to also contain the footer.
+	 */
+	buf.len = footer_size(1) - 1;
+	block_source_from_buf(&source, &buf);
+
+	cl_assert_equal_i(reftable_table_new(&table, &source, "name"),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_buf_release(&buf);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 10/11] reftable/table: fix NULL pointer access when seeking to bogus offsets
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

When seeking an iterator to an arbitrary offset we may return a positive
value in case the offset points beyond the block. This makes sense when
iterating through multiple blocks of the same section, as the positive
value indicates to us that we're at the end of the table.

But when the offset originates from a section or index offset it is
supposed to point at a valid block, so an out-of-bounds value means that
the table is corrupt. Treating it as a normal end-of-iteration causes us
to silently report an empty section instead of surfacing the corruption,
and we are left with a partially-initialized block. This may later on
cause a NULL pointer exception:

  ==1486841==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555598e02c bp 0x7fffffff4eb0 sp 0x7fffffff4e70 T0)
  ==1486841==The signal is caused by a READ memory access.
  ==1486841==Hint: address points to the zero page.
      #0 0x55555598e02c in reftable_block_type ./git/build/../reftable/block.c:392:9
      #1 0x55555598ee6e in block_iter_seek_key ./git/build/../reftable/block.c:536:35
      #2 0x5555559ae553 in table_iter_seek_linear ./git/build/../reftable/table.c:344:8
      #3 0x5555559adbff in table_iter_seek ./git/build/../reftable/table.c:450:9
      #4 0x5555559ada9c in table_iter_seek_void ./git/build/../reftable/table.c:460:9
      #5 0x555555992872 in reftable_iterator_seek_log_at ./git/build/../reftable/iter.c:281:9
      #6 0x555555992953 in reftable_iterator_seek_log ./git/build/../reftable/iter.c:287:9
      #7 0x55555583aa78 in test_reftable_table__seek_invalid_log_offset ./git/build/../t/unit-tests/u-reftable-table.c:257:20
      #8 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #9 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #10 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #11 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #12 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #13 0x55555584cffa in main ./git/build/../common-main.c:9:11
      #14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #16 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1486841==Register values:
  rax = 0x0000000000000000  rbx = 0x00007fffffff4ec0  rcx = 0x0000000000000000  rdx = 0x00007cfff6e2bd58
  rdi = 0x00007cfff6e2bd58  rsi = 0x00007bfff5da1020  rbp = 0x00007fffffff4eb0  rsp = 0x00007fffffff4e70
   r8 = 0x0000000000000000   r9 = 0x0000000000000002  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff5908  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556056e90
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/block.c:392:9 in reftable_block_type
  ==1486841==ABORTING

Fix this by returning a proper error in `table_iter_seek_to()` when the
offset ranges beyond the block.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/table.c                |  2 ++
 t/unit-tests/u-reftable-table.c | 63 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/reftable/table.c b/reftable/table.c
index 56362df0ed..f4bc86a29d 100644
--- a/reftable/table.c
+++ b/reftable/table.c
@@ -242,6 +242,8 @@ static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
 	int err;
 
 	err = table_init_block(ti->table, &ti->block, off, typ);
+	if (err > 0)
+		return REFTABLE_FORMAT_ERROR;
 	if (err != 0)
 		return err;
 
diff --git a/t/unit-tests/u-reftable-table.c b/t/unit-tests/u-reftable-table.c
index 14fae8b199..c7dca45e70 100644
--- a/t/unit-tests/u-reftable-table.c
+++ b/t/unit-tests/u-reftable-table.c
@@ -1,8 +1,11 @@
 #include "unit-test.h"
 #include "lib-reftable.h"
+#include "reftable/basics.h"
+#include "reftable/block.h"
 #include "reftable/blocksource.h"
 #include "reftable/constants.h"
 #include "reftable/iter.h"
+#include "reftable/reftable-error.h"
 #include "reftable/table.h"
 #include "strbuf.h"
 
@@ -199,3 +202,63 @@ void test_reftable_table__block_iterator(void)
 	reftable_buf_release(&buf);
 	reftable_free(records);
 }
+
+void test_reftable_table__seek_invalid_log_offset(void)
+{
+	struct reftable_ref_record refs[] = {
+		{
+			.refname = (char *) "refs/heads/main",
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = { 42 },
+		},
+	};
+	struct reftable_log_record logs[] = {
+		{
+			.refname = (char *) "refs/heads/main",
+			.update_index = 1,
+			.value_type = REFTABLE_LOG_UPDATE,
+			.value.update = {
+				.name = (char *) "user",
+				.email = (char *) "user@example.com",
+				.message = (char *) "message\n",
+			},
+		},
+	};
+	struct reftable_block_source source = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+	struct reftable_table *table;
+	struct reftable_buf buf = REFTABLE_BUF_INIT;
+	size_t fsize = footer_size(1);
+	uint8_t *footer;
+
+	cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs),
+				 logs, ARRAY_SIZE(logs), NULL);
+
+	/*
+	 * Corrupt the log section offset stored in the footer so that it points
+	 * past the end of the table. The footer is checksummed, so we also have
+	 * to recompute and rewrite the CRC.
+	 */
+	footer = (uint8_t *) buf.buf + buf.len - fsize;
+	reftable_put_be64(footer + header_size(1) + 24, UINT64_MAX);
+	reftable_put_be32(footer + fsize - 4, crc32(0, footer, fsize - 4));
+
+	block_source_from_buf(&source, &buf);
+	cl_must_pass(reftable_table_new(&table, &source, "name"));
+
+	/*
+	 * Seeking the log iterator must not crash even though the log section
+	 * offset is bogus. As the offset points past the end of the table we
+	 * know that the table is corrupt, so the seek must report a format
+	 * error instead of pretending that the section is empty.
+	 */
+	reftable_table_init_log_iterator(table, &it);
+	cl_assert_equal_i(reftable_iterator_seek_log(&it, ""),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_log_record_release(&log);
+	reftable_iterator_destroy(&it);
+	reftable_table_decref(table);
+	reftable_buf_release(&buf);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 09/11] reftable/block: fix OOB read with bogus restart offset
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

Restart points encode records in a given block that do not use prefix
compression and that can thus immediately be seeked to. These offsets
are encoded in the restart table, where each offset needs to point at
one of the records of the block. We do not verify this though, so a
bogus restart offset may cause an out-of-bounds read:

  ==1472280==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de5f7f (pc 0x55555599502b bp 0x7fffffff4df0 sp 0x7fffffff4d40 T0)
  ==1472280==The signal is caused by a READ memory access.
      #0 0x55555599502b in get_var_int ./git/build/../reftable/record.c:30:6
      #1 0x555555995c2a in reftable_decode_keylen ./git/build/../reftable/record.c:177:6
      #2 0x55555598e85c in restart_needle_less ./git/build/../reftable/block.c:455:6
      #3 0x55555598895f in binsearch ./git/build/../reftable/basics.c:175:9
      #4 0x55555598e189 in block_iter_seek_key ./git/build/../reftable/block.c:543:6
      #5 0x555555814aee in test_reftable_block__corrupt_restart_offset ./git/build/../t/unit-tests/u-reftable-block.c:636:20
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c25a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1472280==Register values:
  rax = 0x00007d8ff7de5f7f  rbx = 0x00007fffffff4e00  rcx = 0x00007d8ff7de5f80  rdx = 0x00007bfff5b6af60
  rdi = 0x00007bfff5b6af40  rsi = 0x00007bfff592dfa0  rbp = 0x00007fffffff4df0  rsp = 0x00007fffffff4d40
   r8 = 0x00000000ff00002b   r9 = 0x00007d8ff7de5f7f  r10 = 0x00000f7ffeb25bf0  r11 = 0xf3f30000f1f1f1f1
  r12 = 0x00007fffffff58f8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556055fd0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/record.c:30:6 in get_var_int

Guard against such restart offsets and signal an error to the caller via
`args.error`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c                |  9 ++++++++
 t/unit-tests/u-reftable-block.c | 51 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/reftable/block.c b/reftable/block.c
index 89efce8751..1fa81405d2 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -440,6 +440,15 @@ static int restart_needle_less(size_t idx, void *_args)
 	uint8_t extra;
 	int n;
 
+	/*
+	 * The restart offset must point to a record, which is stored before
+	 * the restart table. Verify that this is the case.
+	 */
+	if (off >= args->block->restart_off) {
+		args->error = 1;
+		return -1;
+	}
+
 	/*
 	 * Records at restart points are stored without prefix compression, so
 	 * there is no need to fully decode the record key here. This removes
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index ba410a0885..77a054d082 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -591,3 +591,54 @@ void test_reftable_block__corrupt_restart_count(void)
 	block_writer_release(&writer);
 	reftable_buf_release(&data);
 }
+
+void test_reftable_block__corrupt_restart_offset(void)
+{
+	struct reftable_block_source source = { 0 };
+	struct block_writer writer = {
+		.last_key = REFTABLE_BUF_INIT,
+	};
+	struct reftable_record rec = {
+		.type = REFTABLE_BLOCK_TYPE_REF,
+		.u.ref = {
+			.value_type = REFTABLE_REF_VAL1,
+			.refname = (char *) "refs/heads/main",
+		},
+	};
+	struct reftable_block block = { 0 };
+	struct block_iter it = BLOCK_ITER_INIT;
+	struct reftable_buf want = REFTABLE_BUF_INIT;
+	struct reftable_buf data;
+
+	data.len = 1024;
+	REFTABLE_CALLOC_ARRAY(data.buf, data.len);
+	cl_assert(data.buf != NULL);
+
+	cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
+				       (uint8_t *) data.buf, data.len,
+				       0, hash_size(REFTABLE_HASH_SHA1)));
+	cl_must_pass(block_writer_add(&writer, &rec));
+	cl_assert(block_writer_finish(&writer) > 0);
+
+	block_source_from_buf(&source, &data);
+	cl_must_pass(reftable_block_init(&block, &source, 0, 0, data.len,
+					 REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF));
+
+	/*
+	 * Corrupt the first restart offset, stored as a big-endian 24-bit
+	 * integer at the start of the restart table, to point past the end of
+	 * the records section. Seeking such a block must fail gracefully.
+	 */
+	reftable_put_be24((uint8_t *) block.block_data.data + block.restart_off,
+			  0xffffff);
+
+	block_iter_init(&it, &block);
+	cl_must_pass(reftable_buf_addstr(&want, "refs/heads/main"));
+	cl_assert_equal_i(block_iter_seek_key(&it, &want), REFTABLE_FORMAT_ERROR);
+
+	reftable_buf_release(&want);
+	block_iter_close(&it);
+	reftable_block_release(&block);
+	block_writer_release(&writer);
+	reftable_buf_release(&data);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 08/11] reftable/block: fix use of uninitialized memory when binsearch fails
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

When doing the binary search through our restart offsets we may hit an
error in case `restart_needle_less()` fails to decode the record at the
given offset. While we correctly detect this case and error out, it will
cause us to call `reftable_record_release()` on the yet-uninitialized
record.

Fix this by initializing the record earlier.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 4d285aefd7..89efce8751 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -517,6 +517,10 @@ int block_iter_seek_key(struct block_iter *it, struct reftable_buf *want)
 	int err = 0;
 	size_t i;
 
+	err = reftable_record_init(&rec, reftable_block_type(it->block));
+	if (err < 0)
+		goto done;
+
 	/*
 	 * Perform a binary search over the block's restart points, which
 	 * avoids doing a linear scan over the whole block. Like this, we
@@ -558,10 +562,6 @@ int block_iter_seek_key(struct block_iter *it, struct reftable_buf *want)
 	else
 		it->next_off = it->block->header_off + 4;
 
-	err = reftable_record_init(&rec, reftable_block_type(it->block));
-	if (err < 0)
-		goto done;
-
 	/*
 	 * We're looking for the last entry less than the wanted key so that
 	 * the next call to `block_reader_next()` would yield the wanted

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 07/11] reftable/block: fix OOB read with bogus restart count
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

The restart count is stored in the last two bytes of a block. We use it
without verification to compute the offset of the restart table. With a
bogus restart count that is large enough this computation underflows,
and the subsequent reads via the restart table access out-of-bounds
memory:

  ==129439==ERROR: AddressSanitizer: SEGV on unknown address 0x7d90f6dcd0ad (pc 0x55555598ce89 bp 0x7fffffff4ed0 sp 0x7fffffff4e80 T0)
  ==129439==The signal is caused by a READ memory access.
      #0 0x55555598ce89 in reftable_get_be24 ./git/build/../reftable/basics.h:125:9
      #1 0x55555598eabf in block_restart_offset ./git/build/../reftable/block.c:407:9
      #2 0x55555598e5d5 in restart_needle_less ./git/build/../reftable/block.c:431:17
      #3 0x5555559887e2 in binsearch ./git/build/../reftable/basics.c:165:13
      #4 0x55555598dfec in block_iter_seek_key ./git/build/../reftable/block.c:529:6
      #5 0x555555814517 in test_reftable_block__corrupt_restart_count ./git/build/../t/unit-tests/u-reftable-block.c:593:15
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c12a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==129439==Register values:
  rax = 0x00007d90f6dcd0ad  rbx = 0x00007fffffff4f20  rcx = 0xf2f2f2f8f2f2f2f8  rdx = 0x0000000000000000
  rdi = 0x00007d90f6dcd0ad  rsi = 0x0000000000007fff  rbp = 0x00007fffffff4ed0  rsp = 0x00007fffffff4e80
   r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff58e8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x00005555560550b0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/basics.h:125:9 in reftable_get_be24

Verify that the restart table actually fits into the block.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c                |  4 ++++
 t/unit-tests/u-reftable-block.c | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/reftable/block.c b/reftable/block.c
index 4d6b11c2e7..4d285aefd7 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -351,6 +351,10 @@ int reftable_block_init(struct reftable_block *block,
 
 	restart_count = reftable_get_be16(block->block_data.data + block_size - 2);
 	restart_off = block_size - 2 - 3 * restart_count;
+	if (restart_off < header_size + 4 || restart_off > block_size - 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
 
 	block->block_type = block_type;
 	block->hash_size = hash_size;
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index 1f35aed91a..ba410a0885 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -545,3 +545,49 @@ void test_reftable_block__corrupt_block_size(void)
 	block_writer_release(&writer);
 	reftable_buf_release(&data);
 }
+
+void test_reftable_block__corrupt_restart_count(void)
+{
+	struct reftable_block_source source = { 0 };
+	struct block_writer writer = {
+		.last_key = REFTABLE_BUF_INIT,
+	};
+	struct reftable_record rec = {
+		.type = REFTABLE_BLOCK_TYPE_REF,
+		.u.ref = {
+			.value_type = REFTABLE_REF_VAL1,
+			.refname = (char *) "refs/heads/main",
+		},
+	};
+	struct reftable_block block = { 0 };
+	struct reftable_buf data;
+	int block_size;
+
+	data.len = 1024;
+	REFTABLE_CALLOC_ARRAY(data.buf, data.len);
+	cl_assert(data.buf != NULL);
+
+	cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
+				       (uint8_t *) data.buf, data.len,
+				       0, hash_size(REFTABLE_HASH_SHA1)));
+	cl_must_pass(block_writer_add(&writer, &rec));
+	block_size = block_writer_finish(&writer);
+	cl_assert(block_size > 0);
+
+	/*
+	 * Corrupt the restart count to claim a bogus number of restart points.
+	 * Note that this would only cause us to perform an out-of-bounds
+	 * access when seeking into the block, but we want to refuse such a
+	 * block outright.
+	 */
+	reftable_put_be16((uint8_t *) data.buf + block_size - 2, 0xffff);
+
+	block_source_from_buf(&source, &data);
+	cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+					      REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_block_release(&block);
+	block_writer_release(&writer);
+	reftable_buf_release(&data);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 06/11] reftable/block: fix OOB read with bogus block size
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

The block size is read from the block header, which is untrusted data.
We use it without verification to access the restart count at the end of
the block as well as to compute the restart table offset. With a bogus
block size that exceeds the data we have actually read this can lead to
an out-of-bounds read:

  ==1458284==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de4b7d (pc 0x55555598c339 bp 0x7fffffff4ef0 sp 0x7fffffff4eb0 T0)
  ==1458284==The signal is caused by a READ memory access.
      #0 0x55555598c339 in reftable_get_be16 ./build/../reftable/basics.h:118:9
      #1 0x55555598bee2 in reftable_block_init ./build/../reftable/block.c:344:18
      #2 0x555555813e0e in test_reftable_block__corrupt_block_size ./build/../t/unit-tests/u-reftable-block.c:540:8
      #3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584b55a in main ./build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  ==1458284==Register values:
  rax = 0x00007d8ff7de4b7d  rbx = 0x00007fffffff4f00  rcx = 0x0000000000000006  rdx = 0x0000000000000010
  rdi = 0x00007d8ff7de4b7d  rsi = 0x00007bfff5cf0420  rbp = 0x00007fffffff4ef0  rsp = 0x00007fffffff4eb0
   r8 = 0x00000f807eb960b8   r9 = 0x0000000000000001  r10 = 0x00007bfff5cf05e7  r11 = 0x000000000000000f
  r12 = 0x00007fffffff58f8  r13 = 0x0000000000000001  r14 = 0x0000555555ee8160  r15 = 0x0000000000000000
  AddressSanitizer can not provide additional info.

Verify that the claimed block size fits into the block data before using
it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c                |  9 +++++++++
 t/unit-tests/u-reftable-block.c | 45 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/reftable/block.c b/reftable/block.c
index b86cb9ec5a..4d6b11c2e7 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -340,6 +340,15 @@ int reftable_block_init(struct reftable_block *block,
 		full_block_size = block_size;
 	}
 
+	/*
+	 * Ensure that we have sufficient data available now to satisfy the
+	 * claimed block size.
+	 */
+	if (block_size > block->block_data.len) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
 	restart_count = reftable_get_be16(block->block_data.data + block_size - 2);
 	restart_off = block_size - 2 - 3 * restart_count;
 
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index 40274af5c0..1f35aed91a 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -500,3 +500,48 @@ void test_reftable_block__corrupt_log_block_size(void)
 	block_writer_release(&writer);
 	reftable_buf_release(&data);
 }
+
+void test_reftable_block__corrupt_block_size(void)
+{
+	struct reftable_block_source source = { 0 };
+	struct block_writer writer = {
+		.last_key = REFTABLE_BUF_INIT,
+	};
+	struct reftable_record rec = {
+		.type = REFTABLE_BLOCK_TYPE_REF,
+		.u.ref = {
+			.value_type = REFTABLE_REF_VAL1,
+			.refname = (char *) "refs/heads/main",
+		},
+	};
+	struct reftable_block block = { 0 };
+	struct reftable_buf data;
+
+	data.len = 1024;
+	REFTABLE_CALLOC_ARRAY(data.buf, data.len);
+	cl_assert(data.buf != NULL);
+
+	cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
+				       (uint8_t *) data.buf, data.len,
+				       0, hash_size(REFTABLE_HASH_SHA1)));
+	cl_must_pass(block_writer_add(&writer, &rec));
+	cl_assert(block_writer_finish(&writer) > 0);
+
+	/*
+	 * The block size is stored as a big-endian 24-bit integer right after
+	 * the one-byte block type at the start of the block. Corrupt it to
+	 * claim a size that is larger than the data we actually have. Reading
+	 * the restart count and restart table relative to such a bogus block
+	 * size must not access out-of-bounds memory.
+	 */
+	reftable_put_be24((uint8_t *) data.buf + 1, 0xffffff);
+
+	block_source_from_buf(&source, &data);
+	cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+					      REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_block_release(&block);
+	block_writer_release(&writer);
+	reftable_buf_release(&data);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 05/11] reftable/block: fix OOB write with bogus inflated log size
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

The "log" reftable block stores reflog information. This information is
compressed using zlib. The inflated size is stored in the block header
so that callers can easily learn ahead of time how large of a buffer
they have to allocate to inflate the data in a single pass. So to
reconstruct the full inflated block we:

  - Copy over the header as-is, as it's not deflated.

  - Append the inflated data to the buffer.

The inflated block size stored in the header also includes the length of
the header itself. So to figure out the bytes that should be inflated by
zlib we need to subtract the header size, which is trusted data, from
the block size, which is untrusted data derived from the block header.

While we do verify that we were able to inflate all data as expected, we
don't verify ahead of time that the encoded block length is larger than
the header length. This can lead to an underflow, which makes zlib
assume that it can write more data into the target buffer than we have
allocated. The result is an out-of-bounds write:

  ==1422297==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c1ff6de5231 at pc 0x55555579a628 bp 0x7fffffff4f10 sp 0x7fffffff46d0
  WRITE of size 4 at 0x7c1ff6de5231 thread T0
      #0 0x55555579a627 in __asan_memcpy (./build/t/unit-tests+0x246627)
      #1 0x55555598b093 in reftable_block_init ./build/../reftable/block.c:277:3
      #2 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584af4a in main ./build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  0x7c1ff6de5231 is located 0 bytes after 1-byte region [0x7c1ff6de5230,0x7c1ff6de5231)
  allocated by thread T0 here:
      #0 0x55555579db1b in realloc.part.0 asan_malloc_linux.cpp.o
      #1 0x5555559868d7 in reftable_realloc ./build/../reftable/basics.c:36:9
      #2 0x55555598a98f in reftable_alloc_grow ./build/../reftable/basics.h:229:10
      #3 0x55555598ae58 in reftable_block_init ./build/../reftable/block.c:269:3
      #4 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #5 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #6 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #7 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #8 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #9 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #10 0x55555584af4a in main ./build/../common-main.c:9:11
      #11 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #12 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  SUMMARY: AddressSanitizer: heap-buffer-overflow (./build/t/unit-tests+0x246627) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7c1ff6de4f80: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5000: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5080: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5100: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5180: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
  =>0x7c1ff6de5200: fa fa 04 fa fa fa[01]fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb

Fix the bug by adding a sanity check and add a unit test.

Reported-by: oxsignal <awo@kakao.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c                |  9 +++++++++
 t/unit-tests/u-reftable-block.c | 44 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/reftable/block.c b/reftable/block.c
index 920b3f4486..b86cb9ec5a 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -260,6 +260,15 @@ int reftable_block_init(struct reftable_block *block,
 			goto done;
 	}
 
+	/*
+	 * Verify that the block size covers at least the table header, block
+	 * header and the 2 byte restart counter.
+	 */
+	if (block_size < header_size + 4 + 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
 	if (block_type == REFTABLE_BLOCK_TYPE_LOG) {
 		uint32_t block_header_skip = 4 + header_size;
 		uLong dst_len = block_size - block_header_skip;
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index f4bded7d26..40274af5c0 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -456,3 +456,47 @@ void test_reftable_block__iterator(void)
 	block_writer_release(&writer);
 	reftable_buf_release(&data);
 }
+
+void test_reftable_block__corrupt_log_block_size(void)
+{
+	struct reftable_block_source source = { 0 };
+	struct block_writer writer = {
+		.last_key = REFTABLE_BUF_INIT,
+	};
+	struct reftable_record rec = {
+		.type = REFTABLE_BLOCK_TYPE_LOG,
+		.u.log = {
+			.refname = (char *) "refs/heads/main",
+			.update_index = 1,
+			.value_type = REFTABLE_LOG_UPDATE,
+		},
+	};
+	struct reftable_block block = { 0 };
+	struct reftable_buf data;
+
+	data.len = 1024;
+	REFTABLE_CALLOC_ARRAY(data.buf, data.len);
+	cl_assert(data.buf != NULL);
+
+	cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_LOG,
+				       (uint8_t *) data.buf, data.len,
+				       0, hash_size(REFTABLE_HASH_SHA1)));
+	cl_must_pass(block_writer_add(&writer, &rec));
+	cl_assert(block_writer_finish(&writer) > 0);
+
+	/*
+	 * Log blocks store their inflated size as a big-endian 24-bit integer
+	 * right after the one-byte block type. Rewrite it to claim a size that
+	 * is smaller than the block header.
+	 */
+	reftable_put_be24((uint8_t *) data.buf + 1, 1);
+
+	block_source_from_buf(&source, &data);
+	cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+					      REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_LOG),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_block_release(&block);
+	block_writer_release(&writer);
+	reftable_buf_release(&data);
+}

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 04/11] reftable/record: don't abort when decoding invalid ref value type
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

When decoding a ref record we read its value type from the block. In
case the type itself is invalid we call `abort()`. This is rather
heavy-handed though: the data we're reading is untrusted, so we should
treat the issue as a normal and not as a programming error.

Fix this by handling the error gracefully. Note that this also requires
us to set the value type later, as otherwise we might store an invalid
type in the record.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c                |  6 +++---
 t/unit-tests/u-reftable-record.c | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index fcd387ba5d..1fce441930 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -388,7 +388,6 @@ static int reftable_ref_record_decode(void *rec, struct reftable_buf key,
 	r->refname[key.len] = 0;
 
 	r->update_index = update_index;
-	r->value_type = val_type;
 	switch (val_type) {
 	case REFTABLE_REF_VAL1:
 		if (in.len < hash_size) {
@@ -426,9 +425,10 @@ static int reftable_ref_record_decode(void *rec, struct reftable_buf key,
 	case REFTABLE_REF_DELETION:
 		break;
 	default:
-		abort();
-		break;
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
 	}
+	r->value_type = val_type;
 
 	return start.len - in.len;
 
diff --git a/t/unit-tests/u-reftable-record.c b/t/unit-tests/u-reftable-record.c
index 1bf2e170dc..9c95083ef4 100644
--- a/t/unit-tests/u-reftable-record.c
+++ b/t/unit-tests/u-reftable-record.c
@@ -11,6 +11,7 @@
 #include "reftable/basics.h"
 #include "reftable/constants.h"
 #include "reftable/record.h"
+#include "reftable/reftable-error.h"
 
 static void t_copy(struct reftable_record *rec)
 {
@@ -202,6 +203,29 @@ void test_reftable_record__ref_record_roundtrip(void)
 	reftable_buf_release(&scratch);
 }
 
+void test_reftable_record__ref_record_decode_invalid_value_type(void)
+{
+	struct reftable_buf scratch = REFTABLE_BUF_INIT;
+	struct reftable_record out = {
+		.type = REFTABLE_BLOCK_TYPE_REF,
+	};
+	struct reftable_buf key = REFTABLE_BUF_INIT;
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+
+	cl_must_pass(reftable_buf_addstr(&key, "refs/heads/master"));
+	cl_assert_equal_i(reftable_record_decode(&out, key, REFTABLE_NR_REF_VALUETYPES,
+						 dest, REFTABLE_HASH_SIZE_SHA1, &scratch),
+			  REFTABLE_FORMAT_ERROR);
+
+	reftable_record_release(&out);
+	reftable_buf_release(&key);
+	reftable_buf_release(&scratch);
+}
+
 void test_reftable_record__log_record_comparison(void)
 {
 	struct reftable_record in[3] = {

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 03/11] reftable/basics: fix OOB read on binary search of empty range
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

`binsearch()` performs a binary search over a range of `sz` elements by
repeatedly calling the comparison function with indices into that range.
When the range is empty though, there is no valid index to call the
comparison function with. We still end up executing the comparison
function though with an index of 0, which of course will cause an
out-of-bounds read.

Return early when the range is empty.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/basics.c                |  3 +++
 t/unit-tests/u-reftable-basics.c | 11 +++++++++++
 2 files changed, 14 insertions(+)

diff --git a/reftable/basics.c b/reftable/basics.c
index e969927b61..f0442a46cf 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -152,6 +152,9 @@ size_t binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
 	size_t lo = 0;
 	size_t hi = sz;
 
+	if (!sz)
+		return 0;
+
 	/* Invariants:
 	 *
 	 *  (hi == sz) || f(hi) == true
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index 73566ed0eb..c5d83b6714 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -60,6 +60,17 @@ void test_reftable_basics__binsearch(void)
 	}
 }
 
+static int unreachable_lesseq(size_t i UNUSED, void *args UNUSED)
+{
+	cl_fail("comparison function called for empty range");
+	return 0;
+}
+
+void test_reftable_basics__binsearch_empty(void)
+{
+	cl_assert_equal_i(binsearch(0, &unreachable_lesseq, NULL), 0);
+}
+
 void test_reftable_basics__names_length(void)
 {
 	const char *a[] = { "a", "b", NULL };

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 02/11] oss-fuzz: add fuzzer for parsing reftables
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

Add a new fuzzer that exercises our parsing of reftables. Fallout from
this fuzzer will be fixed over subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Makefile                            |  1 +
 ci/run-build-and-minimal-fuzzers.sh |  1 +
 oss-fuzz/.gitignore                 |  1 +
 oss-fuzz/fuzz-reftable.c            | 74 +++++++++++++++++++++++++++++++++++++
 oss-fuzz/meson.build                |  1 +
 5 files changed, 78 insertions(+)

diff --git a/Makefile b/Makefile
index 1cec251f43..89d3edd5ea 100644
--- a/Makefile
+++ b/Makefile
@@ -2599,6 +2599,7 @@ FUZZ_OBJS += oss-fuzz/fuzz-date.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
 FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
 FUZZ_OBJS += oss-fuzz/fuzz-parse-attr-line.o
+FUZZ_OBJS += oss-fuzz/fuzz-reftable.o
 FUZZ_OBJS += oss-fuzz/fuzz-url-decode-mem.o
 .PHONY: fuzz-objs
 fuzz-objs: $(FUZZ_OBJS)
diff --git a/ci/run-build-and-minimal-fuzzers.sh b/ci/run-build-and-minimal-fuzzers.sh
index e7b97952e7..37b24b092d 100755
--- a/ci/run-build-and-minimal-fuzzers.sh
+++ b/ci/run-build-and-minimal-fuzzers.sh
@@ -21,6 +21,7 @@ date
 pack-headers
 pack-idx
 parse-attr-line
+reftable
 url-decode-mem
 "
 
diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore
index f2d74de457..dc7a127a62 100644
--- a/oss-fuzz/.gitignore
+++ b/oss-fuzz/.gitignore
@@ -5,4 +5,5 @@ fuzz-date
 fuzz-pack-headers
 fuzz-pack-idx
 fuzz-parse-attr-line
+fuzz-reftable
 fuzz-url-decode-mem
diff --git a/oss-fuzz/fuzz-reftable.c b/oss-fuzz/fuzz-reftable.c
new file mode 100644
index 0000000000..c46eac2c6b
--- /dev/null
+++ b/oss-fuzz/fuzz-reftable.c
@@ -0,0 +1,74 @@
+#include "git-compat-util.h"
+#include "reftable/basics.h"
+#include "reftable/blocksource.h"
+#include "reftable/reftable-blocksource.h"
+#include "reftable/reftable-error.h"
+#include "reftable/reftable-iterator.h"
+#include "reftable/reftable-record.h"
+#include "reftable/reftable-table.h"
+#include "reftable/reftable-writer.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+	struct reftable_block_source source = { 0 };
+	struct reftable_buf buf = REFTABLE_BUF_INIT;
+	struct reftable_table *table = NULL;
+	int err;
+
+	if (reftable_buf_add(&buf, (const char *)data, size) < 0)
+		goto out;
+	block_source_from_buf(&source, &buf);
+
+	err = reftable_table_new(&table, &source, "fuzz-input");
+	if (err < 0)
+		goto out;
+
+	/*
+	 * Exercise the ref, log and raw block iterators so that we cover as
+	 * much of the parsing code as possible.
+	 */
+	{
+		struct reftable_ref_record ref = { 0 };
+		struct reftable_iterator it = { 0 };
+
+		reftable_table_init_ref_iterator(table, &it);
+		if (!reftable_iterator_seek_ref(&it, ""))
+			while (!reftable_iterator_next_ref(&it, &ref))
+				;
+
+		reftable_ref_record_release(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	{
+		struct reftable_log_record log = { 0 };
+		struct reftable_iterator it = { 0 };
+
+		reftable_table_init_log_iterator(table, &it);
+		if (!reftable_iterator_seek_log(&it, ""))
+			while (!reftable_iterator_next_log(&it, &log))
+				;
+
+		reftable_log_record_release(&log);
+		reftable_iterator_destroy(&it);
+	}
+
+	{
+		struct reftable_table_iterator it = { 0 };
+		const struct reftable_block *block;
+
+		if (!reftable_table_iterator_init(&it, table))
+			while (!reftable_table_iterator_next(&it, &block))
+				;
+
+		reftable_table_iterator_release(&it);
+	}
+
+out:
+	if (table)
+		reftable_table_decref(table);
+	reftable_buf_release(&buf);
+	return 0;
+}
diff --git a/oss-fuzz/meson.build b/oss-fuzz/meson.build
index 10bcac2f6d..5a3854256b 100644
--- a/oss-fuzz/meson.build
+++ b/oss-fuzz/meson.build
@@ -6,6 +6,7 @@ fuzz_programs = [
   'fuzz-pack-headers.c',
   'fuzz-pack-idx.c',
   'fuzz-parse-attr-line.c',
+  'fuzz-reftable.c',
   'fuzz-url-decode-mem.c',
 ]
 

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 01/11] meson: support building fuzzers with libFuzzer
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>

To support fuzzing via libFuzzer one has to pass a couple of compiler
options:

  - It is mandatory to enable the "fuzzer-no-link" sanitizer for
    coverage feedback.

  - It is recommended to enable at least one more sanitizer to catch
    issues, like the "address" sanitizer.

  - The fuzzing executables need to be linked with "-fsanitize=fuzzer"
    to wire up libFuzzer itself.

The first two items can already be achieved via the "-Db_sanitize="
option. But the last item cannot easily be achieved, as we can only
configure global link arguments.

Introduce a new "-Dfuzzers_link_args=" build option to plug this gap.
Add documentation so that users know how to set up libFuzzer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 meson.build          | 15 +++++++++++++++
 meson_options.txt    |  2 ++
 oss-fuzz/meson.build |  1 +
 3 files changed, 18 insertions(+)

diff --git a/meson.build b/meson.build
index 3247697f74..9df6fbb0a5 100644
--- a/meson.build
+++ b/meson.build
@@ -161,6 +161,21 @@
 # These machine files can be passed to `meson setup` via the `--native-file`
 # option.
 #
+# Fuzzing
+# =======
+#
+# Meson supports building the fuzzing targets by setting `-Dfuzzers=true`. By
+# default, the targets will be built without libFuzzer and thus won't be usable
+# for fuzzing. You have to configure a couple of options to properly wire up
+# libFuzzer:
+#
+#   $ meson setup build-fuzzers \
+#       -Db_sanitize=address,fuzzer-no-link \
+#       -Dfuzzers=true \
+#       -Dfuzzers_link_args=-fsanitize=fuzzer
+#   $ meson compile -C build-fuzzers
+#   $ ./build-fuzzers/oss-fuzz/fuzz-config <args>
+#
 # Cross compilation
 # =================
 #
diff --git a/meson_options.txt b/meson_options.txt
index d936ada098..dc88f130d7 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -131,3 +131,5 @@ option('test_utf8_locale', type: 'string',
   description: 'Name of a UTF-8 locale used for testing.')
 option('fuzzers', type: 'boolean', value: false,
   description: 'Enable building fuzzers.')
+option('fuzzers_link_args', type: 'array', value: [],
+  description: 'Linker arguments used to link fuzzers. Use -fsanitize=fuzzer for fuzzing.')
diff --git a/oss-fuzz/meson.build b/oss-fuzz/meson.build
index 878afd8426..10bcac2f6d 100644
--- a/oss-fuzz/meson.build
+++ b/oss-fuzz/meson.build
@@ -16,5 +16,6 @@ foreach fuzz_program : fuzz_programs
       fuzz_program,
     ],
     dependencies: [libgit_commonmain],
+    link_args: get_option('fuzzers_link_args'),
   )
 endforeach

-- 
2.55.0.rc1.745.g43192e7977.dirty


^ permalink raw reply related

* [PATCH 00/11] reftable: harden against corrupted tables
From: Patrick Steinhardt @ 2026-06-24  8:23 UTC (permalink / raw)
  To: git; +Cc: oxsignal

Hi,

this patch series addresses a bunch of errors that may happen when
trying to read corrupted tables. These errors include out-of-bounds
writes, out-of-bounds reads and the ability to hit abort(3p) calls.

The out-of-bounds write was originally reported by awo on the security
mailing list. As we never transfer reftables over the protocol it would
require local disk access to create such corrupted reftables, so there
isn't really an easy way to exploit these.

In any case, I took that chance and wrote a fuzzer for parsing the
tables, which surfaced a bunch of issues. At the end of this series
though the fuzzer can now run for an extended amount of time (2hrs+)
without surfacing any new issues.

Thanks!

Patrick

---
Patrick Steinhardt (11):
      meson: support building fuzzers with libFuzzer
      oss-fuzz: add fuzzer for parsing reftables
      reftable/basics: fix OOB read on binary search of empty range
      reftable/record: don't abort when decoding invalid ref value type
      reftable/block: fix OOB write with bogus inflated log size
      reftable/block: fix OOB read with bogus block size
      reftable/block: fix OOB read with bogus restart count
      reftable/block: fix use of uninitialized memory when binsearch fails
      reftable/block: fix OOB read with bogus restart offset
      reftable/table: fix NULL pointer access when seeking to bogus offsets
      reftable/table: fix OOB read on truncated table

 Makefile                            |   1 +
 ci/run-build-and-minimal-fuzzers.sh |   1 +
 meson.build                         |  15 +++
 meson_options.txt                   |   2 +
 oss-fuzz/.gitignore                 |   1 +
 oss-fuzz/fuzz-reftable.c            |  74 ++++++++++++++
 oss-fuzz/meson.build                |   2 +
 reftable/basics.c                   |   3 +
 reftable/block.c                    |  39 +++++++-
 reftable/record.c                   |   6 +-
 reftable/table.c                    |   7 ++
 t/unit-tests/u-reftable-basics.c    |  11 +++
 t/unit-tests/u-reftable-block.c     | 186 ++++++++++++++++++++++++++++++++++++
 t/unit-tests/u-reftable-record.c    |  24 +++++
 t/unit-tests/u-reftable-table.c     |  91 ++++++++++++++++++
 15 files changed, 456 insertions(+), 7 deletions(-)


---
base-commit: ab776a62a78576513ee121424adb19597fbb7613
change-id: 20260623-pks-reftable-hardening-f54de69fea63


^ permalink raw reply

* Re: [PATCH v3 4/4] notes: support an external command to display notes
From: Johannes Sixt @ 2026-06-24  7:49 UTC (permalink / raw)
  To: Siddh Raman Pant
  Cc: Kristoffer Haugsbakk, Junio C Hamano, Patrick Steinhardt,
	Elijah Newren, brian m. carlson, Jeff King, Oswald Buddenhagen,
	git
In-Reply-To: <7284a8bccb6bfb5734adb09f05ae4b61a63da2df.1779532562.git.siddh.raman.pant@oracle.com>

Am 23.05.26 um 12:38 schrieb Siddh Raman Pant:
> git notes is a very very helpful feature to show user-supplied
> information about a commit alongside its message transparently.
> 
> For distributed teams working on large git repos (huge number of
> branches/refs, files, etc.) and using the notes feature to mark
> information on git commits, the problem is often not that two users
> update the same note object at the same time. It is that the local
> notes state used while reading history can be stale.
> 
> In kernel work, the same logical upstream fix can appear as different
> commit objects across many downstream branches, such as the stable
> branches and vendor-specific branches (based on which the released
> kernel is actually built). Different developers may be working on those
> branches in parallel, and a review decision recorded for one backport
> is useful context for the others.
> 
> Today, seeing that decision in ordinary history output requires first
> synchronizing the local notes ref, and then interpreting those notes
> for the branch being inspected. The latter step is workflow-specific
> and can be cheap, but keeping the local notes state fresh enough can be
> expensive in a large kernel repository with a large shared notes
> history (and if we are to extrapolate, a slow git server conn/ops can
> be a factor too).
> 
> This TOCTOU problem exacerbates on scale (rapid updates, more devs,
> larger repos, more git server traffic, etc).
> 
> One solution to this is to move the freshness policy out of git so that
> it is someone else's problem. We can have a realtime fetch or faster
> updation via external helper means. But unfortunately we lose the
> coherence in the display of information, and so the user would end up
> reinventing git log in his quest to have same workflow.

You are presenting one solution here. But a more obvious solution would
have been to make Git's notes implementation capable enough to keep up
with the volume of notes that are produced by your team.

Another solution would be to track the information outside of Git notes
entirely, similar to how pull requests, issues, reviews, and
conversations are tracked by Git hosters in databases outside of Git.

> Let's add support for notes.externalCommand, a protected-configuration
> command that git runs as a long-lived helper when displaying notes. git
> sends commit IDs to the helper and displays any returned text through
> the existing notes formatting path. This keeps presentation in git
> while letting the helper decide how fresh note text is obtained.

To my eyes, this looks like an overengineered solution that helps one
user of a niche feature of Git.

-- Hannes


^ permalink raw reply

* Re: [PATCH] bundle-uri: drain remaining response on invalid bundle-uri lines
From: Toon Claes @ 2026-06-24  7:49 UTC (permalink / raw)
  To: Justin Tobler, Patrick Steinhardt; +Cc: git
In-Reply-To: <adkOJLfxs8TNGRjr@denethor>

Hi,

My apologies for digging up this old thread, but it was suddenly brought
back to my attention. Anyhow:

Justin Tobler <jltobler@gmail.com> writes:

> I think it is questionable for a Git server to be sending clients
> malformed bundle-uri configuration.

I will not argue about that.

> Do other Git implementations on the server-side exhibit this same
> behavior? If so, or we reasonably think they could and just want to be
> safe, then I agree that adjusting clients first to ignore invalid
> bundle-uri configuration from the server is reasonable.
>
> Generally, I'm of the mindset that when a server is sending
> malformed/garbage data that the client doesn't expect, the client should
> should be more strict and error out. In this case though, since there
> are known affected Git versions and bundle-uri is an optional feature to
> begin with, it probably doesn't hurt to be more permissive.

Yeah, that's the point I was trying to make. The use of bundle-uri is
optional, and clone can continue without it. The code was intentionally
written to continue when something goes wrong with bundle-uris. But
because of some underlaying issue I was trying to fix with this patch,
the process does not continue.

> On 26/04/10 08:31AM, Patrick Steinhardt wrote:

>> That being said, I also think that we should fix the server side.
>> Whether that needs to be part of this patch series though is a different
>> question. Based on the proposed patch you posted it seems to be trivial
>> enough though, so maybe it's worth it to just add that in as a second
>> patch.
>
> Ya, my main concern was that a client-side fix would mask its root
> cause. As long as it gets addressed though it's fine. I think it would
> be worth adding to this series, but if not I'm happy to send a follow up
> patch to fix it too.

I do not fully agree. My fix doesn't make the issue go away silently,
the user gets a warning message. I think this would cause (at least
some) users to complain to the owner of the server (especially because
bundle-URI is an opt-in feature). But I realize now, this warning isn't
checked in the tests, adding that would have made that more clear.

I do agree though a server-side fix would be advised. But I have no idea
how to best address this. In a previous mail you wrote:

> Naively, I would assume the easiest way to fix the issue on the
> server-side would be the following:
> 
> --- >8 ---
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 3b2e347288..96d38bb80f 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -946,7 +946,7 @@ static int config_to_packet_line(const char *key, const char *value,
>  {
>         struct packet_reader *writer = data;
> 
> -       if (starts_with(key, "bundle."))
> +       if (starts_with(key, "bundle.") && value && *value)
>                 packet_write_fmt(writer->fd, "%s=%s", key, value);
> 
>         return 0;
> ---- >8 ---
>
> A quick check using the tests provided in this patch seems to show them
> passing with the above. If we want, we could also have the server print
> a warning on its end regarding the missing value too.

I don't like this fix, because it papers over the issue, silently. But
then again, what is the best way to inform the server admin there's
something wrong? Adding one line to the log files is easily to be
missed.

-- 
Cheers,
Toon

^ permalink raw reply

* [PATCH 4/6] odb/transaction: propagate commit errors
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

When `odb_transaction_commit()` is invoked, the return value of the
backend commit callback is silently discarded. A backend has no way
to signal that committing failed, such as when the "files" backend
cannot migrate its temporary object directory into the permanent
ODB.

In a subsequent commit, git-receive-pack(1) starts using ODB transaction
to stage objects and consequently cares about such failures so it can
handle the error appropriately. Change the commit callback signature to
return an int error code and have `odb_transaction_commit()` forward it
accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 odb/transaction.c | 13 ++++++++++---
 odb/transaction.h |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/odb/transaction.c b/odb/transaction.c
index d3de01db50..b20d6a16f8 100644
--- a/odb/transaction.c
+++ b/odb/transaction.c
@@ -18,19 +18,26 @@ int odb_transaction_begin(struct object_database *odb,
 	return ret;
 }
 
-void odb_transaction_commit(struct odb_transaction *transaction)
+int odb_transaction_commit(struct odb_transaction *transaction)
 {
+	int ret;
+
 	if (!transaction)
-		return;
+		return 0;
 
 	/*
 	 * Ensure the transaction ending matches the pending transaction.
 	 */
 	ASSERT(transaction == transaction->source->odb->transaction);
 
-	transaction->commit(transaction);
+	ret = transaction->commit(transaction);
+	if (ret)
+		return ret;
+
 	transaction->source->odb->transaction = NULL;
 	free(transaction);
+
+	return 0;
 }
 
 int odb_transaction_write_object_stream(struct odb_transaction *transaction,
diff --git a/odb/transaction.h b/odb/transaction.h
index cd6d50f2e5..7898770071 100644
--- a/odb/transaction.h
+++ b/odb/transaction.h
@@ -54,7 +54,7 @@ static inline void odb_transaction_begin_or_die(struct object_database *odb,
  * Commits an ODB transaction making the written objects visible. If the
  * specified transaction is NULL, the function is a no-op.
  */
-void odb_transaction_commit(struct odb_transaction *transaction);
+int odb_transaction_commit(struct odb_transaction *transaction);
 
 /*
  * Writes the object in the provided stream into the transaction. The resulting
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 6/6] builtin/receive-pack: stage incoming objects via ODB transactions
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

Objects received by git-receive-pack(1) are quarantined in a temporary
"incoming" directory and migrated into the object database prior to the
reference updates. The quarantine is currently managed through
`tmp_objdir` directly. In a pluggable ODB future, how exactly an object
gets written to a transaction may vary for a given ODB source. Refactor
git-receive-pack(1) to use the ODB transaction interfaces to manage the
object staging area in a more agnostic manner accordingly.

Note that the temporary directory created for git-receive-pack(1) is
eagerly created and uses a different prefix name. This behavior is
special cased in the "files" backend by having `odb_transaction_begin()`
callers that require this behavior provide an `ODB_TRANSACTION_RECEIVE`
flag.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/add.c            |  2 +-
 builtin/receive-pack.c   | 46 ++++++++++++++++------------------------
 builtin/unpack-objects.c |  2 +-
 builtin/update-index.c   |  2 +-
 cache-tree.c             |  2 +-
 object-file.c            | 22 ++++++++++++++++---
 object-file.h            |  4 +++-
 odb/source-files.c       |  5 +++--
 odb/source-inmemory.c    |  3 ++-
 odb/source-loose.c       |  3 ++-
 odb/source.h             |  9 +++++---
 odb/transaction.c        |  5 +++--
 odb/transaction.h        | 13 ++++++++----
 read-cache.c             |  2 +-
 14 files changed, 70 insertions(+), 50 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 3d5d9cfdb9..60ffbede2b 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -581,7 +581,7 @@ int cmd_add(int argc,
 		string_list_clear(&only_match_skip_worktree, 0);
 	}
 
-	odb_transaction_begin_or_die(repo->objects, &transaction);
+	odb_transaction_begin_or_die(repo->objects, &transaction, 0);
 
 	ps_matched = xcalloc(pathspec.nr, 1);
 	if (add_renormalize)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 19eb6a1b61..ee8e03e2ab 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -112,8 +112,6 @@ static enum {
 } use_keepalive;
 static int keepalive_in_sec = 5;
 
-static struct tmp_objdir *tmp_objdir;
-
 static struct proc_receive_ref {
 	unsigned int want_add:1,
 		     want_delete:1,
@@ -959,8 +957,8 @@ static int run_receive_hook(struct command *commands,
 		strvec_push(&opt.env, "GIT_PUSH_OPTION_COUNT");
 	}
 
-	if (tmp_objdir)
-		strvec_pushv(&opt.env, tmp_objdir_env(tmp_objdir));
+	if (the_repository->objects->transaction)
+		strvec_pushv(&opt.env, odb_transaction_env(the_repository->objects->transaction));
 
 	prepare_push_cert_sha1(&opt);
 
@@ -1363,7 +1361,7 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
 		    !delayed_reachability_test(si, i))
 			oid_array_append(&extra, &si->shallow->oid[i]);
 
-	opt.env = tmp_objdir_env(tmp_objdir);
+	opt.env = odb_transaction_env(the_repository->objects->transaction);
 	setup_alternate_shallow(&shallow_lock, &opt.shallow_file, &extra);
 	if (check_connected(command_singleton_iterator, cmd, &opt)) {
 		rollback_shallow_file(the_repository, &shallow_lock);
@@ -1802,7 +1800,7 @@ static void set_connectivity_errors(struct command *commands,
 			/* to be checked in update_shallow_ref() */
 			continue;
 
-		opt.env = tmp_objdir_env(tmp_objdir);
+		opt.env = odb_transaction_env(the_repository->objects->transaction);
 		if (!check_connected(command_singleton_iterator, &singleton,
 				     &opt))
 			continue;
@@ -2057,7 +2055,7 @@ static void execute_commands(struct command *commands,
 		data.si = si;
 		opt.err_fd = err_fd;
 		opt.progress = err_fd && !quiet;
-		opt.env = tmp_objdir_env(tmp_objdir);
+		opt.env = odb_transaction_env(the_repository->objects->transaction);
 		opt.exclude_hidden_refs_section = "receive";
 
 		if (check_connected(iterate_receive_command_list, &data, &opt))
@@ -2106,14 +2104,13 @@ static void execute_commands(struct command *commands,
 	 * Now we'll start writing out refs, which means the objects need
 	 * to be in their final positions so that other processes can see them.
 	 */
-	if (tmp_objdir_migrate(tmp_objdir) < 0) {
+	if (odb_transaction_commit(the_repository->objects->transaction)) {
 		for (cmd = commands; cmd; cmd = cmd->next) {
 			if (!cmd->error_string)
 				cmd->error_string = "unable to migrate objects to permanent storage";
 		}
 		return;
 	}
-	tmp_objdir = NULL;
 
 	check_aliased_updates(commands);
 
@@ -2326,7 +2323,8 @@ static void push_header_arg(struct strvec *args, struct pack_header *hdr)
 		     ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
 }
 
-static const char *unpack(int err_fd, struct shallow_info *si)
+static const char *unpack(int err_fd, struct shallow_info *si,
+			  struct odb_transaction *transaction)
 {
 	struct pack_header hdr;
 	const char *hdr_err;
@@ -2351,20 +2349,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		strvec_push(&child.args, alt_shallow_file);
 	}
 
-	tmp_objdir = tmp_objdir_create(the_repository, "incoming");
-	if (!tmp_objdir) {
-		if (err_fd > 0)
-			close(err_fd);
-		return "unable to create temporary object directory";
-	}
-	strvec_pushv(&child.env, tmp_objdir_env(tmp_objdir));
-
-	/*
-	 * Normally we just pass the tmp_objdir environment to the child
-	 * processes that do the heavy lifting, but we may need to see these
-	 * objects ourselves to set up shallow information.
-	 */
-	tmp_objdir_add_as_alternate(tmp_objdir);
+	strvec_pushv(&child.env, odb_transaction_env(transaction));
 
 	if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		strvec_push(&child.args, "unpack-objects");
@@ -2431,13 +2416,14 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 	return NULL;
 }
 
-static const char *unpack_with_sideband(struct shallow_info *si)
+static const char *unpack_with_sideband(struct shallow_info *si,
+					struct odb_transaction *transaction)
 {
 	struct async muxer;
 	const char *ret;
 
 	if (!use_sideband)
-		return unpack(0, si);
+		return unpack(0, si, transaction);
 
 	use_keepalive = KEEPALIVE_AFTER_NUL;
 	memset(&muxer, 0, sizeof(muxer));
@@ -2446,7 +2432,7 @@ static const char *unpack_with_sideband(struct shallow_info *si)
 	if (start_async(&muxer))
 		return NULL;
 
-	ret = unpack(muxer.in, si);
+	ret = unpack(muxer.in, si, transaction);
 
 	finish_async(&muxer);
 	return ret;
@@ -2623,6 +2609,7 @@ int cmd_receive_pack(int argc,
 	struct oid_array ref = OID_ARRAY_INIT;
 	struct shallow_info si;
 	struct packet_reader reader;
+	struct odb_transaction *transaction = NULL;
 
 	struct option options[] = {
 		OPT__QUIET(&quiet, N_("quiet")),
@@ -2707,7 +2694,10 @@ int cmd_receive_pack(int argc,
 		if (!si.nr_ours && !si.nr_theirs)
 			shallow_update = 0;
 		if (!delete_only(commands)) {
-			unpack_status = unpack_with_sideband(&si);
+			if (odb_transaction_begin(the_repository->objects, &transaction, ODB_TRANSACTION_RECEIVE))
+				unpack_status = "unable to start ODB transaction";
+			else
+				unpack_status = unpack_with_sideband(&si, transaction);
 			update_shallow_info(commands, &si, &ref);
 		}
 		use_keepalive = KEEPALIVE_ALWAYS;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index d0136cdd99..c3d0fc7507 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -598,7 +598,7 @@ static void unpack_all(void)
 		progress = start_progress(the_repository,
 					  _("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
-	odb_transaction_begin_or_die(the_repository->objects, &transaction);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction, 0);
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 17f3ea284c..bf6ea60ef4 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1124,7 +1124,7 @@ int cmd_update_index(int argc,
 	 * Allow the object layer to optimize adding multiple objects in
 	 * a batch.
 	 */
-	odb_transaction_begin_or_die(the_repository->objects, &transaction);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction, 0);
 	while (ctx.argc) {
 		if (parseopt_state != PARSE_OPT_DONE)
 			parseopt_state = parse_options_step(&ctx, options,
diff --git a/cache-tree.c b/cache-tree.c
index 1a7dfed9cf..ed05acc4c7 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -490,7 +490,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", istate->repo);
-	odb_transaction_begin_or_die(the_repository->objects, &transaction);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction, 0);
 	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
 		       "", 0, &skip, flags);
 	odb_transaction_commit(transaction);
diff --git a/object-file.c b/object-file.c
index 14064d188a..e7958753ec 100644
--- a/object-file.c
+++ b/object-file.c
@@ -497,6 +497,7 @@ struct odb_transaction_files {
 
 	struct tmp_objdir *objdir;
 	struct transaction_packfile packfile;
+	const char *prefix;
 };
 
 static int odb_transaction_files_prepare(struct odb_transaction *base)
@@ -513,7 +514,7 @@ static int odb_transaction_files_prepare(struct odb_transaction *base)
 	if (!transaction || transaction->objdir)
 		return 0;
 
-	transaction->objdir = tmp_objdir_create(base->source->odb->repo, "bulk-fsync");
+	transaction->objdir = tmp_objdir_create(base->source->odb->repo, transaction->prefix);
 	if (!transaction->objdir)
 		return -1;
 
@@ -1391,7 +1392,7 @@ int index_fd(struct index_state *istate, struct object_id *oid,
 			struct object_database *odb = the_repository->objects;
 			struct odb_transaction *transaction;
 
-			odb_transaction_begin_or_die(odb, &transaction);
+			odb_transaction_begin_or_die(odb, &transaction, 0);
 			ret = odb_transaction_write_object_stream(odb->transaction,
 								  &stream,
 								  xsize_t(st->st_size),
@@ -1702,7 +1703,8 @@ static const char **odb_transaction_files_env(struct odb_transaction *base)
 }
 
 int odb_transaction_files_begin(struct odb_source *source,
-				struct odb_transaction **out)
+				struct odb_transaction **out,
+				enum odb_transaction_flags flags)
 {
 	struct odb_transaction_files *transaction;
 	struct object_database *odb = source->odb;
@@ -1717,6 +1719,20 @@ int odb_transaction_files_begin(struct odb_source *source,
 	transaction->base.commit = odb_transaction_files_commit;
 	transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
 	transaction->base.env = odb_transaction_files_env;
+
+	transaction->prefix = "bulk-fsync";
+	if (flags & ODB_TRANSACTION_RECEIVE) {
+		/*
+		 * ODB transactions for git-receive-pack(1) eagerly create a
+		 * temporary directory and use a different prefix.
+		 */
+		transaction->prefix = "incoming";
+		if (odb_transaction_files_prepare(&transaction->base)) {
+			free(transaction);
+			return -1;
+		}
+	}
+
 	*out = &transaction->base;
 
 	return 0;
diff --git a/object-file.h b/object-file.h
index ac927fec07..fe098d54cb 100644
--- a/object-file.h
+++ b/object-file.h
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "odb.h"
 #include "odb/source-loose.h"
+#include "odb/transaction.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -198,6 +199,7 @@ struct odb_transaction;
  * pending, out is set to NULL.
  */
 int odb_transaction_files_begin(struct odb_source *source,
-				struct odb_transaction **out);
+				struct odb_transaction **out,
+				enum odb_transaction_flags flags);
 
 #endif /* OBJECT_FILE_H */
diff --git a/odb/source-files.c b/odb/source-files.c
index 2545bd81d4..534f48aad9 100644
--- a/odb/source-files.c
+++ b/odb/source-files.c
@@ -180,9 +180,10 @@ static int odb_source_files_write_object_stream(struct odb_source *source,
 }
 
 static int odb_source_files_begin_transaction(struct odb_source *source,
-					      struct odb_transaction **out)
+					      struct odb_transaction **out,
+					      enum odb_transaction_flags flags)
 {
-	return odb_transaction_files_begin(source, out);
+	return odb_transaction_files_begin(source, out, flags);
 }
 
 static int odb_source_files_read_alternates(struct odb_source *source,
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index e004566d76..9644d9d474 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -304,7 +304,8 @@ static int odb_source_inmemory_freshen_object(struct odb_source *source,
 }
 
 static int odb_source_inmemory_begin_transaction(struct odb_source *source UNUSED,
-						 struct odb_transaction **out UNUSED)
+						 struct odb_transaction **out UNUSED,
+						 enum odb_transaction_flags flags UNUSED)
 {
 	return error("in-memory source does not support transactions");
 }
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 66e6bb8d3f..57c91986b4 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -638,7 +638,8 @@ static int odb_source_loose_write_object_stream(struct odb_source *source,
 }
 
 static int odb_source_loose_begin_transaction(struct odb_source *source UNUSED,
-					      struct odb_transaction **out UNUSED)
+					      struct odb_transaction **out UNUSED,
+					      enum odb_transaction_flags flags UNUSED)
 {
 	/* TODO: this is a known omission that we'll want to address eventually. */
 	return error("loose source does not support transactions");
diff --git a/odb/source.h b/odb/source.h
index 2192a101b8..3790d03ff2 100644
--- a/odb/source.h
+++ b/odb/source.h
@@ -3,6 +3,7 @@
 
 #include "object.h"
 #include "odb.h"
+#include "odb/transaction.h"
 
 enum odb_source_type {
 	/*
@@ -228,7 +229,8 @@ struct odb_source {
 	 * negative error code otherwise.
 	 */
 	int (*begin_transaction)(struct odb_source *source,
-				 struct odb_transaction **out);
+				 struct odb_transaction **out,
+				 enum odb_transaction_flags flags);
 
 	/*
 	 * This callback is expected to read the list of alternate object
@@ -467,9 +469,10 @@ static inline int odb_source_write_alternate(struct odb_source *source,
  * Returns 0 on success, a negative error code otherwise.
  */
 static inline int odb_source_begin_transaction(struct odb_source *source,
-					       struct odb_transaction **out)
+					       struct odb_transaction **out,
+					       enum odb_transaction_flags flags)
 {
-	return source->begin_transaction(source, out);
+	return source->begin_transaction(source, out, flags);
 }
 
 #endif
diff --git a/odb/transaction.c b/odb/transaction.c
index 20d3f43f54..34c212020c 100644
--- a/odb/transaction.c
+++ b/odb/transaction.c
@@ -3,7 +3,8 @@
 #include "odb/transaction.h"
 
 int odb_transaction_begin(struct object_database *odb,
-			  struct odb_transaction **out)
+			  struct odb_transaction **out,
+			  enum odb_transaction_flags flags)
 {
 	int ret;
 
@@ -12,7 +13,7 @@ int odb_transaction_begin(struct object_database *odb,
 		return 0;
 	}
 
-	ret = odb_source_begin_transaction(odb->sources, out);
+	ret = odb_source_begin_transaction(odb->sources, out, flags);
 	odb->transaction = *out;
 
 	return ret;
diff --git a/odb/transaction.h b/odb/transaction.h
index 536458297b..78392ff13d 100644
--- a/odb/transaction.h
+++ b/odb/transaction.h
@@ -4,7 +4,6 @@
 #include "git-compat-util.h"
 #include "gettext.h"
 #include "odb.h"
-#include "odb/source.h"
 
 /*
  * A transaction may be started for an object database prior to writing new
@@ -44,6 +43,10 @@ struct odb_transaction {
 	const char **(*env)(struct odb_transaction *transaction);
 };
 
+enum odb_transaction_flags {
+	ODB_TRANSACTION_RECEIVE = (1 << 0),
+};
+
 /*
  * Starts an ODB transaction and returns it via `out`. Subsequent objects are
  * written to the transaction and not committed until odb_transaction_commit()
@@ -51,12 +54,14 @@ struct odb_transaction {
  * error. If the ODB already has a pending transaction, `out` is set to NULL.
  */
 int odb_transaction_begin(struct object_database *odb,
-			  struct odb_transaction **out);
+			  struct odb_transaction **out,
+			  enum odb_transaction_flags flags);
 
 static inline void odb_transaction_begin_or_die(struct object_database *odb,
-						struct odb_transaction **out)
+						struct odb_transaction **out,
+						enum odb_transaction_flags flags)
 {
-	if (odb_transaction_begin(odb, out))
+	if (odb_transaction_begin(odb, out, flags))
 		die(_("failed to start ODB transaction"));
 }
 
diff --git a/read-cache.c b/read-cache.c
index db0bfa60fe..35bfb25576 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -4042,7 +4042,7 @@ int add_files_to_cache(struct repository *repo, const char *prefix,
 	 * This function is invoked from commands other than 'add', which
 	 * may not have their own transaction active.
 	 */
-	odb_transaction_begin_or_die(repo->objects, &transaction);
+	odb_transaction_begin_or_die(repo->objects, &transaction, 0);
 	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
 	odb_transaction_commit(transaction);
 
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 5/6] odb/transaction: add transaction env interface
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

The ODB transaction backend is responsible for creating/managing its own
staging area for writing objects. Other child processes spawned by Git
may need to access to uncommitted objects or write new objects in the
staging area though.

Introduce `odb_transaction_env()` which is expected to provide the set
of environment variables needed by a child process to access the
transaction staging area.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 object-file.c     | 11 +++++++++++
 odb/transaction.c |  8 ++++++++
 odb/transaction.h | 19 +++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/object-file.c b/object-file.c
index 696f05dc2d..14064d188a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1691,6 +1691,16 @@ static int odb_transaction_files_commit(struct odb_transaction *base)
 	return 0;
 }
 
+static const char **odb_transaction_files_env(struct odb_transaction *base)
+{
+	struct odb_transaction_files *transaction =
+		container_of(base, struct odb_transaction_files, base);
+
+	odb_transaction_files_prepare(&transaction->base);
+
+	return tmp_objdir_env(transaction->objdir);
+}
+
 int odb_transaction_files_begin(struct odb_source *source,
 				struct odb_transaction **out)
 {
@@ -1706,6 +1716,7 @@ int odb_transaction_files_begin(struct odb_source *source,
 	transaction->base.source = source;
 	transaction->base.commit = odb_transaction_files_commit;
 	transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
+	transaction->base.env = odb_transaction_files_env;
 	*out = &transaction->base;
 
 	return 0;
diff --git a/odb/transaction.c b/odb/transaction.c
index b20d6a16f8..20d3f43f54 100644
--- a/odb/transaction.c
+++ b/odb/transaction.c
@@ -46,3 +46,11 @@ int odb_transaction_write_object_stream(struct odb_transaction *transaction,
 {
 	return transaction->write_object_stream(transaction, stream, len, oid);
 }
+
+const char **odb_transaction_env(struct odb_transaction *transaction)
+{
+	if (!transaction)
+		return NULL;
+
+	return transaction->env(transaction);
+}
diff --git a/odb/transaction.h b/odb/transaction.h
index 7898770071..536458297b 100644
--- a/odb/transaction.h
+++ b/odb/transaction.h
@@ -32,6 +32,16 @@ struct odb_transaction {
 	int (*write_object_stream)(struct odb_transaction *transaction,
 				   struct odb_write_stream *stream, size_t len,
 				   struct object_id *oid);
+
+	/*
+	 * This callback is expected to return a NULL-terminated array of
+	 * environment variables that a child process should inherit so
+	 * that its object writes participate in the transaction. The
+	 * returned array is owned by the backend and remains valid until
+	 * the transaction ends. May return NULL when the backend does not
+	 * need to expose any state to child processes.
+	 */
+	const char **(*env)(struct odb_transaction *transaction);
 };
 
 /*
@@ -65,4 +75,13 @@ int odb_transaction_write_object_stream(struct odb_transaction *transaction,
 					struct odb_write_stream *stream,
 					size_t len, struct object_id *oid);
 
+/*
+ * Returns a NULL-terminated array of environment variables that a child
+ * process should inherit so that its object writes participate in the
+ * transaction, suitable for passing via child_process.env. Returns NULL if
+ * the transaction is NULL or the backend does not expose any state to child
+ * processes.
+ */
+const char **odb_transaction_env(struct odb_transaction *transaction);
+
 #endif
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 3/6] odb/transaction: propagate begin errors
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

When `odb_transaction_begin()` is invoked, the function returns the
transaction pointer directly. There is no way for the backend to
signal that it failed to set up its state, such as when creating the
temporary object directory backing the transaction.

In a subsequent commit, git-receive-pack(1) starts using ODB
transactions and needs to be able to report such failures rather
than silently ignore them. Refactor `odb_transaction_begin()` to
return an int error code and write the resulting transaction into an
out parameter. Also introduce `odb_transaction_begin_or_die()` as a
convenience for callsites that do not need to handle errors
explicitly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/add.c            |  2 +-
 builtin/unpack-objects.c |  2 +-
 builtin/update-index.c   |  2 +-
 cache-tree.c             |  2 +-
 object-file.c            |  3 ++-
 odb/transaction.c        | 16 +++++++++++-----
 odb/transaction.h        | 19 +++++++++++++++----
 read-cache.c             |  2 +-
 8 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c859f66519..3d5d9cfdb9 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -581,7 +581,7 @@ int cmd_add(int argc,
 		string_list_clear(&only_match_skip_worktree, 0);
 	}
 
-	transaction = odb_transaction_begin(repo->objects);
+	odb_transaction_begin_or_die(repo->objects, &transaction);
 
 	ps_matched = xcalloc(pathspec.nr, 1);
 	if (add_renormalize)
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index f3849bb654..d0136cdd99 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -598,7 +598,7 @@ static void unpack_all(void)
 		progress = start_progress(the_repository,
 					  _("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
-	transaction = odb_transaction_begin(the_repository->objects);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction);
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 3d6646c318..17f3ea284c 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1124,7 +1124,7 @@ int cmd_update_index(int argc,
 	 * Allow the object layer to optimize adding multiple objects in
 	 * a batch.
 	 */
-	transaction = odb_transaction_begin(the_repository->objects);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction);
 	while (ctx.argc) {
 		if (parseopt_state != PARSE_OPT_DONE)
 			parseopt_state = parse_options_step(&ctx, options,
diff --git a/cache-tree.c b/cache-tree.c
index 184f7e2635..1a7dfed9cf 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -490,7 +490,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	trace_performance_enter();
 	trace2_region_enter("cache_tree", "update", istate->repo);
-	transaction = odb_transaction_begin(the_repository->objects);
+	odb_transaction_begin_or_die(the_repository->objects, &transaction);
 	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
 		       "", 0, &skip, flags);
 	odb_transaction_commit(transaction);
diff --git a/object-file.c b/object-file.c
index 18c2df75fb..696f05dc2d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1389,8 +1389,9 @@ int index_fd(struct index_state *istate, struct object_id *oid,
 
 		if (flags & INDEX_WRITE_OBJECT) {
 			struct object_database *odb = the_repository->objects;
-			struct odb_transaction *transaction = odb_transaction_begin(odb);
+			struct odb_transaction *transaction;
 
+			odb_transaction_begin_or_die(odb, &transaction);
 			ret = odb_transaction_write_object_stream(odb->transaction,
 								  &stream,
 								  xsize_t(st->st_size),
diff --git a/odb/transaction.c b/odb/transaction.c
index b16e07aebf..d3de01db50 100644
--- a/odb/transaction.c
+++ b/odb/transaction.c
@@ -2,14 +2,20 @@
 #include "odb/source.h"
 #include "odb/transaction.h"
 
-struct odb_transaction *odb_transaction_begin(struct object_database *odb)
+int odb_transaction_begin(struct object_database *odb,
+			  struct odb_transaction **out)
 {
-	if (odb->transaction)
-		return NULL;
+	int ret;
 
-	odb_source_begin_transaction(odb->sources, &odb->transaction);
+	if (odb->transaction) {
+		*out = NULL;
+		return 0;
+	}
 
-	return odb->transaction;
+	ret = odb_source_begin_transaction(odb->sources, out);
+	odb->transaction = *out;
+
+	return ret;
 }
 
 void odb_transaction_commit(struct odb_transaction *transaction)
diff --git a/odb/transaction.h b/odb/transaction.h
index f4c1ebfaaa..cd6d50f2e5 100644
--- a/odb/transaction.h
+++ b/odb/transaction.h
@@ -1,6 +1,8 @@
 #ifndef ODB_TRANSACTION_H
 #define ODB_TRANSACTION_H
 
+#include "git-compat-util.h"
+#include "gettext.h"
 #include "odb.h"
 #include "odb/source.h"
 
@@ -33,11 +35,20 @@ struct odb_transaction {
 };
 
 /*
- * Starts an ODB transaction. Subsequent objects are written to the transaction
- * and not committed until odb_transaction_commit() is invoked on the
- * transaction. If the ODB already has a pending transaction, NULL is returned.
+ * Starts an ODB transaction and returns it via `out`. Subsequent objects are
+ * written to the transaction and not committed until odb_transaction_commit()
+ * is invoked on the transaction. Returns 0 on success and a negative value on
+ * error. If the ODB already has a pending transaction, `out` is set to NULL.
  */
-struct odb_transaction *odb_transaction_begin(struct object_database *odb);
+int odb_transaction_begin(struct object_database *odb,
+			  struct odb_transaction **out);
+
+static inline void odb_transaction_begin_or_die(struct object_database *odb,
+						struct odb_transaction **out)
+{
+	if (odb_transaction_begin(odb, out))
+		die(_("failed to start ODB transaction"));
+}
 
 /*
  * Commits an ODB transaction making the written objects visible. If the
diff --git a/read-cache.c b/read-cache.c
index 21ca58beea..db0bfa60fe 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -4042,7 +4042,7 @@ int add_files_to_cache(struct repository *repo, const char *prefix,
 	 * This function is invoked from commands other than 'add', which
 	 * may not have their own transaction active.
 	 */
-	transaction = odb_transaction_begin(repo->objects);
+	odb_transaction_begin_or_die(repo->objects, &transaction);
 	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
 	odb_transaction_commit(transaction);
 
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 2/6] object-file: propagate files transaction errors
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

The "files" transaction backend may encounter errors related to managing
the temporary directory used to stage objects, but silently ignores
these errors. Instead return errors encountered in the
`odb_transaction_files_{prepare,begin,commit}()` interfaces to allow
callers to handle as needed.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 object-file.c      | 41 ++++++++++++++++++++++++++++-------------
 object-file.h      |  5 +++--
 odb/source-files.c |  6 +-----
 odb/transaction.h  |  2 +-
 4 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/object-file.c b/object-file.c
index a3eb8d71dd..18c2df75fb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -499,7 +499,7 @@ struct odb_transaction_files {
 	struct transaction_packfile packfile;
 };
 
-static void odb_transaction_files_prepare(struct odb_transaction *base)
+static int odb_transaction_files_prepare(struct odb_transaction *base)
 {
 	struct odb_transaction_files *transaction =
 		container_of_or_null(base, struct odb_transaction_files, base);
@@ -511,11 +511,15 @@ static void odb_transaction_files_prepare(struct odb_transaction *base)
 	 * added at the time they call odb_transaction_files_begin.
 	 */
 	if (!transaction || transaction->objdir)
-		return;
+		return 0;
 
 	transaction->objdir = tmp_objdir_create(base->source->odb->repo, "bulk-fsync");
-	if (transaction->objdir)
-		tmp_objdir_replace_primary_odb(transaction->objdir, 0);
+	if (!transaction->objdir)
+		return -1;
+
+	tmp_objdir_replace_primary_odb(transaction->objdir, 0);
+
+	return 0;
 }
 
 static void fsync_loose_object_transaction(struct odb_transaction *base,
@@ -542,13 +546,13 @@ static void fsync_loose_object_transaction(struct odb_transaction *base,
 /*
  * Cleanup after batch-mode fsync_object_files.
  */
-static void flush_loose_object_transaction(struct odb_transaction_files *transaction)
+static int flush_loose_object_transaction(struct odb_transaction_files *transaction)
 {
 	struct strbuf temp_path = STRBUF_INIT;
 	struct tempfile *temp;
 
 	if (!transaction->objdir)
-		return;
+		return 0;
 
 	/*
 	 * Issue a full hardware flush against a temporary file to ensure
@@ -570,8 +574,12 @@ static void flush_loose_object_transaction(struct odb_transaction_files *transac
 	 * Make the object files visible in the primary ODB after their data is
 	 * fully durable.
 	 */
-	tmp_objdir_migrate(transaction->objdir);
+	if (tmp_objdir_migrate(transaction->objdir))
+		return -1;
+
 	transaction->objdir = NULL;
+
+	return 0;
 }
 
 /* Finalize a file on disk, and close it. */
@@ -1670,27 +1678,34 @@ int read_loose_object(struct repository *repo,
 	return ret;
 }
 
-static void odb_transaction_files_commit(struct odb_transaction *base)
+static int odb_transaction_files_commit(struct odb_transaction *base)
 {
 	struct odb_transaction_files *transaction =
 		container_of(base, struct odb_transaction_files, base);
 
-	flush_loose_object_transaction(transaction);
+	if (flush_loose_object_transaction(transaction))
+		return -1;
 	flush_packfile_transaction(transaction);
+
+	return 0;
 }
 
-struct odb_transaction *odb_transaction_files_begin(struct odb_source *source)
+int odb_transaction_files_begin(struct odb_source *source,
+				struct odb_transaction **out)
 {
 	struct odb_transaction_files *transaction;
 	struct object_database *odb = source->odb;
 
-	if (odb->transaction)
-		return NULL;
+	if (odb->transaction) {
+		*out = NULL;
+		return 0;
+	}
 
 	transaction = xcalloc(1, sizeof(*transaction));
 	transaction->base.source = source;
 	transaction->base.commit = odb_transaction_files_commit;
 	transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
+	*out = &transaction->base;
 
-	return &transaction->base;
+	return 0;
 }
diff --git a/object-file.h b/object-file.h
index 528c4e6e69..ac927fec07 100644
--- a/object-file.h
+++ b/object-file.h
@@ -195,8 +195,9 @@ struct odb_transaction;
  * Tell the object database to optimize for adding
  * multiple objects. odb_transaction_files_commit must be called
  * to make new objects visible. If a transaction is already
- * pending, NULL is returned.
+ * pending, out is set to NULL.
  */
-struct odb_transaction *odb_transaction_files_begin(struct odb_source *source);
+int odb_transaction_files_begin(struct odb_source *source,
+				struct odb_transaction **out);
 
 #endif /* OBJECT_FILE_H */
diff --git a/odb/source-files.c b/odb/source-files.c
index 5bdd042922..2545bd81d4 100644
--- a/odb/source-files.c
+++ b/odb/source-files.c
@@ -182,11 +182,7 @@ static int odb_source_files_write_object_stream(struct odb_source *source,
 static int odb_source_files_begin_transaction(struct odb_source *source,
 					      struct odb_transaction **out)
 {
-	struct odb_transaction *tx = odb_transaction_files_begin(source);
-	if (!tx)
-		return -1;
-	*out = tx;
-	return 0;
+	return odb_transaction_files_begin(source, out);
 }
 
 static int odb_source_files_read_alternates(struct odb_source *source,
diff --git a/odb/transaction.h b/odb/transaction.h
index 854fda06f5..f4c1ebfaaa 100644
--- a/odb/transaction.h
+++ b/odb/transaction.h
@@ -17,7 +17,7 @@ struct odb_transaction {
 	struct odb_source *source;
 
 	/* The ODB source specific callback invoked to commit a transaction. */
-	void (*commit)(struct odb_transaction *transaction);
+	int (*commit)(struct odb_transaction *transaction);
 
 	/*
 	 * This callback is expected to write the given object stream into
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 1/6] object-file: rename files transaction prepare function
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>

The "files" ODB transaction backend lazily creates a temporary object
directory when the first loose object is written to the transaction via
`prepare_loose_object_transaction()`. In a subsequent commit, the
temporary directory is used to also write packfiles to.

Rename the function to `odb_transaction_files_prepare()` accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 object-file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/object-file.c b/object-file.c
index e3d92bbda2..a3eb8d71dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -499,7 +499,7 @@ struct odb_transaction_files {
 	struct transaction_packfile packfile;
 };
 
-static void prepare_loose_object_transaction(struct odb_transaction *base)
+static void odb_transaction_files_prepare(struct odb_transaction *base)
 {
 	struct odb_transaction_files *transaction =
 		container_of_or_null(base, struct odb_transaction_files, base);
@@ -761,7 +761,7 @@ int write_loose_object(struct odb_source_loose *loose,
 	static struct strbuf filename = STRBUF_INIT;
 
 	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
-		prepare_loose_object_transaction(loose->base.odb->transaction);
+		odb_transaction_files_prepare(loose->base.odb->transaction);
 
 	odb_loose_path(loose, &filename, oid);
 
@@ -825,7 +825,7 @@ int odb_source_loose_write_stream(struct odb_source_loose *loose,
 	int hdrlen;
 
 	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
-		prepare_loose_object_transaction(loose->base.odb->transaction);
+		odb_transaction_files_prepare(loose->base.odb->transaction);
 
 	/* Since oid is not determined, save tmp file to odb path. */
 	strbuf_addf(&filename, "%s/", loose->base.path);
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply related

* [PATCH 0/6] receive-pack: use ODB transactions to stage object writes
From: Justin Tobler @ 2026-06-24  4:19 UTC (permalink / raw)
  To: git; +Cc: ps, Justin Tobler

Greetings,

This patch series replaces direct usage of the `tmp_objdir` interfaces
in git-receive-pack(1) to instead use the `odb_transaction` interfaces
to create/manage a staging area to write objects to. The purpose of this
change is to get git-receive-pack(1) one step closer to being ODB
backend agnostic. For now, the object writes themselves are still
"files" backend specific due to being handled by the git-index-pack(1)
and git-unpack-objects(1) child processes. This will be tackled in a
separate series though.

Thanks,
-Justin

Justin Tobler (6):
  object-file: rename files transaction prepare function
  object-file: propagate files transaction errors
  odb/transaction: propagate begin errors
  odb/transaction: propagate commit errors
  odb/transaction: add transaction env interface
  builtin/receive-pack: stage incoming objects via ODB transactions

 builtin/add.c            |  2 +-
 builtin/receive-pack.c   | 46 ++++++++++--------------
 builtin/unpack-objects.c |  2 +-
 builtin/update-index.c   |  2 +-
 cache-tree.c             |  2 +-
 object-file.c            | 77 +++++++++++++++++++++++++++++++---------
 object-file.h            |  7 ++--
 odb/source-files.c       |  9 ++---
 odb/source-inmemory.c    |  3 +-
 odb/source-loose.c       |  3 +-
 odb/source.h             |  9 +++--
 odb/transaction.c        | 38 +++++++++++++++-----
 odb/transaction.h        | 49 +++++++++++++++++++++----
 read-cache.c             |  2 +-
 14 files changed, 173 insertions(+), 78 deletions(-)


base-commit: ab776a62a78576513ee121424adb19597fbb7613
-- 
2.54.0.105.g59ff4886a5


^ permalink raw reply

* Re: [GSoC Patch v7 1/3] path: extract append_formatted_path() and use in rev-parse
From: K Jayatheerth @ 2026-06-24  3:49 UTC (permalink / raw)
  To: phillip.wood
  Cc: a3205153416, git, gitster, jltobler, kumarayushjha123,
	lucasseikioshiro, sandals
In-Reply-To: <084ad4d0-d872-4c7f-94a8-ec2383c7a8ca@gmail.com>

Hey Phillip,

On Tue, Jun 23, 2026 at 9:27 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> On 21/06/2026 06:55, K Jayatheerth wrote:
> > Path formatting logic in builtin/rev-parse.c writes directly to
> > stdout. Other builtins cannot reuse it.
> >
> > Extract this logic into append_formatted_path() in path.c and expose
> > a path_format enum in path.h.
> >
> > Convert rev-parse to use the new helper in the same step to validate
> > the API against existing tests and avoid introducing dead code.
>
> The new API looks good now, and so does the conversion of the existing
> code. I'm very happy with this version and don't have anything to add to
> Junio's comments
>
> Thanks
>
> Phillip
>

I have sent a v8 with Junio's feedback addressed.
I wouldn't have a problem with either of the versions getting merged.

Both of them are good in their own ways.

Thank you,
- K Jayatheerth

^ permalink raw reply

* [GSoC Patch v8 3/3] repo: add path.gitdir with absolute and relative suffix formatting
From: K Jayatheerth @ 2026-06-24  3:37 UTC (permalink / raw)
  To: jayatheerthkulkarni2005
  Cc: a3205153416, git, gitster, jltobler, kumarayushjha123,
	lucasseikioshiro, phillip.wood, sandals
In-Reply-To: <20260624033748.108281-1-jayatheerthkulkarni2005@gmail.com>

Scripts need a stable way to locate the git directory without
parsing rev-parse output or relying on its flag-driven path format
selection. There is no way to retrieve this path from git repo info
today.

Introduce path.gitdir.absolute and path.gitdir.relative keys,
consistent with the path.commondir keys added in the previous patch.
Reuse the test_repo_info_path helper introduced there to validate
both variants.

Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
---
 Documentation/git-repo.adoc |  6 ++++++
 builtin/repo.c              | 24 ++++++++++++++++++++++++
 t/t1900-repo-info.sh        |  6 ++++++
 3 files changed, 36 insertions(+)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 890c34051d..ed7d80c690 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -113,6 +113,12 @@ values that they return:
 	The path to the Git repository's common directory relative to
 	the current working directory.
 
+`path.gitdir.absolute`::
+	The canonical absolute path to the Git repository directory (the `.git` directory).
+
+`path.gitdir.relative`::
+	The path to the Git repository directory relative to the current working directory.
+
 `references.format`::
 	The reference storage format. The valid values are:
 +
diff --git a/builtin/repo.c b/builtin/repo.c
index 4c3fbc26b9..27c8caff38 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -99,6 +99,28 @@ static int get_path_commondir_relative(struct repository *repo, struct strbuf *b
 	return 0;
 }
 
+static int get_path_gitdir_absolute(struct repository *repo, struct strbuf *buf)
+{
+	const char *git_dir = repo_get_git_dir(repo);
+
+	if (!git_dir)
+		return error(_("unable to get git directory"));
+
+	format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_CANONICAL);
+	return 0;
+}
+
+static int get_path_gitdir_relative(struct repository *repo, struct strbuf *buf)
+{
+	const char *git_dir = repo_get_git_dir(repo);
+
+	if (!git_dir)
+		return error(_("unable to get git directory"));
+
+	format_path(buf, git_dir, startup_info->prefix, PATH_FORMAT_RELATIVE);
+	return 0;
+}
+
 static int get_references_format(struct repository *repo, struct strbuf *buf)
 {
 	strbuf_addstr(buf,
@@ -113,6 +135,8 @@ static const struct repo_info_field repo_info_field[] = {
 	{ "object.format", get_object_format },
 	{ "path.commondir.absolute", get_path_commondir_absolute },
 	{ "path.commondir.relative", get_path_commondir_relative },
+	{ "path.gitdir.absolute", get_path_gitdir_absolute },
+	{ "path.gitdir.relative", get_path_gitdir_relative },
 	{ "references.format", get_references_format },
 };
 
diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
index 09158d29f9..ae8c22c817 100755
--- a/t/t1900-repo-info.sh
+++ b/t/t1900-repo-info.sh
@@ -207,4 +207,10 @@ test_repo_info_path 'commondir with only GIT_DIR' 'commondir' \
 	'.git' \
 	'GIT_DIR="../.git" && export GIT_DIR'
 
+test_repo_info_path 'gitdir standard' 'gitdir' '.git'
+
+test_repo_info_path 'gitdir with explicit GIT_DIR' 'gitdir' \
+	'.git' \
+	'GIT_DIR="../.git" && export GIT_DIR'
+
 test_done
-- 
2.55.0-rc1


^ permalink raw reply related

* [GSoC Patch v8 2/3] repo: add path.commondir with absolute and relative suffix formatting
From: K Jayatheerth @ 2026-06-24  3:37 UTC (permalink / raw)
  To: jayatheerthkulkarni2005
  Cc: a3205153416, git, gitster, jltobler, kumarayushjha123,
	lucasseikioshiro, phillip.wood, sandals
In-Reply-To: <20260624033748.108281-1-jayatheerthkulkarni2005@gmail.com>

Scripts working with worktree setups need a reliable way to discover
the common directory, which diverges from the git directory when
multiple worktrees are in use. There is no way to retrieve this path
from git repo info today.

Introduce path.commondir.absolute and path.commondir.relative keys.
Exposing explicit format variants rather than a single key with a
default avoids ambiguity for scripts that require predictable output.

Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
---
 Documentation/git-repo.adoc |  9 +++++++
 builtin/repo.c              | 26 +++++++++++++++++++
 t/t1900-repo-info.sh        | 52 +++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
index 42262c1983..890c34051d 100644
--- a/Documentation/git-repo.adoc
+++ b/Documentation/git-repo.adoc
@@ -104,6 +104,15 @@ values that they return:
 `object.format`::
 	The object format (hash algorithm) used in the repository.
 
+`path.commondir.absolute`::
+	The canonical absolute path to the Git repository's common
+	directory (the shared `.git` directory containing objects,
+	refs, and global configuration).
+
+`path.commondir.relative`::
+	The path to the Git repository's common directory relative to
+	the current working directory.
+
 `references.format`::
 	The reference storage format. The valid values are:
 +
diff --git a/builtin/repo.c b/builtin/repo.c
index 71a5c1c29c..4c3fbc26b9 100644
--- a/builtin/repo.c
+++ b/builtin/repo.c
@@ -7,12 +7,14 @@
 #include "hex.h"
 #include "odb.h"
 #include "parse-options.h"
+#include "path.h"
 #include "path-walk.h"
 #include "progress.h"
 #include "quote.h"
 #include "ref-filter.h"
 #include "refs.h"
 #include "revision.h"
+#include "setup.h"
 #include "strbuf.h"
 #include "string-list.h"
 #include "shallow.h"
@@ -75,6 +77,28 @@ static int get_object_format(struct repository *repo, struct strbuf *buf)
 	return 0;
 }
 
+static int get_path_commondir_absolute(struct repository *repo, struct strbuf *buf)
+{
+	const char *common_dir = repo_get_common_dir(repo);
+
+	if (!common_dir)
+		return error(_("unable to get common directory"));
+
+	format_path(buf, common_dir, startup_info->prefix, PATH_FORMAT_CANONICAL);
+	return 0;
+}
+
+static int get_path_commondir_relative(struct repository *repo, struct strbuf *buf)
+{
+	const char *common_dir = repo_get_common_dir(repo);
+
+	if (!common_dir)
+		return error(_("unable to get common directory"));
+
+	format_path(buf, common_dir, startup_info->prefix, PATH_FORMAT_RELATIVE);
+	return 0;
+}
+
 static int get_references_format(struct repository *repo, struct strbuf *buf)
 {
 	strbuf_addstr(buf,
@@ -87,6 +111,8 @@ static const struct repo_info_field repo_info_field[] = {
 	{ "layout.bare", get_layout_bare },
 	{ "layout.shallow", get_layout_shallow },
 	{ "object.format", get_object_format },
+	{ "path.commondir.absolute", get_path_commondir_absolute },
+	{ "path.commondir.relative", get_path_commondir_relative },
 	{ "references.format", get_references_format },
 };
 
diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
index 39bb77dda0..09158d29f9 100755
--- a/t/t1900-repo-info.sh
+++ b/t/t1900-repo-info.sh
@@ -155,4 +155,56 @@ test_expect_success 'git repo info -h shows only repo info usage' '
 	test_grep ! "git repo structure" actual
 '
 
+# Helper function to test path keys in both absolute and relative formats.
+# $1: label for the test
+# $2: field_name (e.g., commondir)
+# $3: expected_dir (the directory name, e.g., .git or custom-common)
+# $4: init_command (extra setup like exporting env vars)
+test_repo_info_path () {
+	label=$1
+	field_name=$2
+	expected_dir=$3
+	init_command=$4
+
+	test_expect_success "absolute: $label" '
+		test_when_finished "rm -rf repo" &&
+		git init repo &&
+		(
+			mkdir -p repo/sub &&
+			cd repo/sub &&
+			ROOT="$(test-tool path-utils real_path ..)" && export ROOT &&
+			eval "$init_command" &&
+			echo "path.$field_name.absolute=$ROOT/$expected_dir" >expect &&
+			git repo info "path.$field_name.absolute" >actual &&
+			test_cmp expect actual
+		)
+	'
+
+	test_expect_success "relative: $label" '
+		test_when_finished "rm -rf repo" &&
+		git init repo &&
+		(
+			mkdir -p repo/sub &&
+			cd repo/sub &&
+			ROOT="$(test-tool path-utils real_path ..)" && export ROOT &&
+			eval "$init_command" &&
+			echo "path.$field_name.relative=../$expected_dir" >expect &&
+			git repo info "path.$field_name.relative" >actual &&
+			test_cmp expect actual
+		)
+	'
+}
+
+test_repo_info_path 'commondir standard' 'commondir' '.git'
+
+test_repo_info_path 'commondir with GIT_COMMON_DIR and GIT_DIR' 'commondir' \
+	'custom-common' \
+	'GIT_COMMON_DIR="$ROOT/custom-common" && export GIT_COMMON_DIR &&
+	 GIT_DIR="../.git" && export GIT_DIR &&
+	 git init --bare "$ROOT/custom-common"'
+
+test_repo_info_path 'commondir with only GIT_DIR' 'commondir' \
+	'.git' \
+	'GIT_DIR="../.git" && export GIT_DIR'
+
 test_done
-- 
2.55.0-rc1


^ permalink raw reply related

* [GSoC Patch v8 1/3] path: extract format_path() and use in rev-parse
From: K Jayatheerth @ 2026-06-24  3:37 UTC (permalink / raw)
  To: jayatheerthkulkarni2005
  Cc: a3205153416, git, gitster, jltobler, kumarayushjha123,
	lucasseikioshiro, phillip.wood, sandals
In-Reply-To: <20260624033748.108281-1-jayatheerthkulkarni2005@gmail.com>

Path formatting logic in builtin/rev-parse.c writes directly to
stdout. Other builtins cannot reuse it.

Extract this logic into format_path() in path.c and expose
a path_format enum in path.h.

Convert rev-parse to use the new helper in the same step to validate
the API against existing tests and avoid introducing dead code.

Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
---
 builtin/rev-parse.c | 79 +++++++++++++++++++++------------------------
 path.c              | 69 +++++++++++++++++++++++++++++++++++++++
 path.h              | 30 +++++++++++++++++
 3 files changed, 135 insertions(+), 43 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index bb882678fe..7d6ac92038 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -653,53 +653,46 @@ enum default_type {
 	DEFAULT_UNMODIFIED,
 };
 
-static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
+static void print_path(const char *path, const char *prefix,
+		       enum format_type format, enum default_type def)
 {
-	char *cwd = NULL;
-	/*
-	 * We don't ever produce a relative path if prefix is NULL, so set the
-	 * prefix to the current directory so that we can produce a relative
-	 * path whenever possible.  If we're using RELATIVE_IF_SHARED mode, then
-	 * we want an absolute path unless the two share a common prefix, so don't
-	 * set it in that case, since doing so causes a relative path to always
-	 * be produced if possible.
-	 */
-	if (!prefix && (format != FORMAT_DEFAULT || def != DEFAULT_RELATIVE_IF_SHARED))
-		prefix = cwd = xgetcwd();
-	if (format == FORMAT_DEFAULT && def == DEFAULT_UNMODIFIED) {
-		puts(path);
-	} else if (format == FORMAT_RELATIVE ||
-		  (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE)) {
-		/*
-		 * In order for relative_path to work as expected, we need to
-		 * make sure that both paths are absolute paths.  If we don't,
-		 * we can end up with an unexpected absolute path that the user
-		 * didn't want.
-		 */
-		struct strbuf buf = STRBUF_INIT, realbuf = STRBUF_INIT, prefixbuf = STRBUF_INIT;
-		if (!is_absolute_path(path)) {
-			strbuf_realpath_forgiving(&realbuf, path,  1);
-			path = realbuf.buf;
-		}
-		if (!is_absolute_path(prefix)) {
-			strbuf_realpath_forgiving(&prefixbuf, prefix, 1);
-			prefix = prefixbuf.buf;
+	struct strbuf sb = STRBUF_INIT;
+	enum path_format fmt;
+
+	if (format == FORMAT_DEFAULT) {
+		switch (def) {
+		case DEFAULT_RELATIVE:
+			fmt = PATH_FORMAT_RELATIVE;
+			break;
+		case DEFAULT_RELATIVE_IF_SHARED:
+			fmt = PATH_FORMAT_RELATIVE_IF_SHARED;
+			break;
+		case DEFAULT_CANONICAL:
+			fmt = PATH_FORMAT_CANONICAL;
+			break;
+		case DEFAULT_UNMODIFIED:
+		default:
+			fmt = PATH_FORMAT_UNMODIFIED;
+			break;
 		}
-		puts(relative_path(path, prefix, &buf));
-		strbuf_release(&buf);
-		strbuf_release(&realbuf);
-		strbuf_release(&prefixbuf);
-	} else if (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE_IF_SHARED) {
-		struct strbuf buf = STRBUF_INIT;
-		puts(relative_path(path, prefix, &buf));
-		strbuf_release(&buf);
 	} else {
-		struct strbuf buf = STRBUF_INIT;
-		strbuf_realpath_forgiving(&buf, path, 1);
-		puts(buf.buf);
-		strbuf_release(&buf);
+		switch (format) {
+		case FORMAT_RELATIVE:
+			fmt = PATH_FORMAT_RELATIVE;
+			break;
+		case FORMAT_CANONICAL:
+			fmt = PATH_FORMAT_CANONICAL;
+			break;
+		default:
+			fmt = PATH_FORMAT_UNMODIFIED;
+			break;
+		}
 	}
-	free(cwd);
+
+	format_path(&sb, path, prefix, fmt);
+	puts(sb.buf);
+
+	strbuf_release(&sb);
 }
 
 int cmd_rev_parse(int argc,
diff --git a/path.c b/path.c
index d7e17bf174..c3a709a928 100644
--- a/path.c
+++ b/path.c
@@ -1579,6 +1579,75 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
+void format_path(struct strbuf *dest, const char *path,
+		 const char *prefix, enum path_format format)
+{
+	strbuf_reset(dest);
+
+	switch (format) {
+	case PATH_FORMAT_UNMODIFIED:
+		strbuf_addstr(dest, path);
+		break;
+
+	case PATH_FORMAT_RELATIVE: {
+		struct strbuf relative_buf = STRBUF_INIT;
+		struct strbuf real_path = STRBUF_INIT;
+		struct strbuf real_prefix = STRBUF_INIT;
+		char *cwd = NULL;
+
+		/*
+		 * We don't ever produce a relative path if prefix is NULL,
+		 * so set the prefix to the current directory so that we can
+		 * produce a relative path whenever possible.
+		 */
+		if (!prefix)
+			prefix = cwd = xgetcwd();
+
+		if (!is_absolute_path(path)) {
+			strbuf_realpath_forgiving(&real_path, path, 1);
+			path = real_path.buf;
+		}
+		if (!is_absolute_path(prefix)) {
+			strbuf_realpath_forgiving(&real_prefix, prefix, 1);
+			prefix = real_prefix.buf;
+		}
+
+		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
+
+		strbuf_release(&relative_buf);
+		strbuf_release(&real_path);
+		strbuf_release(&real_prefix);
+		free(cwd);
+		break;
+	}
+
+	case PATH_FORMAT_RELATIVE_IF_SHARED: {
+		struct strbuf relative_buf = STRBUF_INIT;
+
+		/*
+		 * If we're using RELATIVE_IF_SHARED mode, then we want an
+		 * absolute path unless the two share a common prefix, so don't
+		 * default the prefix to the current working directory. Doing so
+		 * would cause a relative path to always be produced if possible.
+		 */
+		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
+		strbuf_release(&relative_buf);
+		break;
+	}
+
+	case PATH_FORMAT_CANONICAL:
+		/*
+		 * strbuf_realpath_forgiving inherently resets the destination
+		 * buffer, safely aligning with our replace semantics.
+		 */
+		strbuf_realpath_forgiving(dest, path, 1);
+		break;
+
+	default:
+		BUG("unknown path_format value %d", format);
+	}
+}
+
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 4c2958a903..7e7408dd05 100644
--- a/path.h
+++ b/path.h
@@ -262,6 +262,36 @@ enum scld_error safe_create_leading_directories_no_share(char *path);
 int safe_create_file_with_leading_directories(struct repository *repo,
 					      const char *path);
 
+/**
+ * The formatting strategy to apply when writing a path into a buffer.
+ */
+enum path_format {
+	/* Output the path exactly as-is without any modifications. */
+	PATH_FORMAT_UNMODIFIED,
+
+	/* Output a path relative to the provided directory prefix. */
+	PATH_FORMAT_RELATIVE,
+
+	/* Output a relative path only if the path shares a root with the prefix. */
+	PATH_FORMAT_RELATIVE_IF_SHARED,
+
+	/* Output a fully resolved, absolute canonical path. */
+	PATH_FORMAT_CANONICAL
+};
+
+/**
+ * Format a path according to the specified formatting strategy and store
+ * the result in the given strbuf, replacing any existing contents.
+ *
+ * `dest`   : The string buffer to store the formatted path into.
+ * `path`   : The path string that needs to be formatted.
+ * `prefix` : The directory prefix to calculate relative offsets against.
+ * Pass NULL to default to the current working directory where applicable.
+ * `format` : The formatting behavior rule to execute.
+ */
+void format_path(struct strbuf *dest, const char *path,
+		 const char *prefix, enum path_format format);
+
 # ifdef USE_THE_REPOSITORY_VARIABLE
 #  include "strbuf.h"
 #  include "repository.h"
-- 
2.55.0-rc1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox