* [PATCH 0/8] Improvements for reading object info
@ 2025-12-18 6:28 Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (12 more replies)
0 siblings, 13 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
This series has a conflict with ps/packfile-store-in-odb-source. I
decided to not make this a dependency though because those two topics
are independent from one another, and I expect that this series here
will be merged down faster than the conflicting one. Furthermore, the
conflict itself is quite minor:
diff --cc packfile.c
index 8daa5a5ee7,ce6716fbea..0000000000
--- a/packfile.c
+++ b/packfile.c
@@@ -2157,10 -2132,11 +2151,10 @@@ int packfile_store_read_object_info(str
struct object_info *oi,
unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
- if (!find_pack_entry(store->odb->repo, oid, &e))
+ if (!find_pack_entry(store, oid, &e))
return 1;
/*
@@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
- repo_settings_get_big_file_threshold(store->odb->repo) >= size)
+ repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
return -1;
in_pack_type = unpack_object_header(oi.u.packed.pack,
I'd thus propose to merge this series via an evil merge, but if this
proves to be burdensome I'm happy to defer it to a later point. Just let
me know and I'll adapt accordingly, thanks!
This also fixes the issue reported in <f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com>.
Patrick
---
Patrick Steinhardt (8):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: fix short-circuiting of empty requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 +--
commit-graph.c | 2 +-
object-file.c | 13 ++++++++--
odb.h | 18 +++++++++++--
pack-bitmap.c | 3 +--
packfile.c | 69 +++++++++++++++++++++++++++++++-------------------
packfile.h | 7 +++--
8 files changed, 80 insertions(+), 39 deletions(-)
---
base-commit: c4a0c8845e2426375ad257b6c221a3a7d92ecfda
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 1/8] object-file: always set OI_LOOSE when reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (11 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
There are some early returns in ``odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
Nobody seems to care about this right now, but it's a bug waiting to
happen. Fix this by always setting `whence` on success.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index af1c3f972d..716b325669 100644
--- a/object-file.c
+++ b/object-file.c
@@ -439,12 +439,21 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi->typep && !oi->sizep && !oi->contentp) {
struct stat st;
- if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
+
+ if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK)) {
+ status = quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (!status)
+ oi->whence = OI_LOOSE;
+ return status;
+ }
+
if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
+
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
+
+ oi->whence = OI_LOOSE;
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 7:23 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (10 subsequent siblings)
12 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
Drop the OI_DBCACHED enum completely. There don't seem to be any callers
that care about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index c88bd92619..79ad9d7179 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (9 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index 79ad9d7179..9bce52f912 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2160,8 +2160,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2532,7 +2542,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 4/8] packfile: always populate pack-specific info when reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (2 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 7:32 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (8 subsequent siblings)
12 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information from a packfile we are not always
populating the pack-specific information. This happens in two cases:
- When calling `packed_object_info()` directly instead of
`packfile_store_read_object_info()`.
- When we've got the empty request.
Fix both of these issues so that we can always assume the pack info to
be populated when reading object info from a pack.
Note that we don't really care about the second case right now, as the
condition will always evaluate to false anyway. This will be fixed in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 38 ++++++++++++++++++++------------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/packfile.c b/packfile.c
index 9bce52f912..6e66c90c46 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2148,8 +2162,13 @@ int packfile_store_read_object_info(struct packfile_store *store,
* We know that the caller doesn't actually need the
* information below, so return early.
*/
- if (oi == &blank_oi)
+ if (oi == &blank_oi) {
+ oi->whence = OI_PACKED;
+ oi->u.packed.offset = e.offset;
+ oi->u.packed.pack = e.p;
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
return 0;
+ }
rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
if (rtype < 0) {
@@ -2157,23 +2176,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 5/8] packfile: disentangle return value of `packed_object_info()`
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (3 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
` (7 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index 6e66c90c46..c141b8a7b1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2153,7 +2156,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
{
static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2170,8 +2173,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
return 0;
}
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..07f5bfbc4f 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 6/8] packfile: skip unpacking object header for disk size requests
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (4 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
` (6 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index c141b8a7b1..d2ae2432eb 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 7/8] packfile: fix short-circuiting of empty requests
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (5 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
` (5 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information from the packfile store we have logic
that tries to bail out early on empty requests. This is supposed to be a
performance optimization so that we don't even have to unpack the object
header stored in the packfile.
This optimization doesn't work though: we compare the passed-in object
info pointer with the pointer of an on-stack variable, which of course
cannot ever become true. This issue was introduced via d9f517d051
(object-file: split out functions relating to object store subsystem,
2025-04-15): before this commit, we checked whether the passed-in object
info was a `NULL` pointer, and if so, we set it to point to `blank_oi`
instead. The commit then split up these the logic so that we continue to
set up `blank_oi` in `do_oid_object_info_extended()`, but then do the
check in `packfile_store_read_object_info()`. But even before that
commit the logic was only partially working, as it could very well be
that callers pass a blank object info themselves.
Fix this bug by introducing a new `object_info_is_blank_request()`
helper, which simply verifies that none of the contained request
pointers are populated.
Reported-by: Aaron Plattner <aplattner@nvidia.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 10 ++++++++++
packfile.c | 3 +--
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/odb.h b/odb.h
index afae5e5c01..b869c054c1 100644
--- a/odb.h
+++ b/odb.h
@@ -353,6 +353,16 @@ struct object_info {
} u;
};
+/*
+ * Given an object info structure, figure out whether any of its request
+ * pointers are populated.
+ */
+static inline bool object_info_is_blank_request(struct object_info *oi)
+{
+ return !oi->typep && !oi->sizep && !oi->disk_sizep &&
+ !oi->delta_base_oid && !oi->contentp;
+}
+
/*
* Initializer for a "struct object_info" that wants no items. You may
* also memset() the memory to all-zeroes.
diff --git a/packfile.c b/packfile.c
index d2ae2432eb..ce83e77899 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2157,7 +2157,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
struct object_info *oi,
unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
int ret;
@@ -2168,7 +2167,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
* We know that the caller doesn't actually need the
* information below, so return early.
*/
- if (oi == &blank_oi) {
+ if (object_info_is_blank_request(oi)) {
oi->whence = OI_PACKED;
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()`
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (6 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
` (4 subsequent siblings)
12 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index ce83e77899..8daa5a5ee7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2175,7 +2175,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
return 0;
}
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index 07f5bfbc4f..573d06f6ba 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 7:23 ` Junio C Hamano
2025-12-18 9:10 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2025-12-18 7:23 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> When reading object info via a packfile we yield one of two types:
>
> - The object can either be OI_PACKED, which is what a caller would
> typically expect.
>
> - Or it can be OI_DBCACHED if it is stored in the delta base cache.
>
> The latter really is an implementation detail though, and callers
> typically don't care at all about the difference. Furthermore, the
> information whether or not it is part of the delta base cache can
> already be derived via the `is_delta` field, so the fact that we discern
> between OI_PACKED and OI_DBCACHED only further complicates the
> interface.
If this were "and no existing callers check at all", it would be
trivial to decide for this change. In fact you do say that but in a
weaker form just below.
> Drop the OI_DBCACHED enum completely. There don't seem to be any callers
> that care about the distinction.
"git grep OI_DBCACHED" shows only a single hit, which is what you
are getting rid of in this patch, but I cannot claim that we did a
sufficient audit, as this change will break code paths that check if
they got OI_PACKED and do something differently (or if what they got
is different from OI_PACKED, for that matter).
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb.h | 1 -
> packfile.c | 3 +--
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/odb.h b/odb.h
> index 014cd9585a..73b0b87ad5 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -330,7 +330,6 @@ struct object_info {
> OI_CACHED,
> OI_LOOSE,
> OI_PACKED,
> - OI_DBCACHED
> } whence;
> union {
> /*
> diff --git a/packfile.c b/packfile.c
> index c88bd92619..79ad9d7179 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> oidclr(oi->delta_base_oid, p->repo->hash_algo);
> }
>
> - oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
> - OI_PACKED;
> + oi->whence = OI_PACKED;
>
> out:
> unuse_pack(&w_curs);
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 4/8] packfile: always populate pack-specific info when reading object info
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 7:32 ` Junio C Hamano
0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2025-12-18 7:32 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> @@ -2148,8 +2162,13 @@ int packfile_store_read_object_info(struct packfile_store *store,
> * We know that the caller doesn't actually need the
> * information below, so return early.
> */
> - if (oi == &blank_oi)
> + if (oi == &blank_oi) {
> + oi->whence = OI_PACKED;
> + oi->u.packed.offset = e.offset;
> + oi->u.packed.pack = e.p;
> + oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
It cannot be seen as it is before the precontext, but if blank_oi is
still a function scope static that is initialized only once by
assigning OBJECT_INFO_INIT, this will leave a timg bomb waiting to
go off, as it violates the "blank"-ness promise for the next caller
of this function who calls NULL in oi.
I'd prefer we fix this nonsense "we only declared a function scope
static, but without actually using it for anything, other than to
compare its address with the caller supplied parameter" well before
this step.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 0/8] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (7 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 8:09 ` Junio C Hamano
2025-12-18 8:30 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (3 subsequent siblings)
12 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2025-12-18 8:09 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> This series has a conflict with ps/packfile-store-in-odb-source. I
> decided to not make this a dependency though because those two topics
> are independent from one another, and I expect that this series here
> will be merged down faster than the conflicting one. Furthermore, the
> conflict itself is quite minor:
>
> diff --cc packfile.c
> index 8daa5a5ee7,ce6716fbea..0000000000
> --- a/packfile.c
> +++ b/packfile.c
> @@@ -2157,10 -2132,11 +2151,10 @@@ int packfile_store_read_object_info(str
> struct object_info *oi,
> unsigned flags UNUSED)
> {
> - static struct object_info blank_oi = OBJECT_INFO_INIT;
> struct pack_entry e;
> - int rtype;
> + int ret;
>
> - if (!find_pack_entry(store->odb->repo, oid, &e))
> + if (!find_pack_entry(store, oid, &e))
> return 1;
>
> /*
> @@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
> oi.sizep = &size;
>
> if (packfile_store_read_object_info(store, oid, &oi, 0) ||
> - oi.u.packed.is_delta ||
> + oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
> + oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
> - repo_settings_get_big_file_threshold(store->odb->repo) >= size)
> + repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
> return -1;
>
> in_pack_type = unpack_object_header(oi.u.packed.pack,
>
> I'd thus propose to merge this series via an evil merge, but if this
> proves to be burdensome I'm happy to defer it to a later point. Just let
> me know and I'll adapt accordingly, thanks!
Indeed the conflicts above are miniscule that it does not even need
any evil merge. The surviving lines are all from either ours or
theirs, that changes are close enough to be shown in --cc.
But let me first concentrate more on fixing performance regression
that already made down to 'master'. It is a shame that nobody
caught it while it was cooking in 'next'.
Thanks.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 0/8] Improvements for reading object info
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
@ 2025-12-18 8:30 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 8:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Aaron Plattner
On Thu, Dec 18, 2025 at 05:09:21PM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --cc packfile.c
> > index 8daa5a5ee7,ce6716fbea..0000000000
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
> > oi.sizep = &size;
> >
> > if (packfile_store_read_object_info(store, oid, &oi, 0) ||
> > - oi.u.packed.is_delta ||
> > + oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
> > + oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
> > - repo_settings_get_big_file_threshold(store->odb->repo) >= size)
> > + repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
> > return -1;
> >
> > in_pack_type = unpack_object_header(oi.u.packed.pack,
> >
> > I'd thus propose to merge this series via an evil merge, but if this
> > proves to be burdensome I'm happy to defer it to a later point. Just let
> > me know and I'll adapt accordingly, thanks!
>
> Indeed the conflicts above are miniscule that it does not even need
> any evil merge. The surviving lines are all from either ours or
> theirs, that changes are close enough to be shown in --cc.
>
> But let me first concentrate more on fixing performance regression
> that already made down to 'master'. It is a shame that nobody
> caught it while it was cooking in 'next'.
Fair enough, so that means that you'd want to merge your patch down
first, right? If so I'll rebase my series on top of your patch and then
resend it soonish.
In any case, I noticed a slight regression in one of the benchmarks that
prints all objects, but I attributed it to CI flakiness [1]. The uptick
didn't seem strong enough to really be a regression, and I'm still not
sure whether it's related to this patch series or not. Chances are it
is. I'll investigate and make sure to extend the benchmarking suite
accordingly so that we have a clearer signal there.
Thanks!
Patrick
[1]: https://bencher.dev/perf/git?branches=595859eb-071c-48e9-97cf-195e0a3d6ed1&testbeds=02dcb8ad-6873-494c-aabc-9a6237601308&benchmarks=0da3d87a-ce30-4125-86e9-12d84ec4bc49&measures=63dafffb-98c4-4c27-ba43-7112cae627fc
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 7:23 ` Junio C Hamano
@ 2025-12-18 9:10 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 9:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Aaron Plattner
On Thu, Dec 18, 2025 at 04:23:03PM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > When reading object info via a packfile we yield one of two types:
> >
> > - The object can either be OI_PACKED, which is what a caller would
> > typically expect.
> >
> > - Or it can be OI_DBCACHED if it is stored in the delta base cache.
> >
> > The latter really is an implementation detail though, and callers
> > typically don't care at all about the difference. Furthermore, the
> > information whether or not it is part of the delta base cache can
> > already be derived via the `is_delta` field, so the fact that we discern
> > between OI_PACKED and OI_DBCACHED only further complicates the
> > interface.
>
> If this were "and no existing callers check at all", it would be
> trivial to decide for this change. In fact you do say that but in a
> weaker form just below.
>
> > Drop the OI_DBCACHED enum completely. There don't seem to be any callers
> > that care about the distinction.
>
> "git grep OI_DBCACHED" shows only a single hit, which is what you
> are getting rid of in this patch, but I cannot claim that we did a
> sufficient audit, as this change will break code paths that check if
> they got OI_PACKED and do something differently (or if what they got
> is different from OI_PACKED, for that matter).
That's a fair complaint. I'll adapt the commit message to include the
investigation.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 0/7] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (8 preceding siblings ...)
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (6 more replies)
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (2 subsequent siblings)
12 siblings, 7 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
Changes in v2:
- Rebase the series on top of master with jc/object-read-stream-fix
merged into it. I've also evicted the patch that fixes the same
underlying issue.
- Improve the commit message that drops OI_DBCACHED to explain why
this is a safe refactoring.
- Link to v1: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v1-0-81c8368492be@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
object-file.c | 19 ++++++++++++----
odb.h | 8 +++++--
pack-bitmap.c | 3 +--
packfile.c | 61 ++++++++++++++++++++++++++++++--------------------
packfile.h | 7 ++++--
8 files changed, 68 insertions(+), 39 deletions(-)
Range-diff versus v1:
1: 0c1a4a4745 < -: ---------- object-file: always set OI_LOOSE when reading object info
-: ---------- > 1: 2287c0cbd9 object-file: always set OI_LOOSE when reading object info
2: 98962428cf ! 2: a1cd99af9c packfile: always declare object info to be OI_PACKED
@@ Commit message
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
- Drop the OI_DBCACHED enum completely. There don't seem to be any callers
- that care about the distinction.
+ There aren't all that many callers that care about the `whence` field in
+ the first place. In fact, there's only three:
+
+ - `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
+ and then populates the packfile information of the object info
+ structure. We now start to do this also for deltified objects, which
+ gives its callers strictly more information.
+
+ - `repack_local_links()` wants to determine whether the object is part
+ of a promisor pack and checks for `whence == OI_PACKED`. If so, it
+ verifies that the packfile is a promisor pack. It's arguably wrong
+ to declare that an object is not part of a promisor pack only
+ because it is stored in the delta base cache.
+
+ - `is_not_in_promisor_pack_obj()` does the same, but checks that a
+ specific object is _not_ part of a promisor pack. The same reasoning
+ as above applies.
+
+ Drop the OI_DBCACHED enum completely. None of the callers seem to care
+ about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
3: 0a5b806934 = 3: 7a043c09ee packfile: extend `is_delta` field to allow for "unknown" state
4: 6a05c85683 ! 4: 448511cb19 packfile: always populate pack-specific info when reading object info
@@ Metadata
## Commit message ##
packfile: always populate pack-specific info when reading object info
- When reading object information from a packfile we are not always
- populating the pack-specific information. This happens in two cases:
+ When reading object information via `packed_object_info()` we may not
+ populate the object info's packfile-specific fields. This leads to
+ inconsistent object info depending on whether the info was populated via
+ `packfile_store_read_object_info()` or `packed_object_info()`.
- - When calling `packed_object_info()` directly instead of
- `packfile_store_read_object_info()`.
-
- - When we've got the empty request.
-
- Fix both of these issues so that we can always assume the pack info to
- be populated when reading object info from a pack.
-
- Note that we don't really care about the second case right now, as the
- condition will always evaluate to false anyway. This will be fixed in
- the next commit.
+ Fix this inconsistecny so that we can always assume the pack info to be
+ populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
out:
unuse_pack(&w_curs);
-@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
- * We know that the caller doesn't actually need the
- * information below, so return early.
- */
-- if (oi == &blank_oi)
-+ if (oi == &blank_oi) {
-+ oi->whence = OI_PACKED;
-+ oi->u.packed.offset = e.offset;
-+ oi->u.packed.pack = e.p;
-+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
- return 0;
-+ }
-
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
5: b09f37400c ! 5: e1ee6c7841 packfile: disentangle return value of `packed_object_info()`
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
static void *unpack_compressed_entry(struct packed_git *p,
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ if (!oi)
return 0;
- }
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
6: 253c0d47ab = 6: 77589e84b5 packfile: skip unpacking object header for disk size requests
7: 4dac51d4be < -: ---------- packfile: fix short-circuiting of empty requests
8: 2cf441de0d ! 7: 08f4b865e5 packfile: drop repository parameter from `packed_object_info()`
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ if (!oi)
return 0;
- }
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
---
base-commit: 7df68b50e49b6a1b576abb19b2e5d457749bc28b
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2 1/7] object-file: always set OI_LOOSE when reading object info
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (5 subsequent siblings)
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
There are some early returns in ``odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
Nobody seems to care about this right now, but it's a bug waiting to
happen. Fix this by always setting `whence` on success.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/object-file.c b/object-file.c
index 6280e42f34..d566df427a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -439,12 +439,23 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
struct stat st;
- if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
+
+ if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
+ status = quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (!status && oi)
+ oi->whence = OI_LOOSE;
+ return status;
+ }
+
if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
- if (oi && oi->disk_sizep)
- *oi->disk_sizep = st.st_size;
+
+ if (oi) {
+ if (oi->disk_sizep)
+ *oi->disk_sizep = st.st_size;
+ oi->whence = OI_LOOSE;
+ }
+
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2026-01-05 14:29 ` Toon Claes
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (4 subsequent siblings)
6 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index 08a0863fc3..b0c6665c87 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2026-01-05 15:35 ` Toon Claes
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (3 subsequent siblings)
6 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index b0c6665c87..cc797b2b6a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (2 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-30 17:03 ` Kristoffer Haugsbakk
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (2 subsequent siblings)
6 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistecny so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/packfile.c b/packfile.c
index cc797b2b6a..f7c33a2f77 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()`
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (3 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index f7c33a2f77..8c6ef45a67 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
unsigned flags UNUSED)
{
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..07f5bfbc4f 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (4 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8c6ef45a67..a2ba237ce7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()`
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (5 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index a2ba237ce7..39899aec49 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index 07f5bfbc4f..573d06f6ba 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-30 17:03 ` Kristoffer Haugsbakk
2026-01-05 11:38 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Kristoffer Haugsbakk @ 2025-12-30 17:03 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Aaron Plattner
On Thu, Dec 18, 2025, at 11:54, Patrick Steinhardt wrote:
> When reading object information via `packed_object_info()` we may not
> populate the object info's packfile-specific fields. This leads to
> inconsistent object info depending on whether the info was populated via
> `packfile_store_read_object_info()` or `packed_object_info()`.
>
> Fix this inconsistecny so that we can always assume the pack info to be
s/inconsistecny/inconsistency/
> populated when reading object info from a pack.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>[snip]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info
2025-12-30 17:03 ` Kristoffer Haugsbakk
@ 2026-01-05 11:38 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-05 11:38 UTC (permalink / raw)
To: Kristoffer Haugsbakk; +Cc: git, Junio C Hamano, Aaron Plattner
On Tue, Dec 30, 2025 at 06:03:24PM +0100, Kristoffer Haugsbakk wrote:
> On Thu, Dec 18, 2025, at 11:54, Patrick Steinhardt wrote:
> > When reading object information via `packed_object_info()` we may not
> > populate the object info's packfile-specific fields. This leads to
> > inconsistent object info depending on whether the info was populated via
> > `packfile_store_read_object_info()` or `packed_object_info()`.
> >
> > Fix this inconsistecny so that we can always assume the pack info to be
>
> s/inconsistecny/inconsistency/
Thanks, I've queued this change locally now. I'll hold off sending a new
iteration for now though.
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2026-01-05 14:29 ` Toon Claes
0 siblings, 0 replies; 58+ messages in thread
From: Toon Claes @ 2026-01-05 14:29 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> When reading object info via a packfile we yield one of two types:
>
> - The object can either be OI_PACKED, which is what a caller would
> typically expect.
>
> - Or it can be OI_DBCACHED if it is stored in the delta base cache.
>
> The latter really is an implementation detail though, and callers
> typically don't care at all about the difference. Furthermore, the
> information whether or not it is part of the delta base cache can
> already be derived via the `is_delta` field, so the fact that we discern
> between OI_PACKED and OI_DBCACHED only further complicates the
> interface.
>
> There aren't all that many callers that care about the `whence` field in
> the first place. In fact, there's only three:
>
> - `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
> and then populates the packfile information of the object info
> structure. We now start to do this also for deltified objects, which
> gives its callers strictly more information.
>
> - `repack_local_links()` wants to determine whether the object is part
> of a promisor pack and checks for `whence == OI_PACKED`. If so, it
> verifies that the packfile is a promisor pack. It's arguably wrong
> to declare that an object is not part of a promisor pack only
> because it is stored in the delta base cache.
>
> - `is_not_in_promisor_pack_obj()` does the same, but checks that a
> specific object is _not_ part of a promisor pack. The same reasoning
> as above applies.
>
> Drop the OI_DBCACHED enum completely. None of the callers seem to care
> about the distinction.
Thanks for clarifying these. I agree it makes sense to drop it.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2026-01-05 15:35 ` Toon Claes
2026-01-06 6:34 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Toon Claes @ 2026-01-05 15:35 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> The `struct object_info::u::packed::is_delta` field determines whether
> or not a specific object is stored as a delta. It only stores whether or
> not the object is stored as delta, so it is treated as a boolean value.
>
> This boolean is insufficient though: when reading a packed object via
> `packfile_store_read_object_info()` we know to skip parsing the actual
> object when the user didn't request any object-specific data. In that
> case we won't read the object itself, but will only look up its position
> in the packfile. Consequently, we do not know whether it is a delta or
> not.
This explains why you're introducing "unknown", but I'm having trouble
understanding why we need distinction between ofs-delta and ref-delta?
(To any other reader: If you want to know what those two are, check
"Deltified representation" in Documentation/gitformat-pack.adoc)
> This isn't really an issue right now, as the check for an empty request
> is broken. But a subsequent commit will fix it, and once we do we will
> have the need to also represent an "unknown" delta state.
>
> Prepare for this change by introducing a new enum that encodes the
> object type. We don't use the "unknown" state just yet, but will start
> to do so in the next commit.
A little bit confusing this "next commit" is [PATCH 6/7], but that's a
note to any other reader and not so much a nitpick worth addressing.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2026-01-05 15:35 ` Toon Claes
@ 2026-01-06 6:34 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:34 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Junio C Hamano, Aaron Plattner
On Mon, Jan 05, 2026 at 04:35:57PM +0100, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > The `struct object_info::u::packed::is_delta` field determines whether
> > or not a specific object is stored as a delta. It only stores whether or
> > not the object is stored as delta, so it is treated as a boolean value.
> >
> > This boolean is insufficient though: when reading a packed object via
> > `packfile_store_read_object_info()` we know to skip parsing the actual
> > object when the user didn't request any object-specific data. In that
> > case we won't read the object itself, but will only look up its position
> > in the packfile. Consequently, we do not know whether it is a delta or
> > not.
>
> This explains why you're introducing "unknown", but I'm having trouble
> understanding why we need distinction between ofs-delta and ref-delta?
>
> (To any other reader: If you want to know what those two are, check
> "Deltified representation" in Documentation/gitformat-pack.adoc)
Good question, and indeed we don't need that information at any
callsite right now. We do discern the delta type in various locations,
but mostly do so internally at "packfile.c" so that we know how to read
the given object. And there we rely on the `OBJ_REF_DELTA` and
`OBJ_OFS_DELTA` types.
We could of course adapt this to only use `PACKED_OBJECT_TYPE_DELTA`.
But we already have the information readily available at our figertips,
and over time I'd ideally rather want to get rid of the `OBJ_*_DELTA`
values as they leak internal implementation details of the packfile
store into the generic object interfaces. That's way down the road, but
by keeping around the information now it makes such a later conversion
easier.
> > This isn't really an issue right now, as the check for an empty request
> > is broken. But a subsequent commit will fix it, and once we do we will
> > have the need to also represent an "unknown" delta state.
> >
> > Prepare for this change by introducing a new enum that encodes the
> > object type. We don't use the "unknown" state just yet, but will start
> > to do so in the next commit.
>
> A little bit confusing this "next commit" is [PATCH 6/7], but that's a
> note to any other reader and not so much a nitpick worth addressing.
Fair. I've fixed this up locally to say "subsequent commit".
Thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 0/7] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (9 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
@ 2026-01-06 6:54 ` Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (7 more replies)
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
12 siblings, 8 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
Changes in v3:
- Fix a commit message typo.
- Fix a function comment missing some words.
- Link to v2: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v2-0-62e3e49072bc@pks.im
Changes in v2:
- Rebase the series on top of master with jc/object-read-stream-fix
merged into it. I've also evicted the patch that fixes the same
underlying issue.
- Improve the commit message that drops OI_DBCACHED to explain why
this is a safe refactoring.
- Link to v1: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v1-0-81c8368492be@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
object-file.c | 19 ++++++++++++----
odb.h | 8 +++++--
pack-bitmap.c | 3 +--
packfile.c | 61 ++++++++++++++++++++++++++++++--------------------
packfile.h | 7 ++++--
8 files changed, 68 insertions(+), 39 deletions(-)
Range-diff versus v2:
1: 8b6b891c2f = 1: 9efc7d00c1 object-file: always set OI_LOOSE when reading object info
2: b83dd3d689 = 2: efd29f0e27 packfile: always declare object info to be OI_PACKED
3: 6815b23dd7 ! 3: 1448cd37b3 packfile: extend `is_delta` field to allow for "unknown" state
@@ Commit message
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
- to do so in the next commit.
+ to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
4: ec18d71d07 ! 4: 8eb063df04 packfile: always populate pack-specific info when reading object info
@@ Commit message
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
- Fix this inconsistecny so that we can always assume the pack info to be
+ Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
5: cc0694bb0f ! 5: 33582ef5e0 packfile: disentangle return value of `packed_object_info()`
@@ packfile.h: void release_pack_memory(size_t);
+/*
+ * Look up the object info for a specific offset in the packfile.
-+ * success, a negative error code otherwise.
++ * Returns zero on success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
6: 98eee570b8 = 6: 42ec9b8170 packfile: skip unpacking object header for disk size requests
7: 9c9e71d7b2 ! 7: 03bc55e74f packfile: drop repository parameter from `packed_object_info()`
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
## packfile.h ##
@@ packfile.h: extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
- * success, a negative error code otherwise.
+ * Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
---
base-commit: 7df68b50e49b6a1b576abb19b2e5d457749bc28b
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
@ 2026-01-06 6:54 ` Patrick Steinhardt
2026-01-07 8:50 ` Karthik Nayak
2026-01-06 6:54 ` [PATCH v3 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (6 subsequent siblings)
7 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
There are some early returns in ``odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
Nobody seems to care about this right now, but it's a bug waiting to
happen. Fix this by always setting `whence` on success.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/object-file.c b/object-file.c
index 6280e42f34..d566df427a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -439,12 +439,23 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
struct stat st;
- if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
+
+ if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
+ status = quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (!status && oi)
+ oi->whence = OI_LOOSE;
+ return status;
+ }
+
if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
- if (oi && oi->disk_sizep)
- *oi->disk_sizep = st.st_size;
+
+ if (oi) {
+ if (oi->disk_sizep)
+ *oi->disk_sizep = st.st_size;
+ oi->whence = OI_LOOSE;
+ }
+
return 0;
}
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 2/7] packfile: always declare object info to be OI_PACKED
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2026-01-06 6:54 ` Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (5 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index 08a0863fc3..b0c6665c87 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2026-01-06 6:54 ` Patrick Steinhardt
2026-01-07 10:12 ` Karthik Nayak
2026-01-06 6:55 ` [PATCH v3 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (4 subsequent siblings)
7 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index b0c6665c87..cc797b2b6a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 4/7] packfile: always populate pack-specific info when reading object info
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (2 preceding siblings ...)
2026-01-06 6:54 ` [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2026-01-06 6:55 ` Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (3 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:55 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/packfile.c b/packfile.c
index cc797b2b6a..f7c33a2f77 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 5/7] packfile: disentangle return value of `packed_object_info()`
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (3 preceding siblings ...)
2026-01-06 6:55 ` [PATCH v3 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2026-01-06 6:55 ` Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
` (2 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:55 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index f7c33a2f77..8c6ef45a67 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
unsigned flags UNUSED)
{
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..d7cce582af 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * Returns zero on success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 6/7] packfile: skip unpacking object header for disk size requests
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (4 preceding siblings ...)
2026-01-06 6:55 ` [PATCH v3 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2026-01-06 6:55 ` Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2026-01-07 10:17 ` [PATCH v3 0/7] Improvements for reading object info Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:55 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8c6ef45a67..a2ba237ce7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v3 7/7] packfile: drop repository parameter from `packed_object_info()`
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (5 preceding siblings ...)
2026-01-06 6:55 ` [PATCH v3 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2026-01-06 6:55 ` Patrick Steinhardt
2026-01-07 10:17 ` [PATCH v3 0/7] Improvements for reading object info Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-06 6:55 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index a2ba237ce7..39899aec49 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index d7cce582af..33fed26362 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.508.g883dcfc63e.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v3 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-06 6:54 ` [PATCH v3 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2026-01-07 8:50 ` Karthik Nayak
2026-01-07 11:27 ` Patrick Steinhardt
0 siblings, 1 reply; 58+ messages in thread
From: Karthik Nayak @ 2026-01-07 8:50 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
[-- Attachment #1: Type: text/plain, Size: 1730 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> There are some early returns in ``odb_source_loose_read_object_info()`
Nit: s/``/`
> in cases where we don't have to open the loose object. These return
> paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
> it becomes impossible for the caller to tell the format of such an
> object.
>
> Nobody seems to care about this right now, but it's a bug waiting to
> happen. Fix this by always setting `whence` on success.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> object-file.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 6280e42f34..d566df427a 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -439,12 +439,23 @@ int odb_source_loose_read_object_info(struct odb_source *source,
> */
> if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
> struct stat st;
> - if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
> - return quick_has_loose(source->loose, oid) ? 0 : -1;
> +
> + if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
> + status = quick_has_loose(source->loose, oid) ? 0 : -1;
> + if (!status && oi)
> + oi->whence = OI_LOOSE;
> + return status;
> + }
> +
> if (stat_loose_object(source->loose, oid, &st, &path) < 0)
> return -1;
> - if (oi && oi->disk_sizep)
> - *oi->disk_sizep = st.st_size;
> +
> + if (oi) {
> + if (oi->disk_sizep)
> + *oi->disk_sizep = st.st_size;
> + oi->whence = OI_LOOSE;
> + }
> +
> return 0;
> }
>
The change looks good. I'm wary of early returns independently doing the
cleanup, wonder if it'd be better to do `status = ...; goto cleanup`
instead.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2026-01-06 6:54 ` [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2026-01-07 10:12 ` Karthik Nayak
0 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2026-01-07 10:12 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
[-- Attachment #1: Type: text/plain, Size: 2562 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> The `struct object_info::u::packed::is_delta` field determines whether
> or not a specific object is stored as a delta. It only stores whether or
> not the object is stored as delta, so it is treated as a boolean value.
>
> This boolean is insufficient though: when reading a packed object via
> `packfile_store_read_object_info()` we know to skip parsing the actual
> object when the user didn't request any object-specific data. In that
> case we won't read the object itself, but will only look up its position
> in the packfile. Consequently, we do not know whether it is a delta or
> not.
>
> This isn't really an issue right now, as the check for an empty request
> is broken. But a subsequent commit will fix it, and once we do we will
> have the need to also represent an "unknown" delta state.
>
> Prepare for this change by introducing a new enum that encodes the
> object type. We don't use the "unknown" state just yet, but will start
> to do so in a subsequent commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb.h | 7 ++++++-
> packfile.c | 17 ++++++++++++++---
> 2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/odb.h b/odb.h
> index 73b0b87ad5..afae5e5c01 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -343,7 +343,12 @@ struct object_info {
> struct {
> struct packed_git *pack;
> off_t offset;
> - unsigned int is_delta;
> + enum packed_object_type {
> + PACKED_OBJECT_TYPE_UNKNOWN,
> + PACKED_OBJECT_TYPE_FULL,
> + PACKED_OBJECT_TYPE_OFS_DELTA,
> + PACKED_OBJECT_TYPE_REF_DELTA,
> + } type;
> } packed;
> } u;
> };
> diff --git a/packfile.c b/packfile.c
> index b0c6665c87..cc797b2b6a 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
> if (oi->whence == OI_PACKED) {
> oi->u.packed.offset = e.offset;
> oi->u.packed.pack = e.p;
> - oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
> - rtype == OBJ_OFS_DELTA);
> +
> + switch (rtype) {
> + case OBJ_REF_DELTA:
> + oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
> + break;
> + case OBJ_OFS_DELTA:
> + oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
> + break;
> + default:
> + oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
> + break;
> + }
> }
>
So we get `rtype` from `packed_object_info()` which can return OBJ_BAD,
but return early in such a scenario. So overall this makes sense. I like
that we are now storing more and clearer information.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 0/7] Improvements for reading object info
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
` (6 preceding siblings ...)
2026-01-06 6:55 ` [PATCH v3 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
@ 2026-01-07 10:17 ` Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2026-01-07 10:17 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
[-- Attachment #1: Type: text/plain, Size: 410 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this patch series contains various small improvements for reading object
> info for either loose or packed objects. These improvements were split
> out of a larger patch series where I'm about to introduce a new generic
> `odb_for_each_object()` function.
>
I'm dropping into v3 of the series for review, as such the series looks
good to me.
Thanks!
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-07 8:50 ` Karthik Nayak
@ 2026-01-07 11:27 ` Patrick Steinhardt
0 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 11:27 UTC (permalink / raw)
To: Karthik Nayak
Cc: git, Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk,
Toon Claes
On Wed, Jan 07, 2026 at 12:50:45AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/object-file.c b/object-file.c
> > index 6280e42f34..d566df427a 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -439,12 +439,23 @@ int odb_source_loose_read_object_info(struct odb_source *source,
> > */
> > if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
> > struct stat st;
> > - if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
> > - return quick_has_loose(source->loose, oid) ? 0 : -1;
> > +
> > + if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
> > + status = quick_has_loose(source->loose, oid) ? 0 : -1;
> > + if (!status && oi)
> > + oi->whence = OI_LOOSE;
> > + return status;
> > + }
> > +
> > if (stat_loose_object(source->loose, oid, &st, &path) < 0)
> > return -1;
> > - if (oi && oi->disk_sizep)
> > - *oi->disk_sizep = st.st_size;
> > +
> > + if (oi) {
> > + if (oi->disk_sizep)
> > + *oi->disk_sizep = st.st_size;
> > + oi->whence = OI_LOOSE;
> > + }
> > +
> > return 0;
> > }
> >
>
> The change looks good. I'm wary of early returns independently doing the
> cleanup, wonder if it'd be better to do `status = ...; goto cleanup`
> instead.
I share that sentiment, and I was in fact having a look at what it would
take to have a single exit path in this function. I eventually discarded
the work though because it required a bunch of changes to really make
this whole function more readable than it currently is.
But now that you're the second one thinking this I'll probably bite the
bullet and just do it.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v4 0/7] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (10 preceding siblings ...)
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
@ 2026-01-07 13:07 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (7 more replies)
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
12 siblings, 8 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:07 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
Changes in v4:
- Extend the fix for OI_LOOSE and refactor the whole function to have
a single exit path as proposed by Karthik. This results in a lot
more changes, but makes the function way easier to reason about
going forward.
- Link to v3: https://lore.kernel.org/r/20260106-b4-pks-odb-read-object-info-improvements-v3-0-b5e02fae1fb0@pks.im
Changes in v3:
- Fix a commit message typo.
- Fix a function comment missing some words.
- Link to v2: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v2-0-62e3e49072bc@pks.im
Changes in v2:
- Rebase the series on top of master with jc/object-read-stream-fix
merged into it. I've also evicted the patch that fixes the same
underlying issue.
- Improve the commit message that drops OI_DBCACHED to explain why
this is a safe refactoring.
- Link to v1: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v1-0-81c8368492be@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +-
builtin/pack-objects.c | 4 +-
commit-graph.c | 2 +-
object-file.c | 115 ++++++++++++++++++++++++++++++-------------------
odb.h | 8 +++-
pack-bitmap.c | 3 +-
packfile.c | 61 +++++++++++++++-----------
packfile.h | 7 ++-
8 files changed, 124 insertions(+), 79 deletions(-)
Range-diff versus v3:
1: 5c67d9abe8 < -: ---------- object-file: always set OI_LOOSE when reading object info
-: ---------- > 1: 7708b50c2a object-file: always set OI_LOOSE when reading object info
2: 8b106feb28 = 2: a96ac5b351 packfile: always declare object info to be OI_PACKED
3: adbd3e5ae5 = 3: 8e3193a06e packfile: extend `is_delta` field to allow for "unknown" state
4: 218c64c9a5 = 4: e718161286 packfile: always populate pack-specific info when reading object info
5: dcae7be795 = 5: 217bec7e3b packfile: disentangle return value of `packed_object_info()`
6: beac514592 = 6: 2aaacfd639 packfile: skip unpacking object header for disk size requests
7: dacccf1cb4 = 7: a9e37b7e00 packfile: drop repository parameter from `packed_object_info()`
---
base-commit: 7df68b50e49b6a1b576abb19b2e5d457749bc28b
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v4 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-08 9:30 ` Karthik Nayak
2026-01-07 13:08 ` [PATCH v4 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (6 subsequent siblings)
7 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.
Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.
Note that we also:
- Rename `status` to `ret` to match our usual coding style, but also
to show that the old `status` variable is now always getting the
expected value. Furthermore, the value is not initialized anymore,
which has the consequence that most compilers will warn for exit
paths where we forgot to set it.
- Move the setup of scratch pointers closer to `parse_loose_header()`
to show where it's needed.
- Guard a couple of variables on cleanup so that they only get
released in case they have been set up.
- Reset `oi->delta_base_oid` towards the end of the function, together
with all the other object info pointers.
Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 115 ++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 71 insertions(+), 44 deletions(-)
diff --git a/object-file.c b/object-file.c
index 6280e42f34..e7e4c3348f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -416,19 +416,16 @@ int odb_source_loose_read_object_info(struct odb_source *source,
const struct object_id *oid,
struct object_info *oi, int flags)
{
- int status = 0;
+ int ret;
int fd;
unsigned long mapsize;
const char *path;
- void *map;
- git_zstream stream;
+ void *map = NULL;
+ git_zstream stream, *stream_to_end = NULL;
char hdr[MAX_HEADER_LEN];
unsigned long size_scratch;
enum object_type type_scratch;
- if (oi && oi->delta_base_oid)
- oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
-
/*
* If we don't care about type or size, then we don't
* need to look inside the object at all. Note that we
@@ -439,71 +436,101 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
struct stat st;
- if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
- if (stat_loose_object(source->loose, oid, &st, &path) < 0)
- return -1;
+
+ if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
+ ret = quick_has_loose(source->loose, oid) ? 0 : -1;
+ goto out;
+ }
+
+ if (stat_loose_object(source->loose, oid, &st, &path) < 0) {
+ ret = -1;
+ goto out;
+ }
+
if (oi && oi->disk_sizep)
*oi->disk_sizep = st.st_size;
- return 0;
+
+ ret = 0;
+ goto out;
}
fd = open_loose_object(source->loose, oid, &path);
if (fd < 0) {
if (errno != ENOENT)
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
- return -1;
+ ret = -1;
+ goto out;
}
- map = map_fd(fd, path, &mapsize);
- if (!map)
- return -1;
- if (!oi->sizep)
- oi->sizep = &size_scratch;
- if (!oi->typep)
- oi->typep = &type_scratch;
+ map = map_fd(fd, path, &mapsize);
+ if (!map) {
+ ret = -1;
+ goto out;
+ }
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
+ stream_to_end = &stream;
+
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) {
case ULHR_OK:
- if (parse_loose_header(hdr, oi) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
- else if (*oi->typep < 0)
+ if (!oi->sizep)
+ oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
+
+ if (parse_loose_header(hdr, oi) < 0) {
+ ret = error(_("unable to parse %s header"), oid_to_hex(oid));
+ goto corrupt;
+ }
+
+ if (*oi->typep < 0)
die(_("invalid object type"));
- if (!oi->contentp)
- break;
- *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
- if (*oi->contentp)
- goto cleanup;
+ if (oi->contentp) {
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (!*oi->contentp) {
+ ret = -1;
+ goto corrupt;
+ }
+ }
- status = -1;
break;
case ULHR_BAD:
- status = error(_("unable to unpack %s header"),
- oid_to_hex(oid));
- break;
+ ret = error(_("unable to unpack %s header"),
+ oid_to_hex(oid));
+ goto corrupt;
case ULHR_TOO_LONG:
- status = error(_("header for %s too long, exceeds %d bytes"),
- oid_to_hex(oid), MAX_HEADER_LEN);
- break;
+ ret = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ goto corrupt;
}
- if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
+ ret = 0;
+
+corrupt:
+ if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(oid), path);
-cleanup:
- git_inflate_end(&stream);
- munmap(map, mapsize);
- if (oi->sizep == &size_scratch)
- oi->sizep = NULL;
- if (oi->typep == &type_scratch)
- oi->typep = NULL;
- oi->whence = OI_LOOSE;
- return status;
+out:
+ if (stream_to_end)
+ git_inflate_end(stream_to_end);
+ if (map)
+ munmap(map, mapsize);
+ if (oi) {
+ if (oi->sizep == &size_scratch)
+ oi->sizep = NULL;
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
+ if (oi->delta_base_oid)
+ oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
+ if (!ret)
+ oi->whence = OI_LOOSE;
+ }
+
+ return ret;
}
static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 2/7] packfile: always declare object info to be OI_PACKED
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (5 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index 08a0863fc3..b0c6665c87 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (4 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index b0c6665c87..cc797b2b6a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 4/7] packfile: always populate pack-specific info when reading object info
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
` (2 preceding siblings ...)
2026-01-07 13:08 ` [PATCH v4 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (3 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/packfile.c b/packfile.c
index cc797b2b6a..f7c33a2f77 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 5/7] packfile: disentangle return value of `packed_object_info()`
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
` (3 preceding siblings ...)
2026-01-07 13:08 ` [PATCH v4 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
` (2 subsequent siblings)
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index f7c33a2f77..8c6ef45a67 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
unsigned flags UNUSED)
{
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..d7cce582af 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * Returns zero on success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 6/7] packfile: skip unpacking object header for disk size requests
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
` (4 preceding siblings ...)
2026-01-07 13:08 ` [PATCH v4 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2026-01-08 9:30 ` [PATCH v4 0/7] Improvements for reading object info Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8c6ef45a67..a2ba237ce7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v4 7/7] packfile: drop repository parameter from `packed_object_info()`
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
` (5 preceding siblings ...)
2026-01-07 13:08 ` [PATCH v4 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2026-01-07 13:08 ` Patrick Steinhardt
2026-01-08 9:30 ` [PATCH v4 0/7] Improvements for reading object info Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-07 13:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index a2ba237ce7..39899aec49 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index d7cce582af..33fed26362 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.542.g9473a8513b.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v4 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-07 13:08 ` [PATCH v4 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2026-01-08 9:30 ` Karthik Nayak
0 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2026-01-08 9:30 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
[-- Attachment #1: Type: text/plain, Size: 4188 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> There are some early returns in `odb_source_loose_read_object_info()`
> in cases where we don't have to open the loose object. These return
> paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
> it becomes impossible for the caller to tell the format of such an
> object.
>
> The root cause of this really is that we have so many different return
> paths in the function. As a consequence, it's harder than necessary to
> make sure that all successful exit paths sot up the `whence` field as
> expected.
>
> Address this by refactoring the function to have a single exit path.
> Like this, we can trivially set up the `whence` field when we exit
> successfully from the function.
>
> Note that we also:
>
> - Rename `status` to `ret` to match our usual coding style, but also
> to show that the old `status` variable is now always getting the
> expected value. Furthermore, the value is not initialized anymore,
> which has the consequence that most compilers will warn for exit
> paths where we forgot to set it.
>
> - Move the setup of scratch pointers closer to `parse_loose_header()`
> to show where it's needed.
>
> - Guard a couple of variables on cleanup so that they only get
> released in case they have been set up.
>
> - Reset `oi->delta_base_oid` towards the end of the function, together
> with all the other object info pointers.
>
> Overall, all these changes result in a diff that is somewhat hard to
> read. But the end result is significantly easier to read and reason
> about, so I'd argue this one-time churn is worth it.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> object-file.c | 115 ++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 71 insertions(+), 44 deletions(-)
>
> diff --git a/object-file.c b/object-file.c
> index 6280e42f34..e7e4c3348f 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -416,19 +416,16 @@ int odb_source_loose_read_object_info(struct odb_source *source,
> const struct object_id *oid,
> struct object_info *oi, int flags)
> {
> - int status = 0;
> + int ret;
> int fd;
> unsigned long mapsize;
> const char *path;
> - void *map;
> - git_zstream stream;
> + void *map = NULL;
> + git_zstream stream, *stream_to_end = NULL;
> char hdr[MAX_HEADER_LEN];
> unsigned long size_scratch;
> enum object_type type_scratch;
>
> - if (oi && oi->delta_base_oid)
> - oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
> -
> /*
> * If we don't care about type or size, then we don't
> * need to look inside the object at all. Note that we
> @@ -439,71 +436,101 @@ int odb_source_loose_read_object_info(struct odb_source *source,
> */
> if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
> struct stat st;
> - if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
> - return quick_has_loose(source->loose, oid) ? 0 : -1;
> - if (stat_loose_object(source->loose, oid, &st, &path) < 0)
> - return -1;
> +
> + if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
> + ret = quick_has_loose(source->loose, oid) ? 0 : -1;
> + goto out;
> + }
> +
> + if (stat_loose_object(source->loose, oid, &st, &path) < 0) {
> + ret = -1;
> + goto out;
> + }
> +
> if (oi && oi->disk_sizep)
> *oi->disk_sizep = st.st_size;
> - return 0;
> +
> + ret = 0;
> + goto out;
> }
>
> fd = open_loose_object(source->loose, oid, &path);
> if (fd < 0) {
> if (errno != ENOENT)
> error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
> - return -1;
> + ret = -1;
> + goto out;
> }
> - map = map_fd(fd, path, &mapsize);
> - if (!map)
> - return -1;
>
> - if (!oi->sizep)
> - oi->sizep = &size_scratch;
> - if (!oi->typep)
> - oi->typep = &type_scratch;
> + map = map_fd(fd, path, &mapsize);
> + if (!map) {
> + ret = -1;
> + goto out;
> + }
>
> if (oi->disk_sizep)
> *oi->disk_sizep = mapsize;
>
> + stream_to_end = &stream;
> +
>
Okay we use `stream_to_end` to simply identify if the stream needs to be
cleared.
The changes look good and indeed the final outcome is better here.
Thanks.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v4 0/7] Improvements for reading object info
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
` (6 preceding siblings ...)
2026-01-07 13:08 ` [PATCH v4 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
@ 2026-01-08 9:30 ` Karthik Nayak
7 siblings, 0 replies; 58+ messages in thread
From: Karthik Nayak @ 2026-01-08 9:30 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes
[-- Attachment #1: Type: text/plain, Size: 787 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this patch series contains various small improvements for reading object
> info for either loose or packed objects. These improvements were split
> out of a larger patch series where I'm about to introduce a new generic
> `odb_for_each_object()` function.
>
> Changes in v4:
> - Extend the fix for OI_LOOSE and refactor the whole function to have
> a single exit path as proposed by Karthik. This results in a lot
> more changes, but makes the function way easier to reason about
> going forward.
> - Link to v3: https://lore.kernel.org/r/20260106-b4-pks-odb-read-object-info-improvements-v3-0-b5e02fae1fb0@pks.im
>
I had a look at the first commit which was changed in this version and
it looks much nicer now. Thanks!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v5 0/7] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (11 preceding siblings ...)
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (6 more replies)
12 siblings, 7 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak, Matt Smiley
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
Changes in v5:
- I discovered that this patch series incidentally fixes a segfault
when using git-archive(1) to read deltified blobs that are larger
than "core.bigFileThreshold". So the only change is an added test
case that will detect this regression going forward.
- Link to v4: https://lore.kernel.org/r/20260107-b4-pks-odb-read-object-info-improvements-v4-0-b5d55c47082a@pks.im
Changes in v4:
- Extend the fix for OI_LOOSE and refactor the whole function to have
a single exit path as proposed by Karthik. This results in a lot
more changes, but makes the function way easier to reason about
going forward.
- Link to v3: https://lore.kernel.org/r/20260106-b4-pks-odb-read-object-info-improvements-v3-0-b5e02fae1fb0@pks.im
Changes in v3:
- Fix a commit message typo.
- Fix a function comment missing some words.
- Link to v2: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v2-0-62e3e49072bc@pks.im
Changes in v2:
- Rebase the series on top of master with jc/object-read-stream-fix
merged into it. I've also evicted the patch that fixes the same
underlying issue.
- Improve the commit message that drops OI_DBCACHED to explain why
this is a safe refactoring.
- Link to v1: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v1-0-81c8368492be@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +-
builtin/pack-objects.c | 4 +-
commit-graph.c | 2 +-
object-file.c | 115 ++++++++++++++++++++++++++++++-------------------
odb.h | 8 +++-
pack-bitmap.c | 3 +-
packfile.c | 61 +++++++++++++++-----------
packfile.h | 7 ++-
t/t5003-archive-zip.sh | 34 +++++++++++++++
9 files changed, 158 insertions(+), 79 deletions(-)
Range-diff versus v4:
1: 07f529a631 = 1: da9d514001 object-file: always set OI_LOOSE when reading object info
2: b547df2885 ! 2: c7b29f3789 packfile: always declare object info to be OI_PACKED
@@ Commit message
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
+ Note that this also fixes a segfault introduced in 8c1b84bc97
+ (streaming: move logic to read packed objects streams into backend,
+ 2025-11-23), which refactors how we stream packed objects. The intent is
+ to only read packed objects in case they are stored non-deltified as
+ we'd otherwise have to deflate them first. But the check for whether or
+ not the object is stored as a delta was unconditionally done via
+ `oi.u.packed.is_delta`, which is only valid in case `oi.whence` is
+ `OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here,
+ which means that none of the `oi.u.packed` fields were initialized at
+ all. Consequently, we assumed the object was not stored as a delta, and
+ then try to read the object from `oi.u.packed.pack`, which is a `NULL`
+ pointer and thus causes a segfault.
+
+ Add a test case for this issue so that this cannot regress in the
+ future anymore.
+
+ Reported-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
## odb.h ##
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
out:
unuse_pack(&w_curs);
+
+ ## t/t5003-archive-zip.sh ##
+@@ t/t5003-archive-zip.sh: check_zip with_untracked2
+ check_added with_untracked2 untracked one/untracked
+ check_added with_untracked2 untracked two/untracked
+
++test_expect_success 'git-archive --format=zip with bigFile delta chains' '
++ test_when_finished rm -rf repo &&
++ git init repo &&
++ (
++ cd repo &&
++ test-tool genrandom foo 100000 >base &&
++ {
++ cat base &&
++ echo "trailing data"
++ } >delta-1 &&
++ {
++ cat delta-1 &&
++ echo "trailing data"
++ } >delta-2 &&
++ git add . &&
++ git commit -m "blobs" &&
++ git repack -Ad &&
++ git verify-pack -v .git/objects/pack/pack-*.idx >stats &&
++ test_grep "chain length = 1: 1 object" stats &&
++ test_grep "chain length = 2: 1 object" stats &&
++
++ git -c core.bigFileThreshold=1k archive --format=zip HEAD >archive.zip &&
++ if test_have_prereq UNZIP
++ then
++ mkdir unpack &&
++ cd unpack &&
++ "$GIT_UNZIP" ../archive.zip &&
++ test_cmp base ../base &&
++ test_cmp delta-1 ../delta-1 &&
++ test_cmp delta-2 ../delta-2
++ fi
++ )
++'
++
+ # Test remote archive over HTTP protocol.
+ #
+ # Note: this should be the last part of this test suite, because
3: 28940ce932 = 3: ef5ac585f0 packfile: extend `is_delta` field to allow for "unknown" state
4: c13c74467d = 4: 2a844d61fe packfile: always populate pack-specific info when reading object info
5: d3c17fcc71 = 5: a23f59d530 packfile: disentangle return value of `packed_object_info()`
6: 1c598686c5 = 6: f246dc3745 packfile: skip unpacking object header for disk size requests
7: afc5d85991 = 7: a0c4f59547 packfile: drop repository parameter from `packed_object_info()`
---
base-commit: 7df68b50e49b6a1b576abb19b2e5d457749bc28b
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v5 1/7] object-file: always set OI_LOOSE when reading object info
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (5 subsequent siblings)
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.
Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.
Note that we also:
- Rename `status` to `ret` to match our usual coding style, but also
to show that the old `status` variable is now always getting the
expected value. Furthermore, the value is not initialized anymore,
which has the consequence that most compilers will warn for exit
paths where we forgot to set it.
- Move the setup of scratch pointers closer to `parse_loose_header()`
to show where it's needed.
- Guard a couple of variables on cleanup so that they only get
released in case they have been set up.
- Reset `oi->delta_base_oid` towards the end of the function, together
with all the other object info pointers.
Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 115 ++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 71 insertions(+), 44 deletions(-)
diff --git a/object-file.c b/object-file.c
index 6280e42f34..e7e4c3348f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -416,19 +416,16 @@ int odb_source_loose_read_object_info(struct odb_source *source,
const struct object_id *oid,
struct object_info *oi, int flags)
{
- int status = 0;
+ int ret;
int fd;
unsigned long mapsize;
const char *path;
- void *map;
- git_zstream stream;
+ void *map = NULL;
+ git_zstream stream, *stream_to_end = NULL;
char hdr[MAX_HEADER_LEN];
unsigned long size_scratch;
enum object_type type_scratch;
- if (oi && oi->delta_base_oid)
- oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
-
/*
* If we don't care about type or size, then we don't
* need to look inside the object at all. Note that we
@@ -439,71 +436,101 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
struct stat st;
- if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
- if (stat_loose_object(source->loose, oid, &st, &path) < 0)
- return -1;
+
+ if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
+ ret = quick_has_loose(source->loose, oid) ? 0 : -1;
+ goto out;
+ }
+
+ if (stat_loose_object(source->loose, oid, &st, &path) < 0) {
+ ret = -1;
+ goto out;
+ }
+
if (oi && oi->disk_sizep)
*oi->disk_sizep = st.st_size;
- return 0;
+
+ ret = 0;
+ goto out;
}
fd = open_loose_object(source->loose, oid, &path);
if (fd < 0) {
if (errno != ENOENT)
error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
- return -1;
+ ret = -1;
+ goto out;
}
- map = map_fd(fd, path, &mapsize);
- if (!map)
- return -1;
- if (!oi->sizep)
- oi->sizep = &size_scratch;
- if (!oi->typep)
- oi->typep = &type_scratch;
+ map = map_fd(fd, path, &mapsize);
+ if (!map) {
+ ret = -1;
+ goto out;
+ }
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
+ stream_to_end = &stream;
+
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr))) {
case ULHR_OK:
- if (parse_loose_header(hdr, oi) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
- else if (*oi->typep < 0)
+ if (!oi->sizep)
+ oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
+
+ if (parse_loose_header(hdr, oi) < 0) {
+ ret = error(_("unable to parse %s header"), oid_to_hex(oid));
+ goto corrupt;
+ }
+
+ if (*oi->typep < 0)
die(_("invalid object type"));
- if (!oi->contentp)
- break;
- *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
- if (*oi->contentp)
- goto cleanup;
+ if (oi->contentp) {
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (!*oi->contentp) {
+ ret = -1;
+ goto corrupt;
+ }
+ }
- status = -1;
break;
case ULHR_BAD:
- status = error(_("unable to unpack %s header"),
- oid_to_hex(oid));
- break;
+ ret = error(_("unable to unpack %s header"),
+ oid_to_hex(oid));
+ goto corrupt;
case ULHR_TOO_LONG:
- status = error(_("header for %s too long, exceeds %d bytes"),
- oid_to_hex(oid), MAX_HEADER_LEN);
- break;
+ ret = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ goto corrupt;
}
- if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
+ ret = 0;
+
+corrupt:
+ if (ret && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(oid), path);
-cleanup:
- git_inflate_end(&stream);
- munmap(map, mapsize);
- if (oi->sizep == &size_scratch)
- oi->sizep = NULL;
- if (oi->typep == &type_scratch)
- oi->typep = NULL;
- oi->whence = OI_LOOSE;
- return status;
+out:
+ if (stream_to_end)
+ git_inflate_end(stream_to_end);
+ if (map)
+ munmap(map, mapsize);
+ if (oi) {
+ if (oi->sizep == &size_scratch)
+ oi->sizep = NULL;
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
+ if (oi->delta_base_oid)
+ oidclr(oi->delta_base_oid, source->odb->repo->hash_algo);
+ if (!ret)
+ oi->whence = OI_LOOSE;
+ }
+
+ return ret;
}
static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 14:54 ` Junio C Hamano
2026-01-12 9:00 ` [PATCH v5 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (4 subsequent siblings)
6 siblings, 1 reply; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak, Matt Smiley
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Note that this also fixes a segfault introduced in 8c1b84bc97
(streaming: move logic to read packed objects streams into backend,
2025-11-23), which refactors how we stream packed objects. The intent is
to only read packed objects in case they are stored non-deltified as
we'd otherwise have to deflate them first. But the check for whether or
not the object is stored as a delta was unconditionally done via
`oi.u.packed.is_delta`, which is only valid in case `oi.whence` is
`OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here,
which means that none of the `oi.u.packed` fields were initialized at
all. Consequently, we assumed the object was not stored as a delta, and
then try to read the object from `oi.u.packed.pack`, which is a `NULL`
pointer and thus causes a segfault.
Add a test case for this issue so that this cannot regress in the
future anymore.
Reported-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
t/t5003-archive-zip.sh | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index 08a0863fc3..b0c6665c87 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index 961c6aac25..c8c1c5c06b 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -239,6 +239,40 @@ check_zip with_untracked2
check_added with_untracked2 untracked one/untracked
check_added with_untracked2 untracked two/untracked
+test_expect_success 'git-archive --format=zip with bigFile delta chains' '
+ test_when_finished rm -rf repo &&
+ git init repo &&
+ (
+ cd repo &&
+ test-tool genrandom foo 100000 >base &&
+ {
+ cat base &&
+ echo "trailing data"
+ } >delta-1 &&
+ {
+ cat delta-1 &&
+ echo "trailing data"
+ } >delta-2 &&
+ git add . &&
+ git commit -m "blobs" &&
+ git repack -Ad &&
+ git verify-pack -v .git/objects/pack/pack-*.idx >stats &&
+ test_grep "chain length = 1: 1 object" stats &&
+ test_grep "chain length = 2: 1 object" stats &&
+
+ git -c core.bigFileThreshold=1k archive --format=zip HEAD >archive.zip &&
+ if test_have_prereq UNZIP
+ then
+ mkdir unpack &&
+ cd unpack &&
+ "$GIT_UNZIP" ../archive.zip &&
+ test_cmp base ../base &&
+ test_cmp delta-1 ../delta-1 &&
+ test_cmp delta-2 ../delta-2
+ fi
+ )
+'
+
# Test remote archive over HTTP protocol.
#
# Note: this should be the last part of this test suite, because
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (3 subsequent siblings)
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index b0c6665c87..cc797b2b6a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 4/7] packfile: always populate pack-specific info when reading object info
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
` (2 preceding siblings ...)
2026-01-12 9:00 ` [PATCH v5 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (2 subsequent siblings)
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/packfile.c b/packfile.c
index cc797b2b6a..f7c33a2f77 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 5/7] packfile: disentangle return value of `packed_object_info()`
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
` (3 preceding siblings ...)
2026-01-12 9:00 ` [PATCH v5 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index f7c33a2f77..8c6ef45a67 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
unsigned flags UNUSED)
{
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..d7cce582af 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * Returns zero on success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 6/7] packfile: skip unpacking object header for disk size requests
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
` (4 preceding siblings ...)
2026-01-12 9:00 ` [PATCH v5 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8c6ef45a67..a2ba237ce7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH v5 7/7] packfile: drop repository parameter from `packed_object_info()`
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
` (5 preceding siblings ...)
2026-01-12 9:00 ` [PATCH v5 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2026-01-12 9:00 ` Patrick Steinhardt
6 siblings, 0 replies; 58+ messages in thread
From: Patrick Steinhardt @ 2026-01-12 9:00 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index a2ba237ce7..39899aec49 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index d7cce582af..33fed26362 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.590.g1f87b77810.dirty
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED
2026-01-12 9:00 ` [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2026-01-12 14:54 ` Junio C Hamano
0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2026-01-12 14:54 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: git, Aaron Plattner, Kristoffer Haugsbakk, Toon Claes,
Karthik Nayak, Matt Smiley
Patrick Steinhardt <ps@pks.im> writes:
> Note that this also fixes a segfault introduced in 8c1b84bc97
> (streaming: move logic to read packed objects streams into backend,
> 2025-11-23), which refactors how we stream packed objects. The intent is
> to only read packed objects in case they are stored non-deltified as
> we'd otherwise have to deflate them first. But the check for whether or
> not the object is stored as a delta was unconditionally done via
> `oi.u.packed.is_delta`, which is only valid in case `oi.whence` is
> `OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here,
> which means that none of the `oi.u.packed` fields were initialized at
> all. Consequently, we assumed the object was not stored as a delta, and
> then try to read the object from `oi.u.packed.pack`, which is a `NULL`
> pointer and thus causes a segfault.
>
> Add a test case for this issue so that this cannot regress in the
> future anymore.
Great. Thanks. Will requeue.
> Reported-by: Matt Smiley <msmiley@gitlab.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb.h | 1 -
> packfile.c | 3 +--
> t/t5003-archive-zip.sh | 34 ++++++++++++++++++++++++++++++++++
> 3 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/odb.h b/odb.h
> index 014cd9585a..73b0b87ad5 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -330,7 +330,6 @@ struct object_info {
> OI_CACHED,
> OI_LOOSE,
> OI_PACKED,
> - OI_DBCACHED
> } whence;
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2026-01-12 14:54 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2025-12-18 7:23 ` Junio C Hamano
2025-12-18 9:10 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2025-12-18 7:32 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
2025-12-18 8:30 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2026-01-05 14:29 ` Toon Claes
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2026-01-05 15:35 ` Toon Claes
2026-01-06 6:34 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2025-12-30 17:03 ` Kristoffer Haugsbakk
2026-01-05 11:38 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 0/7] Improvements for reading object info Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-07 8:50 ` Karthik Nayak
2026-01-07 11:27 ` Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2026-01-06 6:54 ` [PATCH v3 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2026-01-07 10:12 ` Karthik Nayak
2026-01-06 6:55 ` [PATCH v3 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2026-01-06 6:55 ` [PATCH v3 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2026-01-07 10:17 ` [PATCH v3 0/7] Improvements for reading object info Karthik Nayak
2026-01-07 13:07 ` [PATCH v4 " Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-08 9:30 ` Karthik Nayak
2026-01-07 13:08 ` [PATCH v4 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2026-01-07 13:08 ` [PATCH v4 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2026-01-08 9:30 ` [PATCH v4 0/7] Improvements for reading object info Karthik Nayak
2026-01-12 9:00 ` [PATCH v5 " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2026-01-12 14:54 ` Junio C Hamano
2026-01-12 9:00 ` [PATCH v5 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2026-01-12 9:00 ` [PATCH v5 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox