* [PATCH 0/8] Improvements for reading object info
@ 2025-12-18 6:28 Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (9 more replies)
0 siblings, 10 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
This series has a conflict with ps/packfile-store-in-odb-source. I
decided to not make this a dependency though because those two topics
are independent from one another, and I expect that this series here
will be merged down faster than the conflicting one. Furthermore, the
conflict itself is quite minor:
diff --cc packfile.c
index 8daa5a5ee7,ce6716fbea..0000000000
--- a/packfile.c
+++ b/packfile.c
@@@ -2157,10 -2132,11 +2151,10 @@@ int packfile_store_read_object_info(str
struct object_info *oi,
unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
- if (!find_pack_entry(store->odb->repo, oid, &e))
+ if (!find_pack_entry(store, oid, &e))
return 1;
/*
@@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
- repo_settings_get_big_file_threshold(store->odb->repo) >= size)
+ repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
return -1;
in_pack_type = unpack_object_header(oi.u.packed.pack,
I'd thus propose to merge this series via an evil merge, but if this
proves to be burdensome I'm happy to defer it to a later point. Just let
me know and I'll adapt accordingly, thanks!
This also fixes the issue reported in <f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com>.
Patrick
---
Patrick Steinhardt (8):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: fix short-circuiting of empty requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 +--
commit-graph.c | 2 +-
object-file.c | 13 ++++++++--
odb.h | 18 +++++++++++--
pack-bitmap.c | 3 +--
packfile.c | 69 +++++++++++++++++++++++++++++++-------------------
packfile.h | 7 +++--
8 files changed, 80 insertions(+), 39 deletions(-)
---
base-commit: c4a0c8845e2426375ad257b6c221a3a7d92ecfda
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/8] object-file: always set OI_LOOSE when reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (8 subsequent siblings)
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
There are some early returns in ``odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
Nobody seems to care about this right now, but it's a bug waiting to
happen. Fix this by always setting `whence` on success.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index af1c3f972d..716b325669 100644
--- a/object-file.c
+++ b/object-file.c
@@ -439,12 +439,21 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi->typep && !oi->sizep && !oi->contentp) {
struct stat st;
- if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
+
+ if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK)) {
+ status = quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (!status)
+ oi->whence = OI_LOOSE;
+ return status;
+ }
+
if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
+
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
+
+ oi->whence = OI_LOOSE;
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 7:23 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (7 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
Drop the OI_DBCACHED enum completely. There don't seem to be any callers
that care about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index c88bd92619..79ad9d7179 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (6 subsequent siblings)
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index 79ad9d7179..9bce52f912 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2160,8 +2160,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2532,7 +2542,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/8] packfile: always populate pack-specific info when reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (2 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 7:32 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (5 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information from a packfile we are not always
populating the pack-specific information. This happens in two cases:
- When calling `packed_object_info()` directly instead of
`packfile_store_read_object_info()`.
- When we've got the empty request.
Fix both of these issues so that we can always assume the pack info to
be populated when reading object info from a pack.
Note that we don't really care about the second case right now, as the
condition will always evaluate to false anyway. This will be fixed in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 38 ++++++++++++++++++++------------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/packfile.c b/packfile.c
index 9bce52f912..6e66c90c46 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2148,8 +2162,13 @@ int packfile_store_read_object_info(struct packfile_store *store,
* We know that the caller doesn't actually need the
* information below, so return early.
*/
- if (oi == &blank_oi)
+ if (oi == &blank_oi) {
+ oi->whence = OI_PACKED;
+ oi->u.packed.offset = e.offset;
+ oi->u.packed.pack = e.p;
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
return 0;
+ }
rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
if (rtype < 0) {
@@ -2157,23 +2176,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 5/8] packfile: disentangle return value of `packed_object_info()`
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (3 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
` (4 subsequent siblings)
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index 6e66c90c46..c141b8a7b1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2153,7 +2156,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
{
static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2170,8 +2173,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
return 0;
}
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..07f5bfbc4f 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 6/8] packfile: skip unpacking object header for disk size requests
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (4 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
` (3 subsequent siblings)
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index c141b8a7b1..d2ae2432eb 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 7/8] packfile: fix short-circuiting of empty requests
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (5 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
` (2 subsequent siblings)
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information from the packfile store we have logic
that tries to bail out early on empty requests. This is supposed to be a
performance optimization so that we don't even have to unpack the object
header stored in the packfile.
This optimization doesn't work though: we compare the passed-in object
info pointer with the pointer of an on-stack variable, which of course
cannot ever become true. This issue was introduced via d9f517d051
(object-file: split out functions relating to object store subsystem,
2025-04-15): before this commit, we checked whether the passed-in object
info was a `NULL` pointer, and if so, we set it to point to `blank_oi`
instead. The commit then split up these the logic so that we continue to
set up `blank_oi` in `do_oid_object_info_extended()`, but then do the
check in `packfile_store_read_object_info()`. But even before that
commit the logic was only partially working, as it could very well be
that callers pass a blank object info themselves.
Fix this bug by introducing a new `object_info_is_blank_request()`
helper, which simply verifies that none of the contained request
pointers are populated.
Reported-by: Aaron Plattner <aplattner@nvidia.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 10 ++++++++++
packfile.c | 3 +--
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/odb.h b/odb.h
index afae5e5c01..b869c054c1 100644
--- a/odb.h
+++ b/odb.h
@@ -353,6 +353,16 @@ struct object_info {
} u;
};
+/*
+ * Given an object info structure, figure out whether any of its request
+ * pointers are populated.
+ */
+static inline bool object_info_is_blank_request(struct object_info *oi)
+{
+ return !oi->typep && !oi->sizep && !oi->disk_sizep &&
+ !oi->delta_base_oid && !oi->contentp;
+}
+
/*
* Initializer for a "struct object_info" that wants no items. You may
* also memset() the memory to all-zeroes.
diff --git a/packfile.c b/packfile.c
index d2ae2432eb..ce83e77899 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2157,7 +2157,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
struct object_info *oi,
unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
int ret;
@@ -2168,7 +2167,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
* We know that the caller doesn't actually need the
* information below, so return early.
*/
- if (oi == &blank_oi) {
+ if (object_info_is_blank_request(oi)) {
oi->whence = OI_PACKED;
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()`
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (6 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
@ 2025-12-18 6:28 ` Patrick Steinhardt
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
9 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index ce83e77899..8daa5a5ee7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2175,7 +2175,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
return 0;
}
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index 07f5bfbc4f..573d06f6ba 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 7:23 ` Junio C Hamano
2025-12-18 9:10 ` Patrick Steinhardt
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2025-12-18 7:23 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> When reading object info via a packfile we yield one of two types:
>
> - The object can either be OI_PACKED, which is what a caller would
> typically expect.
>
> - Or it can be OI_DBCACHED if it is stored in the delta base cache.
>
> The latter really is an implementation detail though, and callers
> typically don't care at all about the difference. Furthermore, the
> information whether or not it is part of the delta base cache can
> already be derived via the `is_delta` field, so the fact that we discern
> between OI_PACKED and OI_DBCACHED only further complicates the
> interface.
If this were "and no existing callers check at all", it would be
trivial to decide for this change. In fact you do say that but in a
weaker form just below.
> Drop the OI_DBCACHED enum completely. There don't seem to be any callers
> that care about the distinction.
"git grep OI_DBCACHED" shows only a single hit, which is what you
are getting rid of in this patch, but I cannot claim that we did a
sufficient audit, as this change will break code paths that check if
they got OI_PACKED and do something differently (or if what they got
is different from OI_PACKED, for that matter).
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> odb.h | 1 -
> packfile.c | 3 +--
> 2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/odb.h b/odb.h
> index 014cd9585a..73b0b87ad5 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -330,7 +330,6 @@ struct object_info {
> OI_CACHED,
> OI_LOOSE,
> OI_PACKED,
> - OI_DBCACHED
> } whence;
> union {
> /*
> diff --git a/packfile.c b/packfile.c
> index c88bd92619..79ad9d7179 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
> oidclr(oi->delta_base_oid, p->repo->hash_algo);
> }
>
> - oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
> - OI_PACKED;
> + oi->whence = OI_PACKED;
>
> out:
> unuse_pack(&w_curs);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/8] packfile: always populate pack-specific info when reading object info
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 7:32 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2025-12-18 7:32 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> @@ -2148,8 +2162,13 @@ int packfile_store_read_object_info(struct packfile_store *store,
> * We know that the caller doesn't actually need the
> * information below, so return early.
> */
> - if (oi == &blank_oi)
> + if (oi == &blank_oi) {
> + oi->whence = OI_PACKED;
> + oi->u.packed.offset = e.offset;
> + oi->u.packed.pack = e.p;
> + oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
It cannot be seen as it is before the precontext, but if blank_oi is
still a function scope static that is initialized only once by
assigning OBJECT_INFO_INIT, this will leave a timg bomb waiting to
go off, as it violates the "blank"-ness promise for the next caller
of this function who calls NULL in oi.
I'd prefer we fix this nonsense "we only declared a function scope
static, but without actually using it for anything, other than to
compare its address with the caller supplied parameter" well before
this step.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/8] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (7 preceding siblings ...)
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 8:09 ` Junio C Hamano
2025-12-18 8:30 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
9 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2025-12-18 8:09 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Aaron Plattner
Patrick Steinhardt <ps@pks.im> writes:
> This series has a conflict with ps/packfile-store-in-odb-source. I
> decided to not make this a dependency though because those two topics
> are independent from one another, and I expect that this series here
> will be merged down faster than the conflicting one. Furthermore, the
> conflict itself is quite minor:
>
> diff --cc packfile.c
> index 8daa5a5ee7,ce6716fbea..0000000000
> --- a/packfile.c
> +++ b/packfile.c
> @@@ -2157,10 -2132,11 +2151,10 @@@ int packfile_store_read_object_info(str
> struct object_info *oi,
> unsigned flags UNUSED)
> {
> - static struct object_info blank_oi = OBJECT_INFO_INIT;
> struct pack_entry e;
> - int rtype;
> + int ret;
>
> - if (!find_pack_entry(store->odb->repo, oid, &e))
> + if (!find_pack_entry(store, oid, &e))
> return 1;
>
> /*
> @@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
> oi.sizep = &size;
>
> if (packfile_store_read_object_info(store, oid, &oi, 0) ||
> - oi.u.packed.is_delta ||
> + oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
> + oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
> - repo_settings_get_big_file_threshold(store->odb->repo) >= size)
> + repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
> return -1;
>
> in_pack_type = unpack_object_header(oi.u.packed.pack,
>
> I'd thus propose to merge this series via an evil merge, but if this
> proves to be burdensome I'm happy to defer it to a later point. Just let
> me know and I'll adapt accordingly, thanks!
Indeed the conflicts above are miniscule that it does not even need
any evil merge. The surviving lines are all from either ours or
theirs, that changes are close enough to be shown in --cc.
But let me first concentrate more on fixing performance regression
that already made down to 'master'. It is a shame that nobody
caught it while it was cooking in 'next'.
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/8] Improvements for reading object info
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
@ 2025-12-18 8:30 ` Patrick Steinhardt
0 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 8:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Aaron Plattner
On Thu, Dec 18, 2025 at 05:09:21PM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --cc packfile.c
> > index 8daa5a5ee7,ce6716fbea..0000000000
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@@ -2549,9 -2555,8 +2571,9 @@@ int packfile_store_read_object_stream(s
> > oi.sizep = &size;
> >
> > if (packfile_store_read_object_info(store, oid, &oi, 0) ||
> > - oi.u.packed.is_delta ||
> > + oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
> > + oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
> > - repo_settings_get_big_file_threshold(store->odb->repo) >= size)
> > + repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
> > return -1;
> >
> > in_pack_type = unpack_object_header(oi.u.packed.pack,
> >
> > I'd thus propose to merge this series via an evil merge, but if this
> > proves to be burdensome I'm happy to defer it to a later point. Just let
> > me know and I'll adapt accordingly, thanks!
>
> Indeed the conflicts above are miniscule that it does not even need
> any evil merge. The surviving lines are all from either ours or
> theirs, that changes are close enough to be shown in --cc.
>
> But let me first concentrate more on fixing performance regression
> that already made down to 'master'. It is a shame that nobody
> caught it while it was cooking in 'next'.
Fair enough, so that means that you'd want to merge your patch down
first, right? If so I'll rebase my series on top of your patch and then
resend it soonish.
In any case, I noticed a slight regression in one of the benchmarks that
prints all objects, but I attributed it to CI flakiness [1]. The uptick
didn't seem strong enough to really be a regression, and I'm still not
sure whether it's related to this patch series or not. Chances are it
is. I'll investigate and make sure to extend the benchmarking suite
accordingly so that we have a clearer signal there.
Thanks!
Patrick
[1]: https://bencher.dev/perf/git?branches=595859eb-071c-48e9-97cf-195e0a3d6ed1&testbeds=02dcb8ad-6873-494c-aabc-9a6237601308&benchmarks=0da3d87a-ce30-4125-86e9-12d84ec4bc49&measures=63dafffb-98c4-4c27-ba43-7112cae627fc
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/8] packfile: always declare object info to be OI_PACKED
2025-12-18 7:23 ` Junio C Hamano
@ 2025-12-18 9:10 ` Patrick Steinhardt
0 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 9:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Aaron Plattner
On Thu, Dec 18, 2025 at 04:23:03PM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > When reading object info via a packfile we yield one of two types:
> >
> > - The object can either be OI_PACKED, which is what a caller would
> > typically expect.
> >
> > - Or it can be OI_DBCACHED if it is stored in the delta base cache.
> >
> > The latter really is an implementation detail though, and callers
> > typically don't care at all about the difference. Furthermore, the
> > information whether or not it is part of the delta base cache can
> > already be derived via the `is_delta` field, so the fact that we discern
> > between OI_PACKED and OI_DBCACHED only further complicates the
> > interface.
>
> If this were "and no existing callers check at all", it would be
> trivial to decide for this change. In fact you do say that but in a
> weaker form just below.
>
> > Drop the OI_DBCACHED enum completely. There don't seem to be any callers
> > that care about the distinction.
>
> "git grep OI_DBCACHED" shows only a single hit, which is what you
> are getting rid of in this patch, but I cannot claim that we did a
> sufficient audit, as this change will break code paths that check if
> they got OI_PACKED and do something differently (or if what they got
> is different from OI_PACKED, for that matter).
That's a fair complaint. I'll adapt the commit message to include the
investigation.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 0/7] Improvements for reading object info
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
` (8 preceding siblings ...)
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
` (6 more replies)
9 siblings, 7 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
Hi,
this patch series contains various small improvements for reading object
info for either loose or packed objects. These improvements were split
out of a larger patch series where I'm about to introduce a new generic
`odb_for_each_object()` function.
Changes in v2:
- Rebase the series on top of master with jc/object-read-stream-fix
merged into it. I've also evicted the patch that fixes the same
underlying issue.
- Improve the commit message that drops OI_DBCACHED to explain why
this is a safe refactoring.
- Link to v1: https://lore.kernel.org/r/20251218-b4-pks-odb-read-object-info-improvements-v1-0-81c8368492be@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
object-file: always set OI_LOOSE when reading object info
packfile: always declare object info to be OI_PACKED
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always populate pack-specific info when reading object info
packfile: disentangle return value of `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: drop repository parameter from `packed_object_info()`
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
object-file.c | 19 ++++++++++++----
odb.h | 8 +++++--
pack-bitmap.c | 3 +--
packfile.c | 61 ++++++++++++++++++++++++++++++--------------------
packfile.h | 7 ++++--
8 files changed, 68 insertions(+), 39 deletions(-)
Range-diff versus v1:
1: 0c1a4a4745 < -: ---------- object-file: always set OI_LOOSE when reading object info
-: ---------- > 1: 2287c0cbd9 object-file: always set OI_LOOSE when reading object info
2: 98962428cf ! 2: a1cd99af9c packfile: always declare object info to be OI_PACKED
@@ Commit message
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
- Drop the OI_DBCACHED enum completely. There don't seem to be any callers
- that care about the distinction.
+ There aren't all that many callers that care about the `whence` field in
+ the first place. In fact, there's only three:
+
+ - `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
+ and then populates the packfile information of the object info
+ structure. We now start to do this also for deltified objects, which
+ gives its callers strictly more information.
+
+ - `repack_local_links()` wants to determine whether the object is part
+ of a promisor pack and checks for `whence == OI_PACKED`. If so, it
+ verifies that the packfile is a promisor pack. It's arguably wrong
+ to declare that an object is not part of a promisor pack only
+ because it is stored in the delta base cache.
+
+ - `is_not_in_promisor_pack_obj()` does the same, but checks that a
+ specific object is _not_ part of a promisor pack. The same reasoning
+ as above applies.
+
+ Drop the OI_DBCACHED enum completely. None of the callers seem to care
+ about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
3: 0a5b806934 = 3: 7a043c09ee packfile: extend `is_delta` field to allow for "unknown" state
4: 6a05c85683 ! 4: 448511cb19 packfile: always populate pack-specific info when reading object info
@@ Metadata
## Commit message ##
packfile: always populate pack-specific info when reading object info
- When reading object information from a packfile we are not always
- populating the pack-specific information. This happens in two cases:
+ When reading object information via `packed_object_info()` we may not
+ populate the object info's packfile-specific fields. This leads to
+ inconsistent object info depending on whether the info was populated via
+ `packfile_store_read_object_info()` or `packed_object_info()`.
- - When calling `packed_object_info()` directly instead of
- `packfile_store_read_object_info()`.
-
- - When we've got the empty request.
-
- Fix both of these issues so that we can always assume the pack info to
- be populated when reading object info from a pack.
-
- Note that we don't really care about the second case right now, as the
- condition will always evaluate to false anyway. This will be fixed in
- the next commit.
+ Fix this inconsistecny so that we can always assume the pack info to be
+ populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
out:
unuse_pack(&w_curs);
-@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
- * We know that the caller doesn't actually need the
- * information below, so return early.
- */
-- if (oi == &blank_oi)
-+ if (oi == &blank_oi) {
-+ oi->whence = OI_PACKED;
-+ oi->u.packed.offset = e.offset;
-+ oi->u.packed.pack = e.p;
-+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
- return 0;
-+ }
-
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
5: b09f37400c ! 5: e1ee6c7841 packfile: disentangle return value of `packed_object_info()`
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
static void *unpack_compressed_entry(struct packed_git *p,
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ unsigned flags UNUSED)
{
- static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
- int rtype;
+ int ret;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ if (!oi)
return 0;
- }
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
6: 253c0d47ab = 6: 77589e84b5 packfile: skip unpacking object header for disk size requests
7: 4dac51d4be < -: ---------- packfile: fix short-circuiting of empty requests
8: 2cf441de0d ! 7: 08f4b865e5 packfile: drop repository parameter from `packed_object_info()`
@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
@@ packfile.c: int packfile_store_read_object_info(struct packfile_store *store,
+ if (!oi)
return 0;
- }
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
---
base-commit: 7df68b50e49b6a1b576abb19b2e5d457749bc28b
change-id: 20251215-b4-pks-odb-read-object-info-improvements-0e031ef827d2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 1/7] object-file: always set OI_LOOSE when reading object info
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
` (5 subsequent siblings)
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
There are some early returns in ``odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
Nobody seems to care about this right now, but it's a bug waiting to
happen. Fix this by always setting `whence` on success.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
object-file.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/object-file.c b/object-file.c
index 6280e42f34..d566df427a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -439,12 +439,23 @@ int odb_source_loose_read_object_info(struct odb_source *source,
*/
if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
struct stat st;
- if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(source->loose, oid) ? 0 : -1;
+
+ if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
+ status = quick_has_loose(source->loose, oid) ? 0 : -1;
+ if (!status && oi)
+ oi->whence = OI_LOOSE;
+ return status;
+ }
+
if (stat_loose_object(source->loose, oid, &st, &path) < 0)
return -1;
- if (oi && oi->disk_sizep)
- *oi->disk_sizep = st.st_size;
+
+ if (oi) {
+ if (oi->disk_sizep)
+ *oi->disk_sizep = st.st_size;
+ oi->whence = OI_LOOSE;
+ }
+
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
` (4 subsequent siblings)
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 1 -
packfile.c | 3 +--
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/odb.h b/odb.h
index 014cd9585a..73b0b87ad5 100644
--- a/odb.h
+++ b/odb.h
@@ -330,7 +330,6 @@ struct object_info {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
- OI_DBCACHED
} whence;
union {
/*
diff --git a/packfile.c b/packfile.c
index 08a0863fc3..b0c6665c87 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1656,8 +1656,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = in_delta_base_cache(p, obj_offset) ? OI_DBCACHED :
- OI_PACKED;
+ oi->whence = OI_PACKED;
out:
unuse_pack(&w_curs);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
` (3 subsequent siblings)
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 7 ++++++-
packfile.c | 17 ++++++++++++++---
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/odb.h b/odb.h
index 73b0b87ad5..afae5e5c01 100644
--- a/odb.h
+++ b/odb.h
@@ -343,7 +343,12 @@ struct object_info {
struct {
struct packed_git *pack;
off_t offset;
- unsigned int is_delta;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
} packed;
} u;
};
diff --git a/packfile.c b/packfile.c
index b0c6665c87..cc797b2b6a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2159,8 +2159,18 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
- oi->u.packed.is_delta = (rtype == OBJ_REF_DELTA ||
- rtype == OBJ_OFS_DELTA);
+
+ switch (rtype) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
return 0;
@@ -2531,7 +2541,8 @@ int packfile_store_read_object_stream(struct odb_read_stream **out,
oi.sizep = &size;
if (packfile_store_read_object_info(store, oid, &oi, 0) ||
- oi.u.packed.is_delta ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
+ oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
repo_settings_get_big_file_threshold(store->odb->repo) >= size)
return -1;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (2 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-30 17:03 ` Kristoffer Haugsbakk
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
` (2 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistecny so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/packfile.c b/packfile.c
index cc797b2b6a..f7c33a2f77 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1657,6 +1657,20 @@ int packed_object_info(struct repository *r, struct packed_git *p,
}
oi->whence = OI_PACKED;
+ oi->u.packed.offset = obj_offset;
+ oi->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_REF_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
out:
unuse_pack(&w_curs);
@@ -2156,23 +2170,6 @@ int packfile_store_read_object_info(struct packfile_store *store,
return -1;
}
- if (oi->whence == OI_PACKED) {
- oi->u.packed.offset = e.offset;
- oi->u.packed.pack = e.p;
-
- switch (rtype) {
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
- }
- }
-
return 0;
}
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()`
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (3 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 21 ++++++++++++---------
packfile.h | 4 ++++
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/packfile.c b/packfile.c
index f7c33a2f77..8c6ef45a67 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1587,6 +1587,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
unsigned long size;
off_t curpos = obj_offset;
enum object_type type;
+ int ret;
/*
* We always get the representation type, but only convert it to
@@ -1607,12 +1608,12 @@ int packed_object_info(struct repository *r, struct packed_git *p,
off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
type, obj_offset);
if (!base_offset) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
if (*oi->sizep == 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else {
@@ -1625,7 +1626,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
error("could not find object at offset %"PRIuMAX" "
"in pack %s", (uintmax_t)obj_offset, p->pack_name);
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
@@ -1639,7 +1640,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep)
*oi->typep = ptot;
if (ptot < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
}
@@ -1649,7 +1650,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (get_delta_base_oid(p, &w_curs, curpos,
oi->delta_base_oid,
type, obj_offset) < 0) {
- type = OBJ_BAD;
+ ret = -1;
goto out;
}
} else
@@ -1672,9 +1673,11 @@ int packed_object_info(struct repository *r, struct packed_git *p,
break;
}
+ ret = 0;
+
out:
unuse_pack(&w_curs);
- return type;
+ return ret;
}
static void *unpack_compressed_entry(struct packed_git *p,
@@ -2152,7 +2155,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
unsigned flags UNUSED)
{
struct pack_entry e;
- int rtype;
+ int ret;
if (!find_pack_entry(store->odb->repo, oid, &e))
return 1;
@@ -2164,8 +2167,8 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- rtype = packed_object_info(store->odb->repo, e.p, e.offset, oi);
- if (rtype < 0) {
+ ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
}
diff --git a/packfile.h b/packfile.h
index 59d162a3f4..07f5bfbc4f 100644
--- a/packfile.h
+++ b/packfile.h
@@ -378,6 +378,10 @@ void release_pack_memory(size_t);
/* global flag to enable extra checks when accessing packed objects */
extern int do_check_packed_object_crc;
+/*
+ * Look up the object info for a specific offset in the packfile.
+ * success, a negative error code otherwise.
+ */
int packed_object_info(struct repository *r,
struct packed_git *pack,
off_t offset, struct object_info *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (4 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8c6ef45a67..a2ba237ce7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1586,7 +1586,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
struct pack_window *w_curs = NULL;
unsigned long size;
off_t curpos = obj_offset;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
int ret;
/*
@@ -1598,7 +1598,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
- } else {
+ } else if (oi->sizep || oi->typep || oi->delta_base_oid) {
type = unpack_object_header(p, &w_curs, &curpos, &size);
}
@@ -1662,6 +1662,9 @@ int packed_object_info(struct repository *r, struct packed_git *p,
oi->u.packed.pack = p;
switch (type) {
+ case OBJ_NONE:
+ oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
case OBJ_REF_DELTA:
oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
break;
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()`
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
` (5 preceding siblings ...)
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
@ 2025-12-18 10:54 ` Patrick Steinhardt
6 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2025-12-18 10:54 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Aaron Plattner
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 3 +--
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
pack-bitmap.c | 3 +--
packfile.c | 8 ++++----
packfile.h | 3 +--
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 505ddaa12f..2ad712e9f8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -487,8 +487,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(the_repository, pack,
- offset, &data->info);
+ ret = packed_object_info(pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1ce8d6ee21..85762f8c4f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2411,7 +2411,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(the_repository, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3748,7 +3748,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(the_repository, p, ofs, &oi) < 0) {
+ if (packed_object_info(p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index 80be2ff2c3..f572670bd0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1499,7 +1499,7 @@ static int add_packed_commits(const struct object_id *oid,
display_progress(ctx->progress, ++ctx->progress_done);
oi.typep = &type;
- if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
+ if (packed_object_info(pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
if (type != OBJ_COMMIT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ca79725b1..972203f12b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1876,8 +1876,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(bitmap_repo(bitmap_git), pack, ofs,
- &oi) < 0) {
+ if (packed_object_info(pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index a2ba237ce7..39899aec49 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1580,7 +1580,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info(struct repository *r, struct packed_git *p,
+int packed_object_info(struct packed_git *p,
off_t obj_offset, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1594,7 +1594,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
* a "real" type later if the caller is interested.
*/
if (oi->contentp) {
- *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
+ *oi->contentp = cache_or_unpack_entry(p->repo, p, obj_offset, oi->sizep,
&type);
if (!*oi->contentp)
type = OBJ_BAD;
@@ -1635,7 +1635,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
if (oi->typep) {
enum object_type ptot;
- ptot = packed_to_object_type(r, p, obj_offset,
+ ptot = packed_to_object_type(p->repo, p, obj_offset,
type, &w_curs, curpos);
if (oi->typep)
*oi->typep = ptot;
@@ -2170,7 +2170,7 @@ int packfile_store_read_object_info(struct packfile_store *store,
if (!oi)
return 0;
- ret = packed_object_info(store->odb->repo, e.p, e.offset, oi);
+ ret = packed_object_info(e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
diff --git a/packfile.h b/packfile.h
index 07f5bfbc4f..573d06f6ba 100644
--- a/packfile.h
+++ b/packfile.h
@@ -382,8 +382,7 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* success, a negative error code otherwise.
*/
-int packed_object_info(struct repository *r,
- struct packed_git *pack,
+int packed_object_info(struct packed_git *pack,
off_t offset, struct object_info *);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
--
2.52.0.351.gbe84eed79e.dirty
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
@ 2025-12-30 17:03 ` Kristoffer Haugsbakk
0 siblings, 0 replies; 23+ messages in thread
From: Kristoffer Haugsbakk @ 2025-12-30 17:03 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Aaron Plattner
On Thu, Dec 18, 2025, at 11:54, Patrick Steinhardt wrote:
> When reading object information via `packed_object_info()` we may not
> populate the object info's packfile-specific fields. This leads to
> inconsistent object info depending on whether the info was populated via
> `packfile_store_read_object_info()` or `packed_object_info()`.
>
> Fix this inconsistecny so that we can always assume the pack info to be
s/inconsistecny/inconsistency/
> populated when reading object info from a pack.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>[snip]
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-12-30 17:03 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-18 6:28 [PATCH 0/8] Improvements for reading object info Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 1/8] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 2/8] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2025-12-18 7:23 ` Junio C Hamano
2025-12-18 9:10 ` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 3/8] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 4/8] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2025-12-18 7:32 ` Junio C Hamano
2025-12-18 6:28 ` [PATCH 5/8] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 6/8] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 7/8] packfile: fix short-circuiting of empty requests Patrick Steinhardt
2025-12-18 6:28 ` [PATCH 8/8] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
2025-12-18 8:09 ` [PATCH 0/8] Improvements for reading object info Junio C Hamano
2025-12-18 8:30 ` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 1/7] object-file: always set OI_LOOSE when " Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 2/7] packfile: always declare object info to be OI_PACKED Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 3/7] packfile: extend `is_delta` field to allow for "unknown" state Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 4/7] packfile: always populate pack-specific info when reading object info Patrick Steinhardt
2025-12-30 17:03 ` Kristoffer Haugsbakk
2025-12-18 10:54 ` [PATCH v2 5/7] packfile: disentangle return value of `packed_object_info()` Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 6/7] packfile: skip unpacking object header for disk size requests Patrick Steinhardt
2025-12-18 10:54 ` [PATCH v2 7/7] packfile: drop repository parameter from `packed_object_info()` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).