* Re: [PATCH GSoC RFC v13 05/12] fetch-pack: move function to connect.c
From: Pablo Sabater @ 2026-06-24 12:21 UTC (permalink / raw)
To: Karthik Nayak
Cc: gitster, peff, eric.peijian, chriscool, git, jltobler, toon,
chandrapratap3519
In-Reply-To: <CAOLa=ZRUoBKPAjh6He0qgdZdzAzMxmeS9RMRi-czpHEfKG6EKw@mail.gmail.com>
El lun, 22 jun 2026 a las 12:30, Karthik Nayak
(<karthik.188@gmail.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > write_fetch_command_and_capabilities will be refactored in a subsequent
> > commit where it will become a more general-purpose function, making it
> > more accessible to additional commands in the future.
>
> Okay.
>
> > To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
> > to adjust how `advertise_sid` is managed. Previously in `fetch_pack.c`,
> > `advertise_sid` was a static variable, modified using
> > `repo_config_get_bool()`.
>
> Nit: What's missing is why do we need to move it to 'connect.c', I
> assume this is because it being generic means its better placed in
> connect.c over 'fetch-pack.c'. Would be nice to explicitly mention that
> perhaps?
True, it is for that reason, I'll write it explicitly in the next
version, thanks!
>
> >
> > In `connect.c`, we now initialize `advertise_sid` at the begining by
> > directly using `repo_config_get_bool()`. This change is safe because:
> >
> > In the original `fetch-pack.c` code, there are only two places that write
> > `advertise_sid`:
> >
> > 1. In function `do_fetch_pack()`:
> > if (!sever_supports("session_id"))
> > advertise_sid = 0;
> > 2. In function `fetch_pack_config()`:
> > repo_config_get_bool("transfer.advertisesid", &advertise_sid);
> >
> > About 1, since `do_fetch_pack()` is only relevant for protocol v1, this
> > assignment can be ignored, as `write_fetch_command_and_capabilities()`
> > is only used in v2.
> >
> > About 2, `repo_config_get_bool()` is from `config.h` and it's an out-of-box
> > dependency of `connect.c`, so we can reuse it directly.
> >
> > Move `write_fetch_command_and_capabilities()` to `connect.c`
> >
>
> Nit: Wouldn't it then make sense to split this into two?
> 1. Drop usage of the static `advertise_sid` within
> `write_fetch_command_and_capabilities()`.
> 2. Move `write_fetch_command_and_capabilities()` to `connect.c`
>
> That way the second patch is simply a move?
Okay, seems fair, I'll do that, thanks.
>
> [snip]
Thanks for the review,
Pablo
^ permalink raw reply
* [PATCH 6/6] odb: document object info fields
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
Some of the fields in `struct object_info` are undocumented. Add these
missing comments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/odb.h b/odb.h
index e0d05eaf87..a44ec46b08 100644
--- a/odb.h
+++ b/odb.h
@@ -283,12 +283,28 @@ struct object_info_source {
} u;
};
+/*
+ * The object info contains the query and response that is to be used for
+ * functions that end up reading object information. Callers are expected to
+ * populate pointers whose information they want to request.
+ */
struct object_info {
- /* Request */
+ /* The object type. */
enum object_type *typep;
+
+ /* The inflated object size in bytes. */
size_t *sizep;
+
+ /* The object size as stored on disk. */
off_t *disk_sizep;
+
+ /*
+ * The base the object is deltified against, in case it is stored as a
+ * delta.
+ */
struct object_id *delta_base_oid;
+
+ /* The object contents. Ownership of memory goes over to the caller. */
void **contentp;
/*
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 5/6] odb: drop `whence` field from object info
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
In the preceding commits we have migrated all callers to derive their
information of how a specific object is stored to use the new object
info source instead, and hence the field is now unused. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.c | 1 -
odb.h | 7 -------
odb/source-inmemory.c | 2 --
odb/source-loose.c | 2 --
packfile.c | 2 --
5 files changed, 14 deletions(-)
diff --git a/odb.c b/odb.c
index 99f4e7551c..82c41f1793 100644
--- a/odb.c
+++ b/odb.c
@@ -691,7 +691,6 @@ static int oid_object_info_convert(struct repository *r,
return -1;
}
}
- input_oi->whence = new_oi.whence;
if (input_oi->sourcep)
*input_oi->sourcep = *new_oi.sourcep;
return ret;
diff --git a/odb.h b/odb.h
index 330a55879e..e0d05eaf87 100644
--- a/odb.h
+++ b/odb.h
@@ -311,13 +311,6 @@ struct object_info {
* or multiple times in the same source.
*/
struct object_info_source *sourcep;
-
- /* Response */
- enum {
- OI_CACHED,
- OI_LOOSE,
- OI_PACKED,
- } whence;
};
/*
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index 2328e62687..008e49bfe9 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -54,8 +54,6 @@ static void populate_object_info(struct odb_source_inmemory *source,
*oi->mtimep = 0;
if (oi->sourcep)
oi->sourcep->source = &source->base;
-
- oi->whence = OI_CACHED;
}
static int odb_source_inmemory_read_object_info(struct odb_source *source,
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 5c4e9892b5..e743ccab42 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -198,8 +198,6 @@ static int read_object_info_from_path(struct odb_source_loose *loose,
oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo);
if (oi->sourcep && !ret)
oi->sourcep->source = &loose->base;
- if (!ret)
- oi->whence = OI_LOOSE;
}
return ret;
diff --git a/packfile.c b/packfile.c
index fa22095b75..4a8c108034 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1421,8 +1421,6 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source,
oidclr(oi->delta_base_oid, p->repo->hash_algo);
}
- oi->whence = OI_PACKED;
-
if (oi->sourcep) {
if (!source)
BUG("cannot request source without an owning source");
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 4/6] treewide: convert users of `whence` to the new source field
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The `whence` field has become redundant now that callers can learn about
the exact source an object has been looked up from via the `struct
object_info_source::source` field.
Adapt callers to use the new field. Note that all callsites already set
up the `info.sourcep` request pointer, so the conversion is rather
straight-forward.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 2 +-
builtin/index-pack.c | 3 ++-
builtin/pack-objects.c | 2 +-
reachable.c | 2 +-
4 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index adc626ce30..1b96150e5b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -834,7 +834,7 @@ static int batch_one_object_oi(const struct object_id *oid,
void *_payload)
{
struct for_each_object_payload *payload = _payload;
- if (oi && oi->whence == OI_PACKED)
+ if (oi && oi->sourcep->source->type == ODB_SOURCE_PACKED)
return payload->callback(oid, oi->sourcep->u.packed.pack,
oi->sourcep->u.packed.offset,
payload->payload);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 77af26db8f..1b03b07e5e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1833,7 +1833,8 @@ static void repack_local_links(void)
if (odb_read_object_info_extended(the_repository->objects, oid, &info, 0))
/* Missing; assume it is a promisor object */
continue;
- if (info.whence == OI_PACKED && info_source.u.packed.pack->pack_promisor)
+ if (info_source.source->type == ODB_SOURCE_PACKED &&
+ info_source.u.packed.pack->pack_promisor)
continue;
if (!cmd.args.nr) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9deb37e9e8..d0fdfad750 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5010,7 +5010,7 @@ static int is_not_in_promisor_pack_obj(struct object *obj, void *data UNUSED)
if (odb_read_object_info_extended(the_repository->objects, &obj->oid, &info, 0))
BUG("should_include_obj should only be called on existing objects");
- return info.whence != OI_PACKED || !info_source.u.packed.pack->pack_promisor;
+ return info_source.source->type != ODB_SOURCE_PACKED || !info_source.u.packed.pack->pack_promisor;
}
static int is_not_in_promisor_pack(struct commit *commit, void *data) {
diff --git a/reachable.c b/reachable.c
index 2fc5b82d62..123a658944 100644
--- a/reachable.c
+++ b/reachable.c
@@ -234,7 +234,7 @@ static int add_recent_object(const struct object_id *oid,
add_pending_object(data->revs, obj, "");
if (data->cb) {
- if (oi->whence == OI_PACKED)
+ if (oi->sourcep->source->type == ODB_SOURCE_PACKED)
data->cb(obj, oi->sourcep->u.packed.pack,
oi->sourcep->u.packed.offset, *oi->mtimep);
else
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 3/6] odb: add `source` field to struct object_info_source
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The previous commit introduced `struct object_info_source` as an opt-in
container for backend-specific information, but for now we only moved
preexisting data into this structure. Most importantly, the caller has
no way yet to learn about which source an object was actually looked up
from. Instead, callers have to rely on the `whence` enum to distinguish
the object type, but cannot use that enum to tell the object source.
Add a `struct odb_source *source` field to the structure and populate it
from each backend's lookup path.
The `whence` enum is still set and used by callers; it will be removed
in a subsequent commit now that `sourcep->source` can identify the
backend on its own.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
odb.h | 3 +++
odb/source-inmemory.c | 3 +++
odb/source-loose.c | 2 ++
packfile.c | 6 +++++-
4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/odb.h b/odb.h
index 770900289a..330a55879e 100644
--- a/odb.h
+++ b/odb.h
@@ -253,6 +253,9 @@ int odb_pretend_object(struct object_database *odb,
* more about how exactly it is stored.
*/
struct object_info_source {
+ /* The source that this object has been looked up from. */
+ struct odb_source *source;
+
/*
* Backend-specific information about the specific object. This can be
* used for example to uniquely identify a given object in case it
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index e004566d76..2328e62687 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -52,6 +52,9 @@ static void populate_object_info(struct odb_source_inmemory *source,
*oi->contentp = xmemdupz(object->buf, object->size);
if (oi->mtimep)
*oi->mtimep = 0;
+ if (oi->sourcep)
+ oi->sourcep->source = &source->base;
+
oi->whence = OI_CACHED;
}
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 66e6bb8d3f..5c4e9892b5 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -196,6 +196,8 @@ static int read_object_info_from_path(struct odb_source_loose *loose,
oi->typep = NULL;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid, loose->base.odb->repo->hash_algo);
+ if (oi->sourcep && !ret)
+ oi->sourcep->source = &loose->base;
if (!ret)
oi->whence = OI_LOOSE;
}
diff --git a/packfile.c b/packfile.c
index 688c410b35..fa22095b75 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1324,7 +1324,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
+int packed_object_info_with_index_pos(struct odb_source_packed *source,
struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi)
{
@@ -1424,6 +1424,10 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
oi->whence = OI_PACKED;
if (oi->sourcep) {
+ if (!source)
+ BUG("cannot request source without an owning source");
+ oi->sourcep->source = &source->base;
+
oi->sourcep->u.packed.offset = obj_offset;
oi->sourcep->u.packed.pack = p;
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 2/6] odb: make backend-specific fields optional
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
The `struct object_info` carries two pieces of information
about how an object was looked up:
- The `whence` enum identifying the backend.
- The backend-tagged union `u` exposing backend-specific details
(currently only the packed-source case, which records the owning
pack, offset and packed object type).
The union is populated unconditionally, even though most callers don't
care about provenance at all.
Split the backend-specific union out into a new public type, `struct
object_info_source`, and make the object info structure carry it via
just another opt-in request pointer. As with all the other requestable
information, callers that need source info allocate a `struct
object_info_source` on the stack and point `sourcep` at it; callers that
don't care about it simply leave the field as a `NULL` pointer. Adapt
callers accordingly.
Note that the `whence` enum is strictly-speaking also backend-specific
information, so it would be another good candidate to be moved into the
`struct object_info_source`. For now though it is left alone, as it will
be replaced by a `struct odb_source` pointer in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 8 +++++--
builtin/index-pack.c | 8 +++++--
builtin/pack-objects.c | 15 +++++++++----
odb.c | 3 ++-
odb.h | 60 +++++++++++++++++++++++++++++++++-----------------
packfile.c | 33 ++++++++++++++-------------
reachable.c | 5 ++++-
7 files changed, 87 insertions(+), 45 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8726485f1f..adc626ce30 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -835,7 +835,8 @@ static int batch_one_object_oi(const struct object_id *oid,
{
struct for_each_object_payload *payload = _payload;
if (oi && oi->whence == OI_PACKED)
- return payload->callback(oid, oi->u.packed.pack, oi->u.packed.offset,
+ return payload->callback(oid, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset,
payload->payload);
return payload->callback(oid, NULL, 0, payload->payload);
}
@@ -906,7 +907,10 @@ static void batch_each_object(struct batch_options *opt,
&payload, flags);
}
} else {
- struct object_info oi = { 0 };
+ struct object_info_source oi_source;
+ struct object_info oi = {
+ .sourcep = &oi_source,
+ };
for (source = the_repository->objects->sources; source; source = source->next) {
struct odb_source_files *files = odb_source_files_downcast(source);
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f396658468..77af26db8f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1825,11 +1825,15 @@ static void repack_local_links(void)
oidset_iter_init(&outgoing_links, &iter);
while ((oid = oidset_iter_next(&iter))) {
- struct object_info info = OBJECT_INFO_INIT;
+ struct object_info_source info_source;
+ struct object_info info = {
+ .sourcep = &info_source,
+ };
+
if (odb_read_object_info_extended(the_repository->objects, oid, &info, 0))
/* Missing; assume it is a promisor object */
continue;
- if (info.whence == OI_PACKED && info.u.packed.pack->pack_promisor)
+ if (info.whence == OI_PACKED && info_source.u.packed.pack->pack_promisor)
continue;
if (!cmd.args.nr) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 620d9ce085..9deb37e9e8 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4491,8 +4491,9 @@ static int add_object_in_unpacked_pack(const struct object_id *oid,
void *data UNUSED)
{
if (cruft) {
- add_cruft_object_entry(oid, OBJ_NONE, oi->u.packed.pack,
- oi->u.packed.offset, NULL, *oi->mtimep);
+ add_cruft_object_entry(oid, OBJ_NONE, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset, NULL,
+ *oi->mtimep);
} else {
add_object_entry(oid, OBJ_NONE, "", 0);
}
@@ -4509,8 +4510,10 @@ static void add_objects_in_unpacked_packs(void)
ODB_FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
ODB_FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS,
};
+ struct object_info_source oi_source;
struct object_info oi = {
.mtimep = &mtime,
+ .sourcep = &oi_source,
};
odb_prepare_alternates(to_pack.repo->objects);
@@ -5000,10 +5003,14 @@ static int option_parse_cruft_expiration(const struct option *opt UNUSED,
static int is_not_in_promisor_pack_obj(struct object *obj, void *data UNUSED)
{
- struct object_info info = OBJECT_INFO_INIT;
+ struct object_info_source info_source;
+ struct object_info info = {
+ .sourcep = &info_source,
+ };
+
if (odb_read_object_info_extended(the_repository->objects, &obj->oid, &info, 0))
BUG("should_include_obj should only be called on existing objects");
- return info.whence != OI_PACKED || !info.u.packed.pack->pack_promisor;
+ return info.whence != OI_PACKED || !info_source.u.packed.pack->pack_promisor;
}
static int is_not_in_promisor_pack(struct commit *commit, void *data) {
diff --git a/odb.c b/odb.c
index 7d555be09f..99f4e7551c 100644
--- a/odb.c
+++ b/odb.c
@@ -692,7 +692,8 @@ static int oid_object_info_convert(struct repository *r,
}
}
input_oi->whence = new_oi.whence;
- input_oi->u = new_oi.u;
+ if (input_oi->sourcep)
+ *input_oi->sourcep = *new_oi.sourcep;
return ret;
}
diff --git a/odb.h b/odb.h
index 3834a0dcbf..770900289a 100644
--- a/odb.h
+++ b/odb.h
@@ -248,6 +248,38 @@ int odb_pretend_object(struct object_database *odb,
void *buf, size_t len, enum object_type type,
struct object_id *oid);
+/*
+ * Object information that can be used to uniquely identify an object and learn
+ * more about how exactly it is stored.
+ */
+struct object_info_source {
+ /*
+ * Backend-specific information about the specific object. This can be
+ * used for example to uniquely identify a given object in case it
+ * exists multiple times.
+ */
+ union {
+ /*
+ * struct {
+ * ... Nothing to expose in this case
+ * } cached;
+ * struct {
+ * ... Nothing to expose in this case
+ * } loose;
+ */
+ struct {
+ struct packed_git *pack;
+ off_t offset;
+ enum packed_object_type {
+ PACKED_OBJECT_TYPE_UNKNOWN,
+ PACKED_OBJECT_TYPE_FULL,
+ PACKED_OBJECT_TYPE_OFS_DELTA,
+ PACKED_OBJECT_TYPE_REF_DELTA,
+ } type;
+ } packed;
+ } u;
+};
+
struct object_info {
/* Request */
enum object_type *typep;
@@ -269,32 +301,20 @@ struct object_info {
*/
time_t *mtimep;
+ /*
+ * Backend-specific information that tells the caller where exactly an
+ * object was looked up from. This information should help disambiguate
+ * object lookups in case the same object exists in multiple sources,
+ * or multiple times in the same source.
+ */
+ struct object_info_source *sourcep;
+
/* Response */
enum {
OI_CACHED,
OI_LOOSE,
OI_PACKED,
} whence;
- union {
- /*
- * struct {
- * ... Nothing to expose in this case
- * } cached;
- * struct {
- * ... Nothing to expose in this case
- * } loose;
- */
- struct {
- struct packed_git *pack;
- off_t offset;
- enum packed_object_type {
- PACKED_OBJECT_TYPE_UNKNOWN,
- PACKED_OBJECT_TYPE_FULL,
- PACKED_OBJECT_TYPE_OFS_DELTA,
- PACKED_OBJECT_TYPE_REF_DELTA,
- } type;
- } packed;
- } u;
};
/*
diff --git a/packfile.c b/packfile.c
index 2b741d7a76..688c410b35 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1422,22 +1422,25 @@ int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
}
oi->whence = OI_PACKED;
- oi->u.packed.offset = obj_offset;
- oi->u.packed.pack = p;
- switch (type) {
- case OBJ_NONE:
- oi->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
- break;
- case OBJ_REF_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
- break;
- case OBJ_OFS_DELTA:
- oi->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
- break;
- default:
- oi->u.packed.type = PACKED_OBJECT_TYPE_FULL;
- break;
+ if (oi->sourcep) {
+ oi->sourcep->u.packed.offset = obj_offset;
+ oi->sourcep->u.packed.pack = p;
+
+ switch (type) {
+ case OBJ_NONE:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_UNKNOWN;
+ break;
+ case OBJ_REF_DELTA:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_REF_DELTA;
+ break;
+ case OBJ_OFS_DELTA:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_OFS_DELTA;
+ break;
+ default:
+ oi->sourcep->u.packed.type = PACKED_OBJECT_TYPE_FULL;
+ break;
+ }
}
ret = 0;
diff --git a/reachable.c b/reachable.c
index 101cfc2727..2fc5b82d62 100644
--- a/reachable.c
+++ b/reachable.c
@@ -235,7 +235,8 @@ static int add_recent_object(const struct object_id *oid,
add_pending_object(data->revs, obj, "");
if (data->cb) {
if (oi->whence == OI_PACKED)
- data->cb(obj, oi->u.packed.pack, oi->u.packed.offset, *oi->mtimep);
+ data->cb(obj, oi->sourcep->u.packed.pack,
+ oi->sourcep->u.packed.offset, *oi->mtimep);
else
data->cb(obj, NULL, 0, *oi->mtimep);
}
@@ -252,9 +253,11 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
unsigned flags;
enum object_type type;
time_t mtime;
+ struct object_info_source oi_source;
struct object_info oi = {
.mtimep = &mtime,
.typep = &type,
+ .sourcep = &oi_source,
};
int r;
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 1/6] packfile: thread odb_source_packed through packed_object_info()
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
In-Reply-To: <20260624-b4-pks-odb-drop-whence-v1-0-8d1877b790ac@pks.im>
Add an optional `struct odb_source_packed *source` parameter to
`packed_object_info()` and `packed_object_info_with_index_pos()`. This
parameter is unused at this point in time, but it will be used in a
follow-up commit so that we can record the source of a specific object.
Note that callers in "odb/source-packed.c" pass the already-available
source, but all other callers pass `NULL` instead. This is fine though,
as we only care about populating this info when called via the packed
store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/cat-file.c | 2 +-
builtin/pack-objects.c | 4 ++--
commit-graph.c | 2 +-
odb/source-packed.c | 4 ++--
pack-bitmap.c | 2 +-
packfile.c | 8 +++++---
packfile.h | 6 ++++--
t/helper/test-bitmap.c | 2 +-
8 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0f3dbd9850..8726485f1f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -497,7 +497,7 @@ static void batch_object_write(const char *obj_name,
data->info.sizep = &data->size;
if (pack)
- ret = packed_object_info(pack, offset, &data->info);
+ ret = packed_object_info(NULL, pack, offset, &data->info);
else
ret = odb_read_object_info_extended(the_repository->objects,
&data->oid, &data->info,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bc5f9ef321..620d9ce085 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2463,7 +2463,7 @@ static void drop_reused_delta(struct object_entry *entry)
oi.sizep = &size;
oi.typep = &type;
- if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
+ if (packed_object_info(NULL, IN_PACK(entry), entry->in_pack_offset, &oi) < 0) {
/*
* We failed to get the info from this pack for some reason;
* fall back to odb_read_object_info, which may find another copy.
@@ -3804,7 +3804,7 @@ static int add_object_entry_from_pack(const struct object_id *oid,
ofs = nth_packed_object_offset(p, pos);
oi.typep = &type;
- if (packed_object_info(p, ofs, &oi) < 0) {
+ if (packed_object_info(NULL, p, ofs, &oi) < 0) {
die(_("could not get type of object %s in pack %s"),
oid_to_hex(oid), p->pack_name);
} else if (type == OBJ_COMMIT) {
diff --git a/commit-graph.c b/commit-graph.c
index c6d9c5c740..9dc8bd5eee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1538,7 +1538,7 @@ static int add_packed_commits(const struct object_id *oid,
struct object_info oi = OBJECT_INFO_INIT;
oi.typep = &type;
- if (packed_object_info(pack, offset, &oi) < 0)
+ if (packed_object_info(NULL, pack, offset, &oi) < 0)
die(_("unable to get type of object %s"), oid_to_hex(oid));
return add_packed_commits_oi(oid, &oi, data);
diff --git a/odb/source-packed.c b/odb/source-packed.c
index 42c28fba0e..43fb53b72d 100644
--- a/odb/source-packed.c
+++ b/odb/source-packed.c
@@ -59,7 +59,7 @@ static int odb_source_packed_read_object_info(struct odb_source *source,
if (!oi)
return 0;
- ret = packed_object_info(e.p, e.offset, oi);
+ ret = packed_object_info(packed, e.p, e.offset, oi);
if (ret < 0) {
mark_bad_packed_object(e.p, oid);
return -1;
@@ -99,7 +99,7 @@ static int odb_source_packed_for_each_object_wrapper(const struct object_id *oid
off_t offset = nth_packed_object_offset(pack, index_pos);
struct object_info oi = *data->request;
- if (packed_object_info_with_index_pos(pack, offset,
+ if (packed_object_info_with_index_pos(data->store, pack, offset,
&index_pos, &oi) < 0) {
mark_bad_packed_object(pack, oid);
return -1;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 83eb47a28b..35774b6f0c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1877,7 +1877,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
ofs = pack_pos_to_offset(pack, pos);
}
- if (packed_object_info(pack, ofs, &oi) < 0) {
+ if (packed_object_info(NULL, pack, ofs, &oi) < 0) {
struct object_id oid;
nth_bitmap_object_oid(bitmap_git, &oid,
pack_pos_to_index(pack, pos));
diff --git a/packfile.c b/packfile.c
index 1d1b23b6cc..2b741d7a76 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1324,7 +1324,8 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
hashmap_add(&delta_base_cache, &ent->ent);
}
-int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
+int packed_object_info_with_index_pos(struct odb_source_packed *source UNUSED,
+ struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi)
{
struct pack_window *w_curs = NULL;
@@ -1446,10 +1447,11 @@ int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
return ret;
}
-int packed_object_info(struct packed_git *p, off_t obj_offset,
+int packed_object_info(struct odb_source_packed *source,
+ struct packed_git *p, off_t obj_offset,
struct object_info *oi)
{
- return packed_object_info_with_index_pos(p, obj_offset, NULL, oi);
+ return packed_object_info_with_index_pos(source, p, obj_offset, NULL, oi);
}
static void *unpack_compressed_entry(struct packed_git *p,
diff --git a/packfile.h b/packfile.h
index 2329a69701..e1f77152b5 100644
--- a/packfile.h
+++ b/packfile.h
@@ -320,9 +320,11 @@ extern int do_check_packed_object_crc;
* Look up the object info for a specific offset in the packfile.
* Returns zero on success, a negative error code otherwise.
*/
-int packed_object_info(struct packed_git *pack,
+int packed_object_info(struct odb_source_packed *source,
+ struct packed_git *pack,
off_t offset, struct object_info *);
-int packed_object_info_with_index_pos(struct packed_git *p, off_t obj_offset,
+int packed_object_info_with_index_pos(struct odb_source_packed *source,
+ struct packed_git *p, off_t obj_offset,
uint32_t *maybe_index_pos, struct object_info *oi);
void mark_bad_packed_object(struct packed_git *, const struct object_id *);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index b130832b81..8547ef67e2 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -52,7 +52,7 @@ static int add_packed_object(const struct object_id *oid,
entry = packlist_alloc(packed, oid);
entry->idx.offset = nth_packed_object_offset(pack, pos);
- if (packed_object_info(pack, entry->idx.offset, &oi) < 0)
+ if (packed_object_info(NULL, pack, entry->idx.offset, &oi) < 0)
die("could not get type of object %s",
oid_to_hex(oid));
oe_set_type(entry, type);
--
2.55.0.rc1.745.g43192e7977.dirty
^ permalink raw reply related
* [PATCH 0/6] odb: refactor source-specific information in object info
From: Patrick Steinhardt @ 2026-06-24 12:19 UTC (permalink / raw)
To: git
Hi,
this patch series refactors `struct object_info` to not contain the
`whence` field anymore.
This field only gave the caller information about the type of source
this was read from, but it didn't allow them to figure out which source
specifically yielded the object. So instead, we replace this information
with a new `struct object_info_source` field that both contains info
about the source, and any backend-specific data.
With this in place we can re-query the same backend for any given
object. More importantly though, we can eventually also use the backend-
specific data to also uniquely identify any given object, e.g. by
recording the packfile and offset, so that we can even yield the same
object in case one source contains the object multiple times.
Furthermore, with this change all information in `struct object_info` is
now following the same request-response-field style.
The series is built on top of 26d8d94e94 (A few more topics before -rc2,
2026-06-21) with ps/odb-source-packed at 1bba3c035d (odb/source-packed:
drop pointer to "files" parent source, 2026-06-17) merged into it.
Thanks!
Patrick
---
Patrick Steinhardt (6):
packfile: thread odb_source_packed through packed_object_info()
odb: make backend-specific fields optional
odb: add `source` field to struct object_info_source
treewide: convert users of `whence` to the new source field
odb: drop `whence` field from object info
odb: document object info fields
builtin/cat-file.c | 12 +++++---
builtin/index-pack.c | 9 ++++--
builtin/pack-objects.c | 19 ++++++++----
commit-graph.c | 2 +-
odb.c | 4 +--
odb.h | 80 +++++++++++++++++++++++++++++++++++---------------
odb/source-inmemory.c | 3 +-
odb/source-loose.c | 4 +--
odb/source-packed.c | 4 +--
pack-bitmap.c | 2 +-
packfile.c | 45 ++++++++++++++++------------
packfile.h | 6 ++--
reachable.c | 7 +++--
t/helper/test-bitmap.c | 2 +-
14 files changed, 130 insertions(+), 69 deletions(-)
---
base-commit: 969dbd51a70f9105ee9965adec5c5a02e75ab5b3
change-id: 20260612-b4-pks-odb-drop-whence-1b0af9ab16f4
^ permalink raw reply
* [PATCH v2 7/7] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add an early termination check to paint_down_to_common() using the
per-side counters introduced earlier. Once the walk enters the
finite-generation region, terminate early when one side's exclusive
count drops to zero -- no new merge-base can form without both paint
sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
entry have GENERATION_NUMBER_INFINITY and are ordered by commit date,
which is not topologically reliable. The optimization only fires
once the walk enters the finite-generation region where ordering
guarantees hold.
Step counts measured with trace2 on git.git with commit-graph:
merge-base --all v2.0.0 v2.55.0-rc1:
before: 72264 steps after: 44589 steps
merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
.../technical/paint-down-to-common.adoc | 17 ++++++++++++
commit-reach.c | 27 ++++++++++++++-----
t/t6600-test-reach.sh | 4 +--
3 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index 0f4e1892a5..983dfcf233 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -94,6 +94,9 @@ ends when one of the following conditions holds:
1. The queue is empty.
2. The queue contains only stale entries.
+ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
+ remain in the queue, no pending merge-base candidates exist,
+ and the walk has entered the finite-generation region.
Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
@@ -104,6 +107,20 @@ existing candidates by proving one is an ancestor of another, but
`remove_redundant()` handles that as a post-processing step, so it
is safe to exit early.
+Side-exhaustion condition
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A new merge-base requires commits from both sides to meet. When one
+side's exclusive counter reaches zero and there are no pending
+merge-base candidates, no future traversal step can produce a new
+candidate.
+
+This optimization only activates in the finite-generation region
+where topological ordering holds. In that region, children are
+always visited before parents, so paint flags are final at visit
+time and an exhausted side cannot reappear. In the INFINITY region,
+commit-date ordering can violate this guarantee, so the check is
+skipped.
+
Related documentation
---------------------
diff --git a/commit-reach.c b/commit-reach.c
index e0d9874f99..f79d0b64d6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -133,17 +133,30 @@ static void paint_queue_put(struct paint_state *state,
static struct commit *paint_queue_get(struct paint_state *state)
{
- struct commit *commit;
+ struct commit *commit = prio_queue_get(&state->queue);
- if (!state->p1_count && !state->p2_count &&
- !state->pending_merge_bases)
+ if (!commit)
return NULL;
- commit = prio_queue_get(&state->queue);
- if (commit) {
- commit->object.flags &= ~ENQUEUED;
- paint_count_update(state, commit->object.flags, -1);
+ commit->object.flags &= ~ENQUEUED;
+
+ if (!state->pending_merge_bases) {
+ if (!state->p1_count && !state->p2_count)
+ return NULL;
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
+ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
+ if ((!state->p1_count || !state->p2_count) &&
+ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
+ return NULL;
}
+
+ paint_count_update(state, commit->object.flags, -1);
return commit;
}
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c1109fb42f..03175befb3 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -332,12 +332,12 @@ test_expect_success 'merge-base --all commit-walk steps' '
cp commit-graph-full .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
+ test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
cp commit-graph-half .git/objects/info/commit-graph &&
GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
git merge-base --all commit-9-9 commit-9-1 >actual &&
- test_trace2_data paint_down_to_common steps 81 <trace-half.txt
+ test_trace2_data paint_down_to_common steps 57 <trace-half.txt
'
test_expect_success 'reduce_heads' '
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 6/7] commit-reach: remove unused nonstale_queue dedup wrappers
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
unused after the previous commit. The core nonstale_queue functions
remain in use by ahead_behind().
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 18 ------------------
1 file changed, 18 deletions(-)
diff --git a/commit-reach.c b/commit-reach.c
index bf102f5e28..e0d9874f99 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -79,24 +79,6 @@ static void clear_nonstale_queue(struct nonstale_queue *queue)
queue->max_nonstale = NULL;
}
-static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
- struct commit *c)
-{
- if (c->object.flags & ENQUEUED)
- return;
- c->object.flags |= ENQUEUED;
- nonstale_queue_put(queue, c);
-}
-
-static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-{
- struct commit *commit = nonstale_queue_get(queue);
-
- if (commit)
- commit->object.flags &= ~ENQUEUED;
- return commit;
-}
-
/*
* Priority queue with per-side commit counters for paint_down_to_common().
* Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 5/7] commit-reach: introduce struct paint_state with per-side counters
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add a paint_state struct for use by paint_down_to_common() that
wraps a prio_queue with per-side commit counters. Each non-stale
queued commit occupies exactly one counter bucket based on its
paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).
The counters are maintained by paint_count_update() which adjusts
the appropriate bucket by a signed delta. An exhaustive switch on
the paint+stale bits documents all valid flag combinations in one
place.
Convert paint_down_to_common() to use paint_state. The loop now
drains the queue via paint_queue_get() which returns NULL when all
counters reach zero, replacing the old pointer-based termination
(max_nonstale). This is equivalent behavior -- both conditions
detect that no non-stale entries remain.
The existing nonstale_queue is left in place for ahead_behind().
Step counts (via trace2 from the previous commit) are identical
before and after this refactoring, confirming no behavioral change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
.../technical/paint-down-to-common.adoc | 9 +-
commit-reach.c | 93 ++++++++++++++++---
2 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
index c10d5d2887..0f4e1892a5 100644
--- a/Documentation/technical/paint-down-to-common.adoc
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -88,15 +88,12 @@ re-enqueued is bounded by the number of flag transitions.
Termination
-----------
-The walk uses a `nonstale_queue` wrapper around `prio_queue` that
-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
-so far. Once that commit is dequeued, every remaining entry is known
-to be STALE and the loop terminates. Specifically, the main loop
+The walk tracks the number of commits of each type in the queue
+(PARENT1-only, PARENT2-only, pending merge-base). The main loop
ends when one of the following conditions holds:
1. The queue is empty.
- 2. `max_nonstale` has been dequeued, meaning the queue only contains
- STALE entries.
+ 2. The queue contains only stale entries.
Stale entry condition
~~~~~~~~~~~~~~~~~~~~~
diff --git a/commit-reach.c b/commit-reach.c
index f6a438550b..bf102f5e28 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -97,6 +97,74 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
return commit;
}
+/*
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
+ */
+struct paint_state {
+ struct prio_queue queue;
+ int p1_count;
+ int p2_count;
+ int pending_merge_bases;
+};
+
+static void paint_count_update(struct paint_state *state,
+ unsigned flags, int delta)
+{
+ switch (flags & (PARENT1 | PARENT2 | STALE)) {
+ case PARENT1:
+ state->p1_count += delta;
+ break;
+
+ case PARENT2:
+ state->p2_count += delta;
+ break;
+
+ case PARENT1 | PARENT2:
+ state->pending_merge_bases += delta;
+ break;
+
+ case PARENT1 | PARENT2 | STALE:
+ break;
+
+ default:
+ BUG("unexpected paint state");
+ }
+}
+
+static void paint_queue_put(struct paint_state *state,
+ struct commit *c, unsigned add_flags)
+{
+ unsigned old_flags = c->object.flags;
+ c->object.flags |= add_flags;
+
+ if (old_flags & ENQUEUED) {
+ paint_count_update(state, old_flags, -1);
+ paint_count_update(state, c->object.flags, 1);
+ } else {
+ c->object.flags |= ENQUEUED;
+ prio_queue_put(&state->queue, c);
+ paint_count_update(state, c->object.flags, 1);
+ }
+}
+
+static struct commit *paint_queue_get(struct paint_state *state)
+{
+ struct commit *commit;
+
+ if (!state->p1_count && !state->p2_count &&
+ !state->pending_merge_bases)
+ return NULL;
+
+ commit = prio_queue_get(&state->queue);
+ if (commit) {
+ commit->object.flags &= ~ENQUEUED;
+ paint_count_update(state, commit->object.flags, -1);
+ }
+ return commit;
+}
+
/*
* See Documentation/technical/paint-down-to-common.adoc
*
@@ -109,31 +177,29 @@ static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
- struct nonstale_queue queue = {
- { compare_commits_by_gen_then_commit_date }
+ struct paint_state state = {
+ .queue = { compare_commits_by_gen_then_commit_date }
};
+ struct commit *commit;
int i;
int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
if (!min_generation && !corrected_commit_dates_enabled(r))
- queue.pq.compare = compare_commits_by_commit_date;
+ state.queue.compare = compare_commits_by_commit_date;
one->object.flags |= PARENT1;
if (!n) {
commit_list_append(one, result);
return 0;
}
- nonstale_queue_put_dedup(&queue, one);
+ paint_queue_put(&state, one, 0);
- for (i = 0; i < n; i++) {
- twos[i]->object.flags |= PARENT2;
- nonstale_queue_put_dedup(&queue, twos[i]);
- }
+ for (i = 0; i < n; i++)
+ paint_queue_put(&state, twos[i], PARENT2);
- while (queue.max_nonstale) {
- struct commit *commit = nonstale_queue_get_dedup(&queue);
+ while ((commit = paint_queue_get(&state))) {
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
@@ -172,7 +238,7 @@ static int paint_down_to_common(struct repository *r,
if ((p->object.flags & flags) == flags)
continue;
if (repo_parse_commit(r, p)) {
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&state.queue);
commit_list_free(*result);
*result = NULL;
/*
@@ -187,12 +253,11 @@ static int paint_down_to_common(struct repository *r,
return error(_("could not parse commit %s"),
oid_to_hex(&p->object.oid));
}
- p->object.flags |= flags;
- nonstale_queue_put_dedup(&queue, p);
+ paint_queue_put(&state, p, flags);
}
}
- clear_nonstale_queue(&queue);
+ clear_prio_queue(&state.queue);
trace2_data_intmax("paint_down_to_common", r,
"steps", steps);
commit_list_sort_by_date(result);
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 4/7] commit-reach: add trace2 instrumentation to paint_down_to_common()
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add a step counter and trace2_data_intmax() call so that the number
of commits visited during the paint walk is observable via
GIT_TRACE2_PERF. This provides a way to measure the impact of
future optimizations without relying on wall-clock benchmarks alone.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
commit-reach.c | 5 +++++
t/t6600-test-reach.sh | 21 +++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/commit-reach.c b/commit-reach.c
index a9483759e0..f6a438550b 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -11,6 +11,7 @@
#include "tag.h"
#include "commit-reach.h"
#include "ewah/ewok.h"
+#include "trace2.h"
/* Remember to update object flag allocation in object.h */
#define PARENT1 (1u<<16)
@@ -112,6 +113,7 @@ static int paint_down_to_common(struct repository *r,
{ compare_commits_by_gen_then_commit_date }
};
int i;
+ int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
@@ -135,6 +137,7 @@ static int paint_down_to_common(struct repository *r,
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
+ steps++;
if (min_generation && generation > last_gen)
BUG("bad generation skip %"PRItime" > %"PRItime" at %s",
@@ -190,6 +193,8 @@ static int paint_down_to_common(struct repository *r,
}
clear_nonstale_queue(&queue);
+ trace2_data_intmax("paint_down_to_common", r,
+ "steps", steps);
commit_list_sort_by_date(result);
return 0;
}
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 4b771b4c58..c1109fb42f 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -319,6 +319,27 @@ test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'merge-base --all commit-walk steps' '
+ test_when_finished rm -rf .git/objects/info/commit-graph \
+ .git/objects/info/commit-graphs &&
+ rm -rf .git/objects/info/commit-graph \
+ .git/objects/info/commit-graphs &&
+
+ GIT_TRACE2_EVENT="$(pwd)/trace-none.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 81 <trace-none.txt &&
+
+ cp commit-graph-full .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
+
+ cp commit-graph-half .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+ test_trace2_data paint_down_to_common steps 81 <trace-half.txt
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 3/7] t6099, t6600: add side-exhaustion regression tests
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add t6099 to test the case where multiple merge-base candidates exist
and one is an ancestor of another. This exercises the side-exhaustion
optimization in paint_down_to_common together with the
remove_redundant safety net in get_merge_bases_many_0.
Add a mixed finite/INFINITY test to t6600 where one tip is outside
the commit-graph (INFINITY generation) and the other is inside.
This exercises the region transition: the walk starts in the
INFINITY region where side-exhaustion is disabled, then crosses
into the finite region where it can fire.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++++++++++++++++++++
t/t6600-test-reach.sh | 25 ++++++++
3 files changed, 108 insertions(+)
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
diff --git a/t/meson.build b/t/meson.build
index 3219264fe7..ee6ebdffb9 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -786,6 +786,7 @@ integration_tests = [
't6041-bisect-submodule.sh',
't6050-replace.sh',
't6060-merge-index.sh',
+ 't6099-merge-base-side-exhaustion.sh',
't6100-rev-list-in-order.sh',
't6101-rev-parse-parents.sh',
't6102-rev-list-unexpected-objects.sh',
diff --git a/t/t6099-merge-base-side-exhaustion.sh b/t/t6099-merge-base-side-exhaustion.sh
new file mode 100755
index 0000000000..4f1e0d50ef
--- /dev/null
+++ b/t/t6099-merge-base-side-exhaustion.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='merge-base with ancestor among merge-base candidates
+
+Test that merge-base --all correctly handles cases where
+multiple merge-base candidates exist and one is an ancestor
+of another. The side-exhaustion optimization in
+paint_down_to_common may exit before STALE propagation
+removes the ancestor, but remove_redundant catches it.
+
+Graph shape (parents are below children):
+
+ A ----------- X
+ |\ /|
+ | B---------/ |
+ | | |
+ e2 \ f2
+ | | |
+ e1 d1 f1
+ \ | /
+ \ | /
+ \| /
+ C
+
+A and X are the two tips.
+B and C are both reachable from A and X.
+B reaches C through d1.
+Only B should appear in merge-base --all output.
+'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup ancestor merge-base candidate' '
+ test_commit C &&
+
+ git checkout -b d-chain HEAD &&
+ test_commit d1 &&
+ test_commit B &&
+
+ git checkout -b e-path C &&
+ test_commit e1 &&
+ test_commit e2 &&
+
+ git checkout -b f-path C &&
+ test_commit f1 &&
+ test_commit f2 &&
+
+ git checkout -b branch-A e-path &&
+ test_merge A B &&
+
+ git checkout -b branch-X f-path &&
+ test_merge X B &&
+
+ git commit-graph write --reachable
+'
+
+test_expect_success 'merge-base --all excludes ancestor candidate' '
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_expect_success 'merge-base (single) finds shallowest' '
+ git rev-parse B >expected &&
+ git merge-base A X >actual &&
+ test_cmp expected actual
+'
+
+# Without commit-graph: generation numbers are INFINITY,
+# side-exhaustion optimization does not fire.
+test_expect_success 'merge-base --all without commit-graph' '
+ rm -f .git/objects/info/commit-graph &&
+ git rev-parse B >expected &&
+ git merge-base --all A X >actual &&
+ test_cmp expected actual
+'
+
+test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c2e091aad1..4b771b4c58 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -294,6 +294,31 @@ test_expect_success 'get_merge_bases_many:infinity-both-sides' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'setup mixed finite/INFINITY topology' '
+ # Create a commit outside all saved commit-graph files so it always
+ # has INFINITY generation, while its parent (ps-X) is in the graph
+ # with a finite generation. Use the ps-* orphan topology so we do
+ # not pollute the grid-based rev-list tests.
+ git checkout ps-X &&
+ test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
+'
+
+test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
+ # One tip (pm-INF) is outside the commit-graph with INFINITY
+ # generation; the other (ps-B) is in the graph with finite
+ # generation. The walk starts in the INFINITY region and crosses
+ # into the finite region where side-exhaustion can fire.
+ cat >input <<-\EOF &&
+ A:pm-INF
+ X:ps-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-X
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 2/7] t6600: add test cases for side-exhaustion edge cases
From: Elijah Newren via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson, Elijah Newren
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
Add test cases to t6600-test-reach.sh that exercise edge cases in the
side-exhaustion optimization for paint_down_to_common():
- in_merge_bases_many:self: commit is both A and one of the X inputs
- get_merge_bases_many:duplicate-twos: duplicate entries in X list
- get_merge_bases_many:pending-stale: STALE transition on an
already-painted commit (ps-* diamond topology)
- get_merge_bases_many:infinity-both-sides: both tips outside the
commit-graph with non-monotonic dates (pi-* topology)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
t/t6600-test-reach.sh | 111 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..c2e091aad1 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -49,6 +49,62 @@ test_expect_success 'setup' '
git tag -a -m "$x-$i" tag-$x-$i commit-$x-$i || return 1
done
done &&
+
+ # Build a small side topology to exercise the (PARENT1|PARENT2) ->
+ # (PARENT1|PARENT2|STALE) transition in paint_down_to_common(); the
+ # 10x10 grid above does not exercise it because no merge-base candidate
+ # there is a descendant of another, so STALE never reaches a
+ # still-pending candidate.
+ #
+ # ps-X
+ # /|\
+ # / | \
+ # ps-Z ps-B ps-W
+ # | / \ |
+ # | / \ |
+ # |/ \|
+ # ps-T1 ps-T2
+ #
+ # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
+ # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
+ # then the STALE-walk from ps-B transitions ps-X to
+ # (PARENT1|PARENT2|STALE).
+ git checkout --orphan ps-orphan &&
+ test_commit ps-X &&
+ git checkout -b ps-B-br ps-X && test_commit ps-B &&
+ git checkout -b ps-Z-br ps-X && test_commit ps-Z &&
+ git checkout -b ps-W-br ps-X && test_commit ps-W &&
+ git checkout -b ps-T1 ps-Z &&
+ git merge --no-ff -m ps-T1 ps-B &&
+ git checkout -b ps-T2 ps-W &&
+ git merge --no-ff -m ps-T2 ps-B &&
+
+ # Build a side topology that lives entirely outside the half
+ # commit-graph and has non-monotonic commit dates, to exercise the
+ # INFINITY-gate in paint_down_to_common. With both tips outside
+ # the graph, generation is INFINITY and the queue falls back to
+ # commit-date order, which here is non-monotonic.
+ #
+ # pi-X (date 500, PARENT1 tip) --> pi-P, pi-D
+ # pi-D (date 480) --> pi-C
+ # pi-C (date 200) --> pi-B
+ # pi-B (date 100, PARENT2 tip) --> pi-P
+ # pi-P (date 450, root)
+ #
+ # merge-base(pi-X, pi-B) = pi-B (it is an ancestor of pi-X and is
+ # itself one of the queried tips).
+ git checkout --orphan pi-orphan &&
+ test_commit --date "@450 +0000" pi-P &&
+ test_commit --date "@100 +0000" pi-B &&
+ test_commit --date "@200 +0000" pi-C &&
+ test_commit --date "@480 +0000" pi-D &&
+ GIT_AUTHOR_DATE="@500 +0000" GIT_COMMITTER_DATE="@500 +0000" \
+ git commit-tree -p pi-D -p pi-P -m pi-X pi-D^{tree} >pi-X-oid &&
+ pi_x="$(cat pi-X-oid)" &&
+ git branch -f pi-X-br "$pi_x" &&
+ git tag pi-X "$pi_x" &&
+
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph commit-graph-full &&
chmod u+w commit-graph-full &&
@@ -146,6 +202,16 @@ test_expect_success 'in_merge_bases_many:miss-heuristic' '
test_all_modes in_merge_bases_many
'
+test_expect_success 'in_merge_bases_many:self' '
+ cat >input <<-\EOF &&
+ A:commit-6-8
+ X:commit-5-9
+ X:commit-6-8
+ EOF
+ echo "in_merge_bases_many(A,X):1" >expect &&
+ test_all_modes in_merge_bases_many
+'
+
test_expect_success 'is_descendant_of:hit' '
cat >input <<-\EOF &&
A:commit-5-7
@@ -183,6 +249,51 @@ test_expect_success 'get_merge_bases_many' '
test_all_modes get_merge_bases_many
'
+test_expect_success 'get_merge_bases_many:duplicate-twos' '
+ cat >input <<-\EOF &&
+ A:commit-5-7
+ X:commit-4-8
+ X:commit-4-8
+ X:commit-6-6
+ X:commit-6-6
+ X:commit-8-3
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse commit-5-6 \
+ commit-4-7 | sort
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:pending-stale' '
+ # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
+ # paint_down_to_common(). See the topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:ps-T1
+ X:ps-T2
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse ps-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
+test_expect_success 'get_merge_bases_many:infinity-both-sides' '
+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
+ # the pi-* topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:pi-X
+ X:pi-B
+ EOF
+ {
+ echo "get_merge_bases_many(A,X):" &&
+ git rev-parse pi-B
+ } >expect &&
+ test_all_modes get_merge_bases_many
+'
+
test_expect_success 'reduce_heads' '
cat >input <<-\EOF &&
X:commit-1-10
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 1/7] Documentation/technical: add paint-down-to-common doc
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson,
Kristofer Karlsson
In-Reply-To: <pull.2149.v2.git.1782303254.gitgitgadget@gmail.com>
From: Kristofer Karlsson <krka@spotify.com>
Add a technical document describing the paint_down_to_common()
algorithm used for merge-base computation, covering the paint
walk, generation number regions, and termination conditions.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
---
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 114 ++++++++++++++++++
commit-reach.c | 6 +-
4 files changed, 121 insertions(+), 1 deletion(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2699f0b24a..f8dea4b395 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -129,6 +129,7 @@ TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri
TECH_DOCS += technical/pack-heuristics
+TECH_DOCS += technical/paint-down-to-common
TECH_DOCS += technical/parallel-checkout
TECH_DOCS += technical/partial-clone
TECH_DOCS += technical/platform-support
diff --git a/Documentation/technical/meson.build b/Documentation/technical/meson.build
index ec07088c57..9ce11d5e48 100644
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -18,6 +18,7 @@ articles = [
'multi-pack-index.adoc',
'packfile-uri.adoc',
'pack-heuristics.adoc',
+ 'paint-down-to-common.adoc',
'parallel-checkout.adoc',
'partial-clone.adoc',
'platform-support.adoc',
diff --git a/Documentation/technical/paint-down-to-common.adoc b/Documentation/technical/paint-down-to-common.adoc
new file mode 100644
index 0000000000..c10d5d2887
--- /dev/null
+++ b/Documentation/technical/paint-down-to-common.adoc
@@ -0,0 +1,114 @@
+Merge-Base Computation and paint_down_to_common()
+==================================================
+
+The function `paint_down_to_common()` in `commit-reach.c` computes merge
+bases by walking the commit graph backwards from two sets of tips and
+finding where their ancestry meets.
+
+Use cases
+---------
+
+Computing merge bases is used in two different ways:
+
+ 1. *Finding all merge bases* (`merge-base --all`, `merge-tree`,
+ `merge`, `rebase`). A merge base is a common ancestor that is
+ not itself an ancestor of another common ancestor.
+
+ 2. *Ancestry checks* (`in_merge_bases`, used by `merge-base
+ --is-ancestor`, `branch -d`, `fetch`). These ask: "is commit A
+ an ancestor of commit B?" If a common ancestor equals one of the
+ inputs, that input is necessarily the only merge base -- no other
+ common ancestor can be both as recent and not an ancestor of it.
+
+Both use cases share the same algorithm and implementation.
+
+Algorithm
+---------
+
+Given a commit `one` and a set of commits `twos[]`, the walk paints
+commits with two colors:
+
+ - PARENT1: reachable from `one`
+ - PARENT2: reachable from any commit in `twos[]`
+
+The walk uses a priority queue ordered by generation number (falling
+back to commit date when generation numbers are unavailable). Each
+step dequeues the highest-priority commit (this is when we say a
+commit is "visited") and propagates its paint flags to its parents,
+enqueuing them if they gained new flags. When a commit receives
+both PARENT1 and PARENT2, it is a merge-base candidate. A candidate
+gains the STALE flag so its ancestors propagate staleness -- any
+deeper common ancestor is necessarily redundant.
+
+INFINITY and finite generation regions
+--------------------------------------
+
+The commit-graph stores a generation number for each commit. Commits
+not in the commit-graph have generation `GENERATION_NUMBER_INFINITY`. The
+graph is closed under reachability: if a commit is in the graph, all
+its ancestors are too. This partitions the commit graph into two regions:
+
+....
+ +---------------------------------------+
+ | INFINITY region |
+ | generation = INFINITY |
+ | queue order: heuristic (commit date) |
+ +---------------------------------------+
+ |
+ v
+ +---------------------------------------+
+ | Finite region |
+ | generation = finite |
+ | queue order: topological |
+ +---------------------------------------+
+....
+
+When the commit-graph is enabled, the INFINITY region is typically
+very small -- it only contains commits added since the last
+commit-graph refresh.
+
+All reachable INFINITY-generation commits are visited before any
+finite-generation commit, because INFINITY is larger than any finite
+value. Once the walk crosses into the finite region, it stays there.
+
+In the finite region, generation ordering guarantees topological
+traversal: children are always visited before their parents. This
+means that paint on already-visited commits is final -- no future
+traversal step can add paint to them.
+
+In the INFINITY region, commit-date ordering can violate this: a
+parent with a later date can be visited before a child with an earlier
+date. Paint flags are therefore NOT final at visit time, and a
+commit visited with only one side's paint may later gain the other.
+
+Paint flags are only added, never removed. Since each flag can be set
+at most once per commit, the number of times a commit can be
+re-enqueued is bounded by the number of flag transitions.
+
+Termination
+-----------
+
+The walk uses a `nonstale_queue` wrapper around `prio_queue` that
+tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
+so far. Once that commit is dequeued, every remaining entry is known
+to be STALE and the loop terminates. Specifically, the main loop
+ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+ 2. `max_nonstale` has been dequeued, meaning the queue only contains
+ STALE entries.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
+Once all queued entries are stale, no new merge-base candidates can
+be discovered -- that requires at least one non-stale commit from
+each side meeting. Continuing the walk could still invalidate
+existing candidates by proving one is an ancestor of another, but
+`remove_redundant()` handles that as a post-processing step, so it
+is safe to exit early.
+
+Related documentation
+---------------------
+
+ - `Documentation/technical/commit-graph.adoc` -- generation numbers
+ and the reachability closure property.
diff --git a/commit-reach.c b/commit-reach.c
index 5df471a313..a9483759e0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -96,7 +96,11 @@ static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
return commit;
}
-/* all input commits in one and twos[] must have been parsed! */
+/*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
+ * All input commits in one and twos[] must have been parsed!
+ */
static int paint_down_to_common(struct repository *r,
struct commit *one, int n,
struct commit **twos,
--
gitgitgadget
^ permalink raw reply related
* [PATCH v2 0/7] commit-reach: terminate merge-base walk when one side is exhausted
From: Kristofer Karlsson via GitGitGadget @ 2026-06-24 12:14 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Elijah Newren, Kristofer Karlsson
In-Reply-To: <pull.2149.git.1781951820.gitgitgadget@gmail.com>
commit-reach: terminate merge-base walk when one side is exhausted
Optimize paint_down_to_common() for merge-base queries that hit large
one-sided histories.
When the walk from one side reaches a commit with a very low generation
number that the other side never paints, the walk is forced to drain most of
the graph. A common trigger is a repository import that grafts a separate
history with its own root, but any merge that introduces a low-generation
commit never painted by the other side has the same effect.
A new merge-base candidate can only be discovered when exclusive PARENT1 and
PARENT2 paint meet. This series teaches paint_down_to_common() to stop as
soon as one side has no exclusive commits left in the queue; once one side
is exhausted, no further candidates can appear.
origin/HEAD o o PR HEAD
| |
(import) o :
/ \ /
| o merge-base
| |
: : (~2.5M commits)
| |
import root main root
In the RFC thread [1], Derrick Stolee provided a criss-cross counterexample
that sharpened the halt condition, and Elijah Newren independently
discovered the same optimization and shared an implementation in PR #2150
[2]. Patches 2-3 incorporate test cases from Elijah's branch.
This series implements the optimization only after the walk enters the
finite-generation region, where generation ordering guarantees that paint on
visited commits is final.
Patch layout:
1/7 Documentation/technical: add paint-down-to-common doc 2/7 t6600: add
test cases for side-exhaustion edge cases 3/7 t6099, t6600: add
side-exhaustion regression tests 4/7 commit-reach: add trace2
instrumentation to paint_down_to_common() 5/7 commit-reach: introduce struct
paint_state with per-side counters 6/7 commit-reach: remove unused
nonstale_queue dedup wrappers 7/7 commit-reach: terminate merge-base walk
when one paint side is exhausted
Benchmarks
Step counts are deterministic (measured via trace2_data_intmax added in
patch 4). Wall-clock times are medians over 10-20 runs with CPU governor set
to performance.
2.6M-commit monorepo with commit-graph (baseline v2.55.0-rc1):
steps wall-clock
merge-base --all (across import) 2682391 -> 53521 7.26s -> 88ms
merge-base --all (1000 apart) 2659607 -> 1106 6.98s -> 8ms
merge-tree (across import) - 8.11s -> 100ms
git.git (88k commits, commit-graph):
steps wall-clock
merge-base --all v2.0.0 v2.55.0-rc1 72264 -> 44589 82ms -> 49ms
merge-base --all HEAD HEAD~1000 9873 -> 3817 19ms -> 9ms
merge-base --all HEAD HEAD~10000 72285 -> 41523 80ms -> 48ms
merge-base HEAD HEAD~1000 - 9ms -> 9ms
merge-base --is-ancestor HEAD~1000 HEAD - 6ms -> 6ms
Changes since v1:
* Reordered patches: documentation first (describing the existing
algorithm), tests before code changes, so they demonstrate passing with
old logic first.
* Dropped the ahead_behind decoupling patch. paint_state is now a NEW
struct alongside nonstale_queue instead of replacing it. ahead_behind()
is completely untouched.
* Removed nonstale_queue_put_dedup() and nonstale_queue_get_dedup() (dead
code after the conversion) in a separate commit.
* Renamed: struct paint_queue -> paint_state, field pq -> queue,
paint_count_add/remove -> paint_count_update (single function with signed
delta parameter).
* Split the old paint_count_transition (which handled both old and new
flags in one call) into separate remove/add calls with a signed delta.
This eliminates the need for the case 0 handler (which tracked "not in
the queue") and allows an exhaustive switch on (PARENT1 | PARENT2 |
STALE) that documents all valid flag combinations, with BUG() in default.
* Moved all termination conditions into paint_queue_get(). The all-zero
check and the side-exhaustion check are merged under a shared
!pending_merge_bases guard. paint_queue_get() derives the generation from
the dequeued commit itself, so no extra parameter is needed.
* Added trace2_data_intmax() instrumentation to report the number of
commits visited per paint walk (separate commit), with deterministic
step-count assertions in t6600.
* Expanded switch statements to multi-line format per .clang-format.
* Used !count style throughout instead of count == 0.
* Updated technical documentation alongside code changes.
* Added benchmark data (both git-bench wall-clock and trace2 step counts)
to commit messages.
[1]
https://lore.kernel.org/git/CAL71e4Ps-2_0+uuZu43N9pFnXBemoAohPs_eyRJf8taXHJPAXQ@mail.gmail.com/T/#u
[2] https://github.com/gitgitgadget/git/pull/2150
Elijah Newren (1):
t6600: add test cases for side-exhaustion edge cases
Kristofer Karlsson (6):
Documentation/technical: add paint-down-to-common doc
t6099, t6600: add side-exhaustion regression tests
commit-reach: add trace2 instrumentation to paint_down_to_common()
commit-reach: introduce struct paint_state with per-side counters
commit-reach: remove unused nonstale_queue dedup wrappers
commit-reach: terminate merge-base walk when one paint side is
exhausted
Documentation/Makefile | 1 +
Documentation/technical/meson.build | 1 +
.../technical/paint-down-to-common.adoc | 128 ++++++++++++++
commit-reach.c | 119 ++++++++++---
t/meson.build | 1 +
t/t6099-merge-base-side-exhaustion.sh | 82 +++++++++
t/t6600-test-reach.sh | 157 ++++++++++++++++++
7 files changed, 464 insertions(+), 25 deletions(-)
create mode 100644 Documentation/technical/paint-down-to-common.adoc
create mode 100755 t/t6099-merge-base-side-exhaustion.sh
base-commit: ab776a62a78576513ee121424adb19597fbb7613
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2149%2Fspkrka%2Fside-exhaust-pr-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2149/spkrka/side-exhaust-pr-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2149
Range-diff vs v1:
1: 5492acda0a < -: ---------- commit-reach: decouple ahead_behind from nonstale_queue
6: 9cbfc67d72 ! 1: 19ed743bd1 Documentation/technical: add paint-down-to-common doc
@@ Commit message
Documentation/technical: add paint-down-to-common doc
Add a technical document describing the paint_down_to_common()
- algorithm used for merge-base computation.
+ algorithm used for merge-base computation, covering the paint
+ walk, generation number regions, and termination conditions.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
@@ Documentation/technical/paint-down-to-common.adoc (new)
+Termination
+-----------
+
-+Termination happens when we can prove that no extra progress is
-+possible. We are done with the main loop when one of the following
-+conditions holds:
++The walk uses a `nonstale_queue` wrapper around `prio_queue` that
++tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
++so far. Once that commit is dequeued, every remaining entry is known
++to be STALE and the loop terminates. Specifically, the main loop
++ends when one of the following conditions holds:
+
+ 1. The queue is empty.
-+ 2. The queue only contains STALE entries.
-+ 3. Side-exhaustion: the walk has reached the finite region and one
-+ of the sides is fully exhausted.
-+
-+The loop waits for all pending merge-base candidates to be popped
-+and recorded before any early exit fires, so no separate drain phase
-+is needed after termination.
++ 2. `max_nonstale` has been dequeued, meaning the queue only contains
++ STALE entries.
+
+Stale entry condition
+~~~~~~~~~~~~~~~~~~~~~
-+If all entries are stale we cannot find any new merge bases since
-+that requires at least one enqueued side node meeting the other side.
-+However, we could still invalidate merge bases (if there are more
-+than one). This is unnecessary since `remove_redundant()` will clean
-+that up as a post-process step.
-+
-+Side-exhaustion
-+~~~~~~~~~~~~~~~
-+A commit is *exclusive* to one side if it carries that side's paint
-+but not the other (e.g. PARENT1 without PARENT2).
-+
-+If we have reached the finite region of the graph, no future
-+traversal step can add paint to an already-visited commit. Thus if
-+there are no exclusive PARENT2 commits in the queue, no additional
-+PARENT2 paint can be introduced into the walk. Even if exclusive
-+PARENT1 commits remain, no new merge-base candidates can be
-+discovered. The same holds symmetrically for PARENT1.
-+
-+This invariant is only valid in the finite region of the graph.
++Once all queued entries are stale, no new merge-base candidates can
++be discovered -- that requires at least one non-stale commit from
++each side meeting. Continuing the walk could still invalidate
++existing candidates by proving one is an ancestor of another, but
++`remove_redundant()` handles that as a post-processing step, so it
++is safe to exit early.
+
+Related documentation
+---------------------
+
+ - `Documentation/technical/commit-graph.adoc` -- generation numbers
+ and the reachability closure property.
+
+ ## commit-reach.c ##
+@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+ return commit;
+ }
+
+-/* all input commits in one and twos[] must have been parsed! */
++/*
++ * See Documentation/technical/paint-down-to-common.adoc
++ *
++ * All input commits in one and twos[] must have been parsed!
++ */
+ static int paint_down_to_common(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos,
4: 91372b975f ! 2: 6151b8e0a3 t6600: add test cases for side-exhaustion edge cases
@@ t/t6600-test-reach.sh: test_expect_success 'setup' '
+ # ps-T1 ps-T2
+ #
+ # where ps-T1=merge(ps-Z,ps-B), ps-T2=merge(ps-W,ps-B), so
-+ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
++ # merge-base(ps-T1,ps-T2) = ps-B. During the walk, ps-X transitions
+ # to (PARENT1|PARENT2) via ps-Z and ps-W before ps-B is dequeued;
+ # then the STALE-walk from ps-B transitions ps-X to
+ # (PARENT1|PARENT2|STALE).
@@ t/t6600-test-reach.sh: test_expect_success 'setup' '
+
+ # Build a side topology that lives entirely outside the half
+ # commit-graph and has non-monotonic commit dates, to exercise the
-+ # INFINITY-gate in paint_down_to_common. With both tips outside
++ # INFINITY-gate in paint_down_to_common. With both tips outside
+ # the graph, generation is INFINITY and the queue falls back to
+ # commit-date order, which here is non-monotonic.
+ #
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
+
+test_expect_success 'get_merge_bases_many:pending-stale' '
+ # Exercises the (PARENT1|PARENT2) -> (...|STALE) transition path in
-+ # paint_down_to_common(). See the topology comment in the setup test.
++ # paint_down_to_common(). See the topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:ps-T1
+ X:ps-T2
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many' '
+'
+
+test_expect_success 'get_merge_bases_many:infinity-both-sides' '
-+ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
++ # Exercises the push-time INFINITY-gate in paint_down_to_common(). See
+ # the pi-* topology comment in the setup test.
+ cat >input <<-\EOF &&
+ A:pi-X
5: faf5bc98ed ! 3: 90f09ecb5c t6099, t6600: add side-exhaustion regression tests
@@ t/t6099-merge-base-side-exhaustion.sh (new)
+
+Test that merge-base --all correctly handles cases where
+multiple merge-base candidates exist and one is an ancestor
-+of another. The side-exhaustion optimization in
++of another. The side-exhaustion optimization in
+paint_down_to_common may exit before STALE propagation
+removes the ancestor, but remove_redundant catches it.
+
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
+test_expect_success 'setup mixed finite/INFINITY topology' '
+ # Create a commit outside all saved commit-graph files so it always
+ # has INFINITY generation, while its parent (ps-X) is in the graph
-+ # with a finite generation. Use the ps-* orphan topology so we do
++ # with a finite generation. Use the ps-* orphan topology so we do
+ # not pollute the grid-based rev-list tests.
+ git checkout ps-X &&
+ test_env GIT_TEST_COMMIT_GRAPH= test_commit pm-INF
@@ t/t6600-test-reach.sh: test_expect_success 'get_merge_bases_many:infinity-both-s
+test_expect_success 'get_merge_bases_many:mixed-finite-infinity' '
+ # One tip (pm-INF) is outside the commit-graph with INFINITY
+ # generation; the other (ps-B) is in the graph with finite
-+ # generation. The walk starts in the INFINITY region and crosses
++ # generation. The walk starts in the INFINITY region and crosses
+ # into the finite region where side-exhaustion can fire.
+ cat >input <<-\EOF &&
+ A:pm-INF
-: ---------- > 4: 6ade4df2ed commit-reach: add trace2 instrumentation to paint_down_to_common()
2: 316e4dfe26 ! 5: f24edd45f0 commit-reach: introduce struct paint_queue with per-side counters
@@ Metadata
Author: Kristofer Karlsson <krka@spotify.com>
## Commit message ##
- commit-reach: introduce struct paint_queue with per-side counters
+ commit-reach: introduce struct paint_state with per-side counters
- Replace the nonstale_queue abstraction in paint_down_to_common() with
- a new paint_queue struct that tracks per-side commit counts. Each
- non-stale queued commit occupies exactly one counter bucket based on
- its paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
+ Add a paint_state struct for use by paint_down_to_common() that
+ wraps a prio_queue with per-side commit counters. Each non-stale
+ queued commit occupies exactly one counter bucket based on its
+ paint flags: PARENT1-only, PARENT2-only, or both sides (a pending
merge-base candidate).
- The counters are maintained by paint_count_transition() which handles
- all flag changes as bucket transfers: remove from the old bucket, add
- to the new one. Either step is a no-op when the respective state has
- no bucket (stale or zero).
+ The counters are maintained by paint_count_update() which adjusts
+ the appropriate bucket by a signed delta. An exhaustive switch on
+ the paint+stale bits documents all valid flag combinations in one
+ place.
- The loop now drains the queue via paint_queue_get() and breaks when
- all counters reach zero, replacing the old pointer-based termination
- (max_nonstale). This is equivalent behavior.
+ Convert paint_down_to_common() to use paint_state. The loop now
+ drains the queue via paint_queue_get() which returns NULL when all
+ counters reach zero, replacing the old pointer-based termination
+ (max_nonstale). This is equivalent behavior -- both conditions
+ detect that no non-stale entries remain.
+
+ The existing nonstale_queue is left in place for ahead_behind().
+
+ Step counts (via trace2 from the previous commit) are identical
+ before and after this refactoring, confirming no behavioral change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
+ ## Documentation/technical/paint-down-to-common.adoc ##
+@@ Documentation/technical/paint-down-to-common.adoc: re-enqueued is bounded by the number of flag transitions.
+ Termination
+ -----------
+
+-The walk uses a `nonstale_queue` wrapper around `prio_queue` that
+-tracks `max_nonstale`: the lowest-priority non-stale commit enqueued
+-so far. Once that commit is dequeued, every remaining entry is known
+-to be STALE and the loop terminates. Specifically, the main loop
++The walk tracks the number of commits of each type in the queue
++(PARENT1-only, PARENT2-only, pending merge-base). The main loop
+ ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+- 2. `max_nonstale` has been dequeued, meaning the queue only contains
+- STALE entries.
++ 2. The queue contains only stale entries.
+
+ Stale entry condition
+ ~~~~~~~~~~~~~~~~~~~~~
+
## commit-reach.c ##
-@@ commit-reach.c: static int compare_commits_by_gen(const void *_a, const void *_b)
+@@ commit-reach.c: static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
+ return commit;
}
- /*
-- * A prio_queue with O(1) termination check. 'max_nonstale' tracks
-- * the lowest-priority non-stale commit enqueued so far; once it is
-- * popped, every remaining entry is known to be STALE.
++/*
+ * Priority queue with per-side commit counters for paint_down_to_common().
+ * Each non-stale queued commit occupies exactly one bucket: PARENT1-only,
+ * PARENT2-only, or both (a pending merge-base candidate).
- */
--struct nonstale_queue {
-+struct paint_queue {
- struct prio_queue pq;
-- struct commit *max_nonstale;
++ */
++struct paint_state {
++ struct prio_queue queue;
+ int p1_count;
+ int p2_count;
+ int pending_merge_bases;
- };
-
--static void nonstale_queue_put(struct nonstale_queue *queue,
-- struct commit *c)
-+/*
-+ * Adjust per-side counters for a paint-state transition. Non-stale
-+ * commits are counted in one of three counters: PARENT1-only,
-+ * PARENT2-only, or both. Zero means "not in the queue" (used on
-+ * enqueue/dequeue); stale commits are not counted at all.
-+ */
-+static void paint_count_transition(struct paint_queue *queue,
-+ unsigned old_flags, unsigned new_flags)
- {
-- struct commit *old = queue->max_nonstale;
-+ unsigned old_paint = old_flags & (PARENT1 | PARENT2 | STALE);
-+ unsigned new_paint = new_flags & (PARENT1 | PARENT2 | STALE);
-
-- prio_queue_put(&queue->pq, c);
-- if (c->object.flags & STALE)
-+ if (old_paint == new_paint)
- return;
-- if (!old || queue->pq.compare(old, c, queue->pq.cb_data) <= 0)
-- queue->max_nonstale = c;
--}
--
--static struct commit *nonstale_queue_get(struct nonstale_queue *queue)
--{
-- struct commit *commit = prio_queue_get(&queue->pq);
-
-- if (commit == queue->max_nonstale)
-- queue->max_nonstale = NULL;
--
-- return commit;
-+ if (!(old_paint & STALE)) {
-+ switch (old_paint & (PARENT1 | PARENT2)) {
-+ case 0: break;
-+ case PARENT1: queue->p1_count--; break;
-+ case PARENT2: queue->p2_count--; break;
-+ case PARENT1 | PARENT2: queue->pending_merge_bases--; break;
-+ default: BUG("unexpected paint state");
-+ }
-+ }
-+ if (!(new_paint & STALE)) {
-+ switch (new_paint & (PARENT1 | PARENT2)) {
-+ case 0: break;
-+ case PARENT1: queue->p1_count++; break;
-+ case PARENT2: queue->p2_count++; break;
-+ case PARENT1 | PARENT2: queue->pending_merge_bases++; break;
-+ default: BUG("unexpected paint state");
-+ }
++};
++
++static void paint_count_update(struct paint_state *state,
++ unsigned flags, int delta)
++{
++ switch (flags & (PARENT1 | PARENT2 | STALE)) {
++ case PARENT1:
++ state->p1_count += delta;
++ break;
++
++ case PARENT2:
++ state->p2_count += delta;
++ break;
++
++ case PARENT1 | PARENT2:
++ state->pending_merge_bases += delta;
++ break;
++
++ case PARENT1 | PARENT2 | STALE:
++ break;
++
++ default:
++ BUG("unexpected paint state");
+ }
- }
-
--static void clear_nonstale_queue(struct nonstale_queue *queue)
-+static void paint_queue_put(struct paint_queue *queue,
++}
++
++static void paint_queue_put(struct paint_state *state,
+ struct commit *c, unsigned add_flags)
- {
-- clear_prio_queue(&queue->pq);
-- queue->max_nonstale = NULL;
--}
++{
+ unsigned old_flags = c->object.flags;
+ c->object.flags |= add_flags;
-
--static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
-- struct commit *c)
--{
-- if (c->object.flags & ENQUEUED)
-- return;
-- c->object.flags |= ENQUEUED;
-- nonstale_queue_put(queue, c);
++
+ if (old_flags & ENQUEUED) {
-+ paint_count_transition(queue, old_flags, c->object.flags);
++ paint_count_update(state, old_flags, -1);
++ paint_count_update(state, c->object.flags, 1);
+ } else {
+ c->object.flags |= ENQUEUED;
-+ prio_queue_put(&queue->pq, c);
-+ paint_count_transition(queue, 0, c->object.flags);
++ prio_queue_put(&state->queue, c);
++ paint_count_update(state, c->object.flags, 1);
+ }
- }
-
--static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
-+static struct commit *paint_queue_get(struct paint_queue *queue)
- {
-- struct commit *commit = nonstale_queue_get(queue);
-+ struct commit *commit = prio_queue_get(&queue->pq);
-
-- if (commit)
++}
++
++static struct commit *paint_queue_get(struct paint_state *state)
++{
++ struct commit *commit;
++
++ if (!state->p1_count && !state->p2_count &&
++ !state->pending_merge_bases)
++ return NULL;
++
++ commit = prio_queue_get(&state->queue);
+ if (commit) {
- commit->object.flags &= ~ENQUEUED;
-+ paint_count_transition(queue, commit->object.flags, 0);
++ commit->object.flags &= ~ENQUEUED;
++ paint_count_update(state, commit->object.flags, -1);
+ }
- return commit;
- }
-
++ return commit;
++}
++
+ /*
+ * See Documentation/technical/paint-down-to-common.adoc
+ *
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
enum merge_base_flags mb_flags,
struct commit_list **result)
{
- struct nonstale_queue queue = {
- { compare_commits_by_gen_then_commit_date }
-+ struct paint_queue queue = {
-+ .pq = { compare_commits_by_gen_then_commit_date }
++ struct paint_state state = {
++ .queue = { compare_commits_by_gen_then_commit_date }
};
+ struct commit *commit;
int i;
+ int steps = 0;
timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
struct commit_list **tail = result;
-@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
+
+ if (!min_generation && !corrected_commit_dates_enabled(r))
+- queue.pq.compare = compare_commits_by_commit_date;
++ state.queue.compare = compare_commits_by_commit_date;
+
+ one->object.flags |= PARENT1;
+ if (!n) {
commit_list_append(one, result);
return 0;
}
- nonstale_queue_put_dedup(&queue, one);
-+ paint_queue_put(&queue, one, 0);
++ paint_queue_put(&state, one, 0);
- for (i = 0; i < n; i++) {
- twos[i]->object.flags |= PARENT2;
- nonstale_queue_put_dedup(&queue, twos[i]);
- }
+ for (i = 0; i < n; i++)
-+ paint_queue_put(&queue, twos[i], PARENT2);
++ paint_queue_put(&state, twos[i], PARENT2);
- while (queue.max_nonstale) {
- struct commit *commit = nonstale_queue_get_dedup(&queue);
-+ while ((commit = paint_queue_get(&queue))) {
++ while ((commit = paint_queue_get(&state))) {
struct commit_list *parents;
int flags;
timestamp_t generation = commit_graph_generation(commit);
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
continue;
if (repo_parse_commit(r, p)) {
- clear_nonstale_queue(&queue);
-+ clear_prio_queue(&queue.pq);
++ clear_prio_queue(&state.queue);
commit_list_free(*result);
*result = NULL;
/*
@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
}
- p->object.flags |= flags;
- nonstale_queue_put_dedup(&queue, p);
-+ paint_queue_put(&queue, p, flags);
++ paint_queue_put(&state, p, flags);
}
-+
-+ if (queue.p1_count + queue.p2_count +
-+ queue.pending_merge_bases == 0)
-+ break;
}
- clear_nonstale_queue(&queue);
-+ clear_prio_queue(&queue.pq);
++ clear_prio_queue(&state.queue);
+ trace2_data_intmax("paint_down_to_common", r,
+ "steps", steps);
commit_list_sort_by_date(result);
- return 0;
- }
-: ---------- > 6: 8c72f01083 commit-reach: remove unused nonstale_queue dedup wrappers
3: ed12a5cb5b ! 7: d84b932e5b commit-reach: terminate merge-base walk when one paint side is exhausted
@@ Commit message
commit-reach: terminate merge-base walk when one paint side is exhausted
Add an early termination check to paint_down_to_common() using the
- per-side counters introduced in the previous commit. Once the walk
- enters the finite-generation region, terminate early when one side's
- exclusive count drops to zero -- no new merge-base can form without
- both paint sides meeting.
+ per-side counters introduced earlier. Once the walk enters the
+ finite-generation region, terminate early when one side's exclusive
+ count drops to zero -- no new merge-base can form without both paint
+ sides meeting.
The check also waits for pending_merge_bases to reach zero, ensuring
- all merge-base candidates have been popped and recorded before
+ all merge-base candidates have been dequeued and recorded before
exiting.
The INFINITY gate ensures correctness: commits without a commit-graph
@@ Commit message
once the walk enters the finite-generation region where ordering
guarantees hold.
- On large repositories with commit-graph, this yields 100-1000x
- speedups for merge-base queries where one side (e.g. a PR branch) is
- much smaller than the other.
+ Step counts measured with trace2 on git.git with commit-graph:
+
+ merge-base --all v2.0.0 v2.55.0-rc1:
+ before: 72264 steps after: 44589 steps
+
+ merge-base --all v2.55.0-rc1 v2.55.0-rc1~5:
+ before: 110 steps after: 7 steps
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
+ ## Documentation/technical/paint-down-to-common.adoc ##
+@@ Documentation/technical/paint-down-to-common.adoc: ends when one of the following conditions holds:
+
+ 1. The queue is empty.
+ 2. The queue contains only stale entries.
++ 3. Side exhaustion: no pure PARENT1 or pure PARENT2 commits
++ remain in the queue, no pending merge-base candidates exist,
++ and the walk has entered the finite-generation region.
+
+ Stale entry condition
+ ~~~~~~~~~~~~~~~~~~~~~
+@@ Documentation/technical/paint-down-to-common.adoc: existing candidates by proving one is an ancestor of another, but
+ `remove_redundant()` handles that as a post-processing step, so it
+ is safe to exit early.
+
++Side-exhaustion condition
++~~~~~~~~~~~~~~~~~~~~~~~~~
++A new merge-base requires commits from both sides to meet. When one
++side's exclusive counter reaches zero and there are no pending
++merge-base candidates, no future traversal step can produce a new
++candidate.
++
++This optimization only activates in the finite-generation region
++where topological ordering holds. In that region, children are
++always visited before parents, so paint flags are final at visit
++time and an exhausted side cannot reappear. In the INFINITY region,
++commit-date ordering can violate this guarantee, so the check is
++skipped.
++
+ Related documentation
+ ---------------------
+
+
## commit-reach.c ##
-@@ commit-reach.c: static int paint_down_to_common(struct repository *r,
- if (queue.p1_count + queue.p2_count +
- queue.pending_merge_bases == 0)
- break;
+@@ commit-reach.c: static void paint_queue_put(struct paint_state *state,
+
+ static struct commit *paint_queue_get(struct paint_state *state)
+ {
+- struct commit *commit;
++ struct commit *commit = prio_queue_get(&state->queue);
+
+- if (!state->p1_count && !state->p2_count &&
+- !state->pending_merge_bases)
++ if (!commit)
+ return NULL;
+
+- commit = prio_queue_get(&state->queue);
+- if (commit) {
+- commit->object.flags &= ~ENQUEUED;
+- paint_count_update(state, commit->object.flags, -1);
++ commit->object.flags &= ~ENQUEUED;
+
++ if (!state->pending_merge_bases) {
++ if (!state->p1_count && !state->p2_count)
++ return NULL;
+ /*
+ * Side exhaustion: a new merge-base can only form
+ * when both PARENT1-only and PARENT2-only commits
-+ * remain in the queue. In the finite-generation
++ * remain in the queue. In the finite-generation
+ * region the queue is ordered topologically, so
+ * no future step can add paint to visited commits
+ * and an exhausted side cannot reappear.
+ */
-+ if (generation < GENERATION_NUMBER_INFINITY &&
-+ queue.pending_merge_bases == 0 &&
-+ (queue.p1_count == 0 || queue.p2_count == 0))
-+ break;
++ if ((!state->p1_count || !state->p2_count) &&
++ commit_graph_generation(commit) < GENERATION_NUMBER_INFINITY)
++ return NULL;
}
++
++ paint_count_update(state, commit->object.flags, -1);
+ return commit;
+ }
+
+
+ ## t/t6600-test-reach.sh ##
+@@ t/t6600-test-reach.sh: test_expect_success 'merge-base --all commit-walk steps' '
+ cp commit-graph-full .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-full.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+- test_trace2_data paint_down_to_common steps 80 <trace-full.txt &&
++ test_trace2_data paint_down_to_common steps 9 <trace-full.txt &&
+
+ cp commit-graph-half .git/objects/info/commit-graph &&
+ GIT_TRACE2_EVENT="$(pwd)/trace-half.txt" \
+ git merge-base --all commit-9-9 commit-9-1 >actual &&
+- test_trace2_data paint_down_to_common steps 81 <trace-half.txt
++ test_trace2_data paint_down_to_common steps 57 <trace-half.txt
+ '
- clear_prio_queue(&queue.pq);
+ test_expect_success 'reduce_heads' '
--
gitgitgadget
^ permalink raw reply
* Re: [PATCH GSoC RFC v13 06/12] connect: refactor packet writing
From: Pablo Sabater @ 2026-06-24 12:08 UTC (permalink / raw)
To: Karthik Nayak
Cc: gitster, peff, eric.peijian, chriscool, git, jltobler, toon,
chandrapratap3519
In-Reply-To: <CAOLa=ZSvxXuf_bSzKMvViNQ5MuDAqxnQdo4asF9vfMhJaDQcVw@mail.gmail.com>
El lun, 22 jun 2026 a las 22:43, Karthik Nayak
(<karthik.188@gmail.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> [snip]
>
> > diff --git a/connect.c b/connect.c
> > index 1dced8e632..78c69d4485 100644
> > --- a/connect.c
> > +++ b/connect.c
> > @@ -700,16 +700,16 @@ int server_supports(const char *feature)
> > return !!server_feature_value(feature, NULL);
> > }
> >
> > -void write_fetch_command_and_capabilities(struct strbuf *req_buf,
> > - const struct string_list *server_options)
> > +void write_command_and_capabilities(struct strbuf *req_buf, const char *command,
> > + const struct string_list *server_options)
> > {
> > const char *hash_name;
> > int advertise_sid;
> >
> > repo_config_get_bool(the_repository, "transfer.advertisesid", &advertise_sid);
> >
> > - ensure_server_supports_v2("fetch");
> > - packet_buf_write(req_buf, "command=fetch");
> > + ensure_server_supports_v2(command);
> > + packet_buf_write(req_buf, "command=%s", command);
> > if (server_supports_v2("agent"))
> > packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
> > if (advertise_sid && server_supports_v2("session-id"))
> > @@ -727,7 +727,7 @@ void write_fetch_command_and_capabilities(struct strbuf *req_buf,
> > die(_("mismatched algorithms: client %s; server %s"),
> > the_hash_algo->name, hash_name);
> > packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
> > - } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1_LEGACY) {
> > + } else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
> > die(_("the server does not support algorithm '%s'"),
> > the_hash_algo->name);
> > }
>
> Why did we make this change? If the server doesn't support v2, then the
> object format should be `GIT_HASH_SHA1_LEGACY`. While the value of it is
> indeed `GIT_HASH_SHA1`, it indicates a scenario where there was no
> option to select object hash, which is the scenario here.
>
> If there is a reason to make such a change, perhaps we should highlight
> this in the commit message.
Hi!
There should be no diff related to that line, In some point between
Eric's last version (v11) and mine's firs (v12) the original code
changed. On the diff from v11 [1] the object format is the same, i
didn't notice this change and it's wrong, I'll fix it for v14, Thanks!
>
> > diff --git a/connect.h b/connect.h
> > index c4f6ea4b0a..8f4c523892 100644
> > --- a/connect.h
> > +++ b/connect.h
> > @@ -34,8 +34,12 @@ void check_stateless_delimiter(int stateless_rpc,
> > struct packet_reader *reader,
> > const char *error);
> >
> > +/*
> > + * Writes a command along with the requested server capabilities/features into a
> > + * request buffer.
> > + */
> > struct string_list;
>
> The comment should be above the function and not the forward
> declaration.
True, I'll fix it for v14.
>
> While we're here, why not `#include "string-list.h"` and remove the
> forward declaration, is there a circular dependency?
I believe this was right because from what I know forward declarations
are prefered in headers when in this case, the struct is only used as
a pointer. Investigating, this came from a review from patrick [2].
[snip]
[1]: https://lore.kernel.org/git/20250221190451.12536-5-eric.peijian@gmail.com/
[2]: https://lore.kernel.org/git/Z0RIqUAoEob8lGfM@pks.im/
Thanks for the review,
Pablo.
^ permalink raw reply
* Re: [PATCH v3 0/2] doc: clarify review replies and reroll timing
From: Patrick Steinhardt @ 2026-06-24 11:47 UTC (permalink / raw)
To: Weijie Yuan; +Cc: git, gitster
In-Reply-To: <cover.1782028813.git.wy@wyuan.org>
On Sun, Jun 21, 2026 at 04:04:36PM +0800, Weijie Yuan wrote:
> Changes in v3:
>
> - Reworked the substantial-rework case. Instead of suggesting that
> authors send a new version sooner, the text now advises authors not
> to rush out an updated version before reviewing the larger changes
> carefully. It recommends replying to the review that prompted the
> rewrite, saying that a substantial rework is planned, and pointing
> out which parts of the current series will become obsolete.
>
> - Dropped the advice that a topic close to being accepted may justify
> a quicker reroll.
>
> - Removed "how close the topic is to being accepted" from the short
> reroll-timing guidance in Documentation/SubmittingPatches.
>
> - Updated the commit message of patch 2 accordingly.
I'm happy with this version, thanks!
Patrick
^ permalink raw reply
* Re: [PATCH v3 2/2] doc: advise batching patch rerolls
From: Patrick Steinhardt @ 2026-06-24 11:46 UTC (permalink / raw)
To: Weijie Yuan; +Cc: git, gitster
In-Reply-To: <e1050a6ef5e26299b2c6d9743067fe3d7f4f8071.1782028813.git.wy@wyuan.org>
On Sun, Jun 21, 2026 at 04:05:34PM +0800, Weijie Yuan wrote:
> diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
> index 00704ab91e..35105bc3b4 100644
> --- a/Documentation/MyFirstContribution.adoc
> +++ b/Documentation/MyFirstContribution.adoc
> @@ -1330,6 +1330,28 @@ previous one" patches over 2 days), reviewers would strongly prefer if a
> single polished version came 2 days later instead, and that version with
> fewer mistakes were the only one they would need to review.
>
> +This consideration applies not only when going from the initial patch to v2,
> +but also to later iterations of the same series. There is no fixed rule for how
> +long to wait before sending a new version. A useful default is to send at most
> +one new version of the same patch series per day. This gives multiple reviewers
> +time to comment, gives reviewers across time zones a fair chance to
> +participate, lets you batch feedback together, and gives you time to think
> +through the comments you received. Knowing that you should not immediately send
> +another version also encourages you to review the patches more carefully before
> +sending them, catch small mistakes such as typos and off-by-one errors
> +yourself, and let reviewers spend more of their attention on design,
> +algorithms, and other substantial issues.
> +
> +The right timing depends on the topic and the feedback. Larger series usually
> +need more review time. If the only comments so far are minor, such as typo
> +fixes, it often makes sense to wait a little longer in case deeper reviews are
> +still coming. If the comments call for substantial rework, do not rush out an
> +updated version before you have reviewed the larger changes carefully. Instead,
> +reply to the review that prompted the rewrite, say that you are preparing a
> +substantial rework, and mention which parts of the current series will become
> +obsolete so reviewers can avoid spending time on them until the updated series
> +is ready.
Makes sense.
Patrick
^ permalink raw reply
* Re: [PATCH v2 2/2] doc: advise batching patch rerolls
From: Patrick Steinhardt @ 2026-06-24 11:46 UTC (permalink / raw)
To: Weijie Yuan; +Cc: Junio C Hamano, git
In-Reply-To: <ajVCD51lLvHreyJB@wyuan.org>
On Fri, Jun 19, 2026 at 09:20:15PM +0800, Weijie Yuan wrote:
> Sorry for the late reply. I spent some time looking back through the
> discussions on earlier patch series, to check my patch itself, of course
> because I'm apparently a newcomer here.
>
> On Wed, Jun 17, 2026 at 10:50:53AM -0700, Junio C Hamano wrote:
> > > If the comments require substantial rework, sending a new version
> > > +sooner may save reviewers from spending time on a version you already know will
> > > +change significantly.
> >
> > I am not sure about this one. Even though the intention to avoid
> > wasting reviewers' time spent on reading through the previous
> > version that will be invalidated is a good one, by definition, a
> > substantial rework will naturally take time, and it is better not to
> > rush and send an updated version with substantial changes that you
> > yourself haven't had a chance to thoroughly review yet.
> >
> > In such a case, it would be a better idea to respond to the review
> > that made you realize a substantial rewrite is needed with a simple
> > "I'll make a substantial rework based on this comment, which would
> > invalidate this and that part of the current patch series, so please
> > do not waste reviewer cycles on these parts until I send an updated
> > series out" message.
>
> I think the approach you recommended is obviously more reasonable.
>
> It would be better to give everyone a heads-up "I am working on a
> new version."
>
> I will improve this part accordingly.
Yeah, that works for me, as well.
> > > If the topic is close to being accepted and the remaining
> > > +comments are small, a quicker new version may also be fine.
> >
> > I am not sure if this needs to be codified.
> >
> > I often see (e.g., in patches from Patrick) that an iteration is
> > marked clearly as final candidate that the author is not aware of
> > any outstanding issues. This encourages reviewers to ask "what
> > about this one raised there?" to remind what is missed, or chime in
> > with "yup, this looks good" to show support. Such a note is highly
> > recommended, but I do not see a need to say "the (supposedly) final
> > one is specifically allowed to be sent without waiting" even then.
>
> Actually I thought Patrick would say something here ;-) so I waited a
> few more days to see whether anyone else had any suggestions.
>
> But here I think Patrick's original intention is: If your series is
> *close* to be accepted, (while I'm not sure what the precise definition
> of this "close to be accepted", does it means: commented by Junio with
> "Looks good", or reviewed by the community/core contributors with "Makes
> sense"?) and this time there happens to be a small issue, you can
> re-roll quickly to make your series more "sturdy" to wait for
> maintainer's final examination and further merges.
>
> So, I think the situation you are describing here is that this version
> of the patch has already been declared by the *author* to be the final
> version. (i.e. waiting for Junio to do the last exam)
My "close to be accepted" feeling is when you've had multiple rounds of
design discussions already, everyone is on the same page, and all you
got on the last review round is a couple of typo fixes.
But all of this is highly subjective, so it'll always depend and it
won't be easy to codify all of that. Nor is that necessary, I guess. We
really only want to provide some rough guidance.
> Therefore, I do not think the two situations conflict with each other,
> or are directly related. One concerns a patch that is already close to
> receiving the maintainer's final verdict, where a minor issue is
> discovered and the author quickly rerolls it. The other concerns an
> author who, without realizing that some issues remain unresolved, rushes
> to send what they believe to be the final version and then waits for the
> maintainer to review it.
>
> For the latter case, I think it would be better to add a sentence along
> the lines of: "Before sending a new version/the final version, check
> once more whether there are any unresolved issues," if the existing
> documentation does not already make this clear.
I think that should mostly be clear with our documentation. And
eventually, we should also expect people to have some common sense :)
> That said, I am not familiar with how patch discussions have played out
> in the past, so please directly point out any mistakes in my
> understanding. I have to admit that, by this point in writing the
> message, I have become a little tangled up in my own reasoning.
I guess that's kind of expected, mostly because many of these things are
highly subjective and will depend on the situation. The guidance does
not have to be perfect, you'll probably be able to find counterexamples
for many of the cases.
Patrick
^ permalink raw reply
* Re: [PATCH 0/6] receive-pack: use ODB transactions to stage object writes
From: Patrick Steinhardt @ 2026-06-24 11:27 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
In-Reply-To: <20260624041920.2601961-1-jltobler@gmail.com>
On Tue, Jun 23, 2026 at 11:19:14PM -0500, Justin Tobler wrote:
> Greetings,
>
> This patch series replaces direct usage of the `tmp_objdir` interfaces
> in git-receive-pack(1) to instead use the `odb_transaction` interfaces
> to create/manage a staging area to write objects to. The purpose of this
> change is to get git-receive-pack(1) one step closer to being ODB
> backend agnostic. For now, the object writes themselves are still
> "files" backend specific due to being handled by the git-index-pack(1)
> and git-unpack-objects(1) child processes. This will be tackled in a
> separate series though.
Thanks, this was a pleasant read. I've got a bunch of comments, but
overall I really like the direction of this patch series.
Patrick
^ permalink raw reply
* Re: [PATCH 6/6] builtin/receive-pack: stage incoming objects via ODB transactions
From: Patrick Steinhardt @ 2026-06-24 11:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
In-Reply-To: <20260624041920.2601961-7-jltobler@gmail.com>
On Tue, Jun 23, 2026 at 11:19:20PM -0500, Justin Tobler wrote:
> Objects received by git-receive-pack(1) are quarantined in a temporary
> "incoming" directory and migrated into the object database prior to the
> reference updates. The quarantine is currently managed through
> `tmp_objdir` directly. In a pluggable ODB future, how exactly an object
> gets written to a transaction may vary for a given ODB source. Refactor
> git-receive-pack(1) to use the ODB transaction interfaces to manage the
> object staging area in a more agnostic manner accordingly.
>
> Note that the temporary directory created for git-receive-pack(1) is
> eagerly created and uses a different prefix name. This behavior is
A different prefix name compared to what?
> special cased in the "files" backend by having `odb_transaction_begin()`
> callers that require this behavior provide an `ODB_TRANSACTION_RECEIVE`
> flag.
Okay. I guess this is to retain existing behaviour where the temporary
directory is created lazily everywhere else. Makes me wonder whether we
should eventually change this to just unconditionally create the
directory in all cases so that we can drop this new flag.
It might've also made sense to split this commit up into two: one to
introduce the flag parameter, and then one to do the changes to
git-receive-pack(1).
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 19eb6a1b61..ee8e03e2ab 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -112,8 +112,6 @@ static enum {
> } use_keepalive;
> static int keepalive_in_sec = 5;
>
> -static struct tmp_objdir *tmp_objdir;
> -
> static struct proc_receive_ref {
> unsigned int want_add:1,
> want_delete:1,
I assume the goal is that we convert all other users of the tmp-objdir
subsystem to also use transactions eventually, so that this becomes an
implementation detail fo the files transaction?
> @@ -2106,14 +2104,13 @@ static void execute_commands(struct command *commands,
> * Now we'll start writing out refs, which means the objects need
> * to be in their final positions so that other processes can see them.
> */
> - if (tmp_objdir_migrate(tmp_objdir) < 0) {
> + if (odb_transaction_commit(the_repository->objects->transaction)) {
> for (cmd = commands; cmd; cmd = cmd->next) {
> if (!cmd->error_string)
> cmd->error_string = "unable to migrate objects to permanent storage";
> }
> return;
> }
> - tmp_objdir = NULL;
We don't need to unset the transaction because that's what
`odb_transaction_commit()` already does for us, I assume?
> @@ -2326,7 +2323,8 @@ static void push_header_arg(struct strvec *args, struct pack_header *hdr)
> ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
> }
>
> -static const char *unpack(int err_fd, struct shallow_info *si)
> +static const char *unpack(int err_fd, struct shallow_info *si,
> + struct odb_transaction *transaction)
> {
> struct pack_header hdr;
> const char *hdr_err;
It feels a bit weird that we sometimes pass the transaction as
parameter, whereas othertimes we access it via `the_repository`.
> @@ -2351,20 +2349,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
> strvec_push(&child.args, alt_shallow_file);
> }
>
> - tmp_objdir = tmp_objdir_create(the_repository, "incoming");
> - if (!tmp_objdir) {
> - if (err_fd > 0)
> - close(err_fd);
> - return "unable to create temporary object directory";
> - }
> - strvec_pushv(&child.env, tmp_objdir_env(tmp_objdir));
> -
> - /*
> - * Normally we just pass the tmp_objdir environment to the child
> - * processes that do the heavy lifting, but we may need to see these
> - * objects ourselves to set up shallow information.
> - */
> - tmp_objdir_add_as_alternate(tmp_objdir);
> + strvec_pushv(&child.env, odb_transaction_env(transaction));
Interesting, this here seems like a change in behaviour. Previously we
added the transactions as an alternate, but now we only propagate it via
the environment. I didn't see this mentioned in the commit message.
> @@ -2707,7 +2694,10 @@ int cmd_receive_pack(int argc,
> if (!si.nr_ours && !si.nr_theirs)
> shallow_update = 0;
> if (!delete_only(commands)) {
> - unpack_status = unpack_with_sideband(&si);
> + if (odb_transaction_begin(the_repository->objects, &transaction, ODB_TRANSACTION_RECEIVE))
> + unpack_status = "unable to start ODB transaction";
s/ODB/object/
This may be visible to the user, and "ODB" may mean nothing to them.
> diff --git a/object-file.c b/object-file.c
> index 14064d188a..e7958753ec 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1702,7 +1703,8 @@ static const char **odb_transaction_files_env(struct odb_transaction *base)
> }
>
> int odb_transaction_files_begin(struct odb_source *source,
> - struct odb_transaction **out)
> + struct odb_transaction **out,
> + enum odb_transaction_flags flags)
> {
> struct odb_transaction_files *transaction;
> struct object_database *odb = source->odb;
> @@ -1717,6 +1719,20 @@ int odb_transaction_files_begin(struct odb_source *source,
> transaction->base.commit = odb_transaction_files_commit;
> transaction->base.write_object_stream = odb_transaction_files_write_object_stream;
> transaction->base.env = odb_transaction_files_env;
> +
> + transaction->prefix = "bulk-fsync";
> + if (flags & ODB_TRANSACTION_RECEIVE) {
> + /*
> + * ODB transactions for git-receive-pack(1) eagerly create a
> + * temporary directory and use a different prefix.
> + */
> + transaction->prefix = "incoming";
> + if (odb_transaction_files_prepare(&transaction->base)) {
> + free(transaction);
> + return -1;
> + }
> + }
> +
Okay, makes sense. I really wonder whether we need to insist this much
on the exact name used by this, but better be safe than sorry for now I
guess.
And as mentioned before, I also wonder whether it really makes sense to
have the lazy creation of the tmp-objdir. Maybe add a NEEDSWORK item
here that mentions we want to investigate whether this is even needed at
all?
> diff --git a/odb/transaction.h b/odb/transaction.h
> index 536458297b..78392ff13d 100644
> --- a/odb/transaction.h
> +++ b/odb/transaction.h
> @@ -44,6 +43,10 @@ struct odb_transaction {
> const char **(*env)(struct odb_transaction *transaction);
> };
>
> +enum odb_transaction_flags {
> + ODB_TRANSACTION_RECEIVE = (1 << 0),
> +};
It's not clear at all what this flag does based on its name, so we
should have documentation for it.
Patrick
^ permalink raw reply
* Re: [PATCH 5/6] odb/transaction: add transaction env interface
From: Patrick Steinhardt @ 2026-06-24 11:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
In-Reply-To: <20260624041920.2601961-6-jltobler@gmail.com>
On Tue, Jun 23, 2026 at 11:19:19PM -0500, Justin Tobler wrote:
> The ODB transaction backend is responsible for creating/managing its own
> staging area for writing objects. Other child processes spawned by Git
> may need to access to uncommitted objects or write new objects in the
s/may need to access to/may need access to/
> staging area though.
>
> Introduce `odb_transaction_env()` which is expected to provide the set
> of environment variables needed by a child process to access the
> transaction staging area.
Possessive s is missing, I think.
> diff --git a/object-file.c b/object-file.c
> index 696f05dc2d..14064d188a 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1691,6 +1691,16 @@ static int odb_transaction_files_commit(struct odb_transaction *base)
> return 0;
> }
>
> +static const char **odb_transaction_files_env(struct odb_transaction *base)
> +{
> + struct odb_transaction_files *transaction =
> + container_of(base, struct odb_transaction_files, base);
> +
> + odb_transaction_files_prepare(&transaction->base);
> +
> + return tmp_objdir_env(transaction->objdir);
> +}
> +
> int odb_transaction_files_begin(struct odb_source *source,
> struct odb_transaction **out)
> {
Makes sense. Transactions may have a different way to quarantine the
write than using a quarantine directory. So making this functionality
pluggable so that backends may expose a separate set of environment
variables feels sensible.
> diff --git a/odb/transaction.h b/odb/transaction.h
> index 7898770071..536458297b 100644
> --- a/odb/transaction.h
> +++ b/odb/transaction.h
> @@ -32,6 +32,16 @@ struct odb_transaction {
> int (*write_object_stream)(struct odb_transaction *transaction,
> struct odb_write_stream *stream, size_t len,
> struct object_id *oid);
> +
> + /*
> + * This callback is expected to return a NULL-terminated array of
> + * environment variables that a child process should inherit so
> + * that its object writes participate in the transaction. The
> + * returned array is owned by the backend and remains valid until
> + * the transaction ends. May return NULL when the backend does not
> + * need to expose any state to child processes.
> + */
> + const char **(*env)(struct odb_transaction *transaction);
Would it make more sense to adapt this function so that:
- It receives a `struct strvec` as input that the environment
variables are to be amended to.
- It returns a normal error code to indicate errors?
Patrick
^ permalink raw reply
* Re: [PATCH 4/6] odb/transaction: propagate commit errors
From: Patrick Steinhardt @ 2026-06-24 11:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
In-Reply-To: <20260624041920.2601961-5-jltobler@gmail.com>
On Tue, Jun 23, 2026 at 11:19:18PM -0500, Justin Tobler wrote:
> diff --git a/odb/transaction.h b/odb/transaction.h
> index cd6d50f2e5..7898770071 100644
> --- a/odb/transaction.h
> +++ b/odb/transaction.h
> @@ -54,7 +54,7 @@ static inline void odb_transaction_begin_or_die(struct object_database *odb,
> * Commits an ODB transaction making the written objects visible. If the
> * specified transaction is NULL, the function is a no-op.
> */
> -void odb_transaction_commit(struct odb_transaction *transaction);
> +int odb_transaction_commit(struct odb_transaction *transaction);
Should the function comment be amended, as well? We should definitely
point out that calling this with a NULL transaction also returns
success.
Patrick
^ permalink raw reply
* Re: [PATCH 3/6] odb/transaction: propagate begin errors
From: Patrick Steinhardt @ 2026-06-24 11:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
In-Reply-To: <20260624041920.2601961-4-jltobler@gmail.com>
On Tue, Jun 23, 2026 at 11:19:17PM -0500, Justin Tobler wrote:
> diff --git a/odb/transaction.c b/odb/transaction.c
> index b16e07aebf..d3de01db50 100644
> --- a/odb/transaction.c
> +++ b/odb/transaction.c
> @@ -2,14 +2,20 @@
> #include "odb/source.h"
> #include "odb/transaction.h"
>
> -struct odb_transaction *odb_transaction_begin(struct object_database *odb)
> +int odb_transaction_begin(struct object_database *odb,
> + struct odb_transaction **out)
> {
> - if (odb->transaction)
> - return NULL;
> + int ret;
>
> - odb_source_begin_transaction(odb->sources, &odb->transaction);
> + if (odb->transaction) {
> + *out = NULL;
> + return 0;
> + }
Hm. So we may return successful, but not set the `out` pointer to a
transaction. And...
> diff --git a/odb/transaction.h b/odb/transaction.h
> index f4c1ebfaaa..cd6d50f2e5 100644
> --- a/odb/transaction.h
> +++ b/odb/transaction.h
> @@ -33,11 +35,20 @@ struct odb_transaction {
> };
>
> /*
> - * Starts an ODB transaction. Subsequent objects are written to the transaction
> - * and not committed until odb_transaction_commit() is invoked on the
> - * transaction. If the ODB already has a pending transaction, NULL is returned.
> + * Starts an ODB transaction and returns it via `out`. Subsequent objects are
> + * written to the transaction and not committed until odb_transaction_commit()
> + * is invoked on the transaction. Returns 0 on success and a negative value on
> + * error. If the ODB already has a pending transaction, `out` is set to NULL.
> */
> -struct odb_transaction *odb_transaction_begin(struct object_database *odb);
> +int odb_transaction_begin(struct object_database *odb,
> + struct odb_transaction **out);
> +
> +static inline void odb_transaction_begin_or_die(struct object_database *odb,
> + struct odb_transaction **out)
> +{
> + if (odb_transaction_begin(odb, out))
> + die(_("failed to start ODB transaction"));
> +}
... we don't special-case that here, either. So a caller may invoke the
function, not die, but it might still not have a valid transaction. That
feels wrong to me.
Patrick
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox