* [PATCH v7 06/11] reset: introduce ability to skip updating HEAD
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
In a subsequent commit we'll introduce a new caller to
`reset_working_tree()` that really only wants to update the index and
working tree, without updating any references. Introduce a new flag that
makes the caller opt in to updating HEAD and adapt all callers to set
that flag.
Note that in a previous iteration we instead introduced a flag that made
callers opt out of updating any references. This was somewhat awkward
though because we already have the `UPDATE_ORIG_HEAD` flag, so the
result was somewhat inconsistent.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/rebase.c | 14 ++++++++++----
reset.c | 9 +++++++--
reset.h | 9 ++++++---
sequencer.c | 4 +++-
4 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 06dcbaf5e8..10a306310c 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -607,7 +607,8 @@ static int move_to_original_branch(struct rebase_options *opts)
strbuf_addf(&head_reflog, "%s (finish): returning to %s",
opts->reflog_action, opts->head_name);
ropts.branch = opts->head_name;
- ropts.flags = RESET_WORKING_TREE_REFS_ONLY;
+ ropts.flags = RESET_WORKING_TREE_REFS_ONLY |
+ RESET_WORKING_TREE_UPDATE_HEAD;
ropts.branch_msg = branch_reflog.buf;
ropts.head_msg = head_reflog.buf;
ret = reset_working_tree(the_repository, &ropts);
@@ -693,6 +694,7 @@ static int run_am(struct rebase_options *opts)
ropts.oid = &opts->orig_head->object.oid;
ropts.branch = opts->head_name;
ropts.default_reflog_action = opts->reflog_action;
+ ropts.flags = RESET_WORKING_TREE_UPDATE_HEAD;
reset_working_tree(the_repository, &ropts);
error(_("\ngit encountered an error while preparing the "
"patches to replay\n"
@@ -862,7 +864,8 @@ static int checkout_up_to_date(struct rebase_options *options)
options->reflog_action, options->switch_to);
ropts.oid = &options->orig_head->object.oid;
ropts.branch = options->head_name;
- ropts.flags = RESET_WORKING_TREE_RUN_POST_CHECKOUT_HOOK;
+ ropts.flags = RESET_WORKING_TREE_RUN_POST_CHECKOUT_HOOK |
+ RESET_WORKING_TREE_UPDATE_HEAD;
if (!ropts.branch)
ropts.flags |= RESET_WORKING_TREE_DETACH;
ropts.head_msg = buf.buf;
@@ -1384,7 +1387,8 @@ int cmd_rebase(int argc,
rerere_clear(the_repository, &merge_rr);
string_list_clear(&merge_rr, 1);
- ropts.flags = RESET_WORKING_TREE_HARD;
+ ropts.flags = RESET_WORKING_TREE_HARD |
+ RESET_WORKING_TREE_UPDATE_HEAD;
if (reset_working_tree(the_repository, &ropts) < 0)
die(_("could not discard worktree changes"));
remove_branch_state(the_repository, 0);
@@ -1409,7 +1413,8 @@ int cmd_rebase(int argc,
ropts.oid = &options.orig_head->object.oid;
ropts.head_msg = head_msg.buf;
ropts.branch = options.head_name;
- ropts.flags = RESET_WORKING_TREE_HARD;
+ ropts.flags = RESET_WORKING_TREE_HARD |
+ RESET_WORKING_TREE_UPDATE_HEAD;
if (reset_working_tree(the_repository, &ropts) < 0)
die(_("could not move back to %s"),
oid_to_hex(&options.orig_head->object.oid));
@@ -1877,6 +1882,7 @@ int cmd_rebase(int argc,
ropts.oid = &options.onto->object.oid;
ropts.orig_head = &options.orig_head->object.oid;
ropts.flags = RESET_WORKING_TREE_DETACH |
+ RESET_WORKING_TREE_UPDATE_HEAD |
RESET_WORKING_TREE_UPDATE_ORIG_HEAD |
RESET_WORKING_TREE_RUN_POST_CHECKOUT_HOOK;
ropts.head_msg = msg.buf;
diff --git a/reset.c b/reset.c
index 99f2c1b012..4bde5d8dc6 100644
--- a/reset.c
+++ b/reset.c
@@ -92,6 +92,7 @@ int reset_working_tree(struct repository *r,
const char *switch_to_branch = opts->branch;
unsigned reset_hard = opts->flags & RESET_WORKING_TREE_HARD;
unsigned refs_only = opts->flags & RESET_WORKING_TREE_REFS_ONLY;
+ unsigned update_head = opts->flags & RESET_WORKING_TREE_UPDATE_HEAD;
unsigned update_orig_head = opts->flags & RESET_WORKING_TREE_UPDATE_ORIG_HEAD;
unsigned dry_run = opts->flags & RESET_WORKING_TREE_DRY_RUN;
struct object_id *head = NULL, head_oid;
@@ -113,6 +114,9 @@ int reset_working_tree(struct repository *r,
if (opts->branch_msg && !opts->branch)
BUG("branch reflog message given without a branch");
+ if (update_orig_head && !update_head)
+ BUG("cannot update ORIG_HEAD without updating HEAD" );
+
if (!refs_only && !dry_run && repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
ret = -1;
goto leave_reset_head;
@@ -129,7 +133,7 @@ int reset_working_tree(struct repository *r,
oid = &head_oid;
if (refs_only) {
- if (!dry_run)
+ if (!dry_run && update_head)
return update_refs(r, opts, oid, head);
return 0;
}
@@ -197,7 +201,8 @@ int reset_working_tree(struct repository *r,
goto leave_reset_head;
}
- if (oid != &head_oid || update_orig_head || switch_to_branch)
+ if (update_head &&
+ (oid != &head_oid || update_orig_head || switch_to_branch))
ret = update_refs(r, opts, oid, head);
leave_reset_head:
diff --git a/reset.h b/reset.h
index 898e4a1e95..38b2891b53 100644
--- a/reset.h
+++ b/reset.h
@@ -19,14 +19,17 @@ enum reset_working_tree_flags {
/* Only update refs, do not touch the worktree */
RESET_WORKING_TREE_REFS_ONLY = (1 << 3),
- /* Update ORIG_HEAD as well as HEAD */
- RESET_WORKING_TREE_UPDATE_ORIG_HEAD = (1 << 4),
+ /* Update HEAD */
+ RESET_WORKING_TREE_UPDATE_HEAD = (1 << 4),
+
+ /* Update ORIG_HEAD */
+ RESET_WORKING_TREE_UPDATE_ORIG_HEAD = (1 << 5),
/*
* Perform a dry-run by performing the operation without updating
* any user-visible state.
*/
- RESET_WORKING_TREE_DRY_RUN = (1 << 5),
+ RESET_WORKING_TREE_DRY_RUN = (1 << 6),
};
struct reset_working_tree_options {
diff --git a/sequencer.c b/sequencer.c
index 4efe831178..e905b1b2d9 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4678,7 +4678,8 @@ static void create_autostash_internal(struct repository *r,
has_uncommitted_changes(r, 1)) {
struct child_process stash = CHILD_PROCESS_INIT;
struct reset_working_tree_options ropts = {
- .flags = RESET_WORKING_TREE_HARD,
+ .flags = RESET_WORKING_TREE_HARD |
+ RESET_WORKING_TREE_UPDATE_HEAD,
};
struct object_id oid;
@@ -4873,6 +4874,7 @@ static int checkout_onto(struct repository *r, struct replay_opts *opts,
.oid = onto,
.orig_head = orig_head,
.flags = RESET_WORKING_TREE_DETACH |
+ RESET_WORKING_TREE_UPDATE_HEAD |
RESET_WORKING_TREE_UPDATE_ORIG_HEAD |
RESET_WORKING_TREE_RUN_POST_CHECKOUT_HOOK,
.head_msg = reflog_message(opts, "start", "checkout %s",
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v7 07/11] reset: allow the caller to specify the current HEAD object
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
When calling `reset_working_tree()` we automatically derive the commit
that the callers wants to move from by reading the HEAD commit. Some
callers may already have resolved it, or they may want to move from a
different commit that doesn't match HEAD.
Introduce a new `oid_from` option that lets the caller specify the
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reset.c | 5 ++++-
reset.h | 5 +++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/reset.c b/reset.c
index 4bde5d8dc6..06f375f296 100644
--- a/reset.c
+++ b/reset.c
@@ -122,7 +122,10 @@ int reset_working_tree(struct repository *r,
goto leave_reset_head;
}
- if (!repo_get_oid(r, "HEAD", &head_oid)) {
+ if (opts->oid_from) {
+ oidcpy(&head_oid, opts->oid_from);
+ head = &head_oid;
+ } else if (!repo_get_oid(r, "HEAD", &head_oid)) {
head = &head_oid;
} else if (!oid || !reset_hard) {
ret = error(_("could not determine HEAD revision"));
diff --git a/reset.h b/reset.h
index 38b2891b53..4c992ba671 100644
--- a/reset.h
+++ b/reset.h
@@ -37,6 +37,11 @@ struct reset_working_tree_options {
* The commit to checkout/reset to. Defaults to HEAD.
*/
const struct object_id *oid;
+ /*
+ * The commit to checkout/reset from when doing a two-way merge. This
+ * is used as one of the sides to merge.
+ */
+ const struct object_id *oid_from;
/*
* Optional value to set ORIG_HEAD. Defaults to HEAD.
*/
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v7 08/11] reset: stop assuming that the caller passes in a clean index
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
In 652bd0211d (rebase: use 'skip_cache_tree_update' option, 2022-11-10),
we updated `reset_working_tree()` to stop updating the index tree cache.
This was done as a performance optimization: the function is only called
by "sequencer.c" and "rebase.c", both of which assume a clean index
before they perform their operation, so we know that the end result will
be a clean index, too. Consequently, we can skip recomputing the cache
as we can instead use `prime_cache_tree()` directly.
In a subsequent commit we're about to add a new caller though where the
assumption doesn't hold anymore: the index may be dirty before calling
`reset_working_tree()`, and consequently we cannot prime the cache with
a given tree anymore as the index and tree will mismatch.
Adapt the logic so that we only skip the cache tree update in case we're
doing a hard reset. While we could introduce logic that only skips the
update in case the incoming index was dirty already, that doesn't really
feel worth it: after all, the mentioned commit says itself that the
performance improvement was negligible anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reset.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/reset.c b/reset.c
index 06f375f296..ff87e3e357 100644
--- a/reset.c
+++ b/reset.c
@@ -167,10 +167,11 @@ int reset_working_tree(struct repository *r,
unpack_tree_opts.dry_run = dry_run;
unpack_tree_opts.merge = 1;
unpack_tree_opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
- unpack_tree_opts.skip_cache_tree_update = 1;
init_checkout_metadata(&unpack_tree_opts.meta, switch_to_branch, oid, NULL);
- if (reset_hard)
+ if (reset_hard) {
+ unpack_tree_opts.skip_cache_tree_update = 1;
unpack_tree_opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+ }
if (!reset_hard && !fill_tree_descriptor(r, &desc[nr++], &head_oid)) {
ret = error(_("failed to find tree of %s"),
@@ -197,7 +198,8 @@ int reset_working_tree(struct repository *r,
goto leave_reset_head;
}
- prime_cache_tree(r, r->index, tree);
+ if (reset_hard)
+ prime_cache_tree(r, r->index, tree);
if (write_locked_index(r->index, &lock, COMMIT_LOCK) < 0) {
ret = error(_("could not write index"));
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v7 09/11] replay: expose `replay_result_queue_update()`
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
Expose `replay_result_queue_update()`, which is used to append another
reference update to the replay result. This function will be used in a
subsequent commit.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
replay.c | 8 ++++----
replay.h | 5 +++++
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/replay.c b/replay.c
index 4ef8abb607..7c8433107b 100644
--- a/replay.c
+++ b/replay.c
@@ -351,10 +351,10 @@ void replay_result_release(struct replay_result *result)
free(result->updates);
}
-static void replay_result_queue_update(struct replay_result *result,
- const char *refname,
- const struct object_id *old_oid,
- const struct object_id *new_oid)
+void replay_result_queue_update(struct replay_result *result,
+ const char *refname,
+ const struct object_id *old_oid,
+ const struct object_id *new_oid)
{
ALLOC_GROW(result->updates, result->updates_nr + 1, result->updates_alloc);
result->updates[result->updates_nr].refname = xstrdup(refname);
diff --git a/replay.h b/replay.h
index 1851a07705..da83b65345 100644
--- a/replay.h
+++ b/replay.h
@@ -80,6 +80,11 @@ struct replay_result {
void replay_result_release(struct replay_result *result);
+void replay_result_queue_update(struct replay_result *result,
+ const char *refname,
+ const struct object_id *old_oid,
+ const struct object_id *new_oid);
+
/*
* Replay a set of commits onto a new location. Leaves both the working tree,
* index and references untouched. Reference updates caused by the replay will
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v7 10/11] builtin/history: split handling of ref updates into two phases
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
The function `handle_reference_updates()` is used by git-history(1) to
update all references that refer to commits that have been rewritten. As
such, it performs two steps:
- It gathers the references that need to be updated in the first
place.
- It prepares and commits the reference transaction.
In a subsequent commit we'll want to handle those two steps separately.
Prepare for this by splitting up the function into two.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/history.c | 100 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 62 insertions(+), 38 deletions(-)
diff --git a/builtin/history.c b/builtin/history.c
index 0fc06fb204..22b9fcb4a4 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -333,21 +333,17 @@ static int handle_ref_update(struct ref_transaction *transaction,
NULL, NULL, 0, reflog_msg, err);
}
-static int handle_reference_updates(struct rev_info *revs,
- enum ref_action action,
- struct commit *original,
- struct commit *rewritten,
- const char *reflog_msg,
- int dry_run,
- enum replay_empty_commit_action empty)
+static int compute_pending_ref_updates(struct rev_info *revs,
+ enum ref_action action,
+ struct commit *original,
+ struct commit *rewritten,
+ enum replay_empty_commit_action empty,
+ struct replay_result *result)
{
const struct name_decoration *decoration;
struct replay_revisions_options opts = {
.empty = empty,
};
- struct replay_result result = { 0 };
- struct ref_transaction *transaction = NULL;
- struct strbuf err = STRBUF_INIT;
char hex[GIT_MAX_HEXSZ + 1];
bool detached_head;
int head_flags = 0;
@@ -359,34 +355,13 @@ static int handle_reference_updates(struct rev_info *revs,
opts.onto = oid_to_hex_r(hex, &rewritten->object.oid);
- ret = replay_revisions(revs, &opts, &result);
+ ret = replay_revisions(revs, &opts, result);
if (ret)
- goto out;
+ return ret;
if (action != REF_ACTION_BRANCHES && action != REF_ACTION_HEAD)
BUG("unsupported ref action %d", action);
- if (!dry_run) {
- transaction = ref_store_transaction_begin(get_main_ref_store(revs->repo), 0, &err);
- if (!transaction) {
- ret = error(_("failed to begin ref transaction: %s"), err.buf);
- goto out;
- }
- }
-
- for (size_t i = 0; i < result.updates_nr; i++) {
- ret = handle_ref_update(transaction,
- result.updates[i].refname,
- &result.updates[i].new_oid,
- &result.updates[i].old_oid,
- reflog_msg, &err);
- if (ret) {
- ret = error(_("failed to update ref '%s': %s"),
- result.updates[i].refname, err.buf);
- goto out;
- }
- }
-
/*
* `replay_revisions()` only updates references that are
* ancestors of `rewritten`, so we need to manually
@@ -414,14 +389,41 @@ static int handle_reference_updates(struct rev_info *revs,
!detached_head)
continue;
+ replay_result_queue_update(result, decoration->name,
+ &original->object.oid,
+ &rewritten->object.oid);
+ }
+
+ return 0;
+}
+
+static int apply_pending_ref_updates(struct repository *repo,
+ const struct replay_result *result,
+ const char *reflog_msg,
+ int dry_run)
+{
+ struct ref_transaction *transaction = NULL;
+ struct strbuf err = STRBUF_INIT;
+ int ret;
+
+ if (!dry_run) {
+ transaction = ref_store_transaction_begin(get_main_ref_store(repo),
+ 0, &err);
+ if (!transaction) {
+ ret = error(_("failed to begin ref transaction: %s"), err.buf);
+ goto out;
+ }
+ }
+
+ for (size_t i = 0; i < result->updates_nr; i++) {
ret = handle_ref_update(transaction,
- decoration->name,
- &rewritten->object.oid,
- &original->object.oid,
+ result->updates[i].refname,
+ &result->updates[i].new_oid,
+ &result->updates[i].old_oid,
reflog_msg, &err);
if (ret) {
ret = error(_("failed to update ref '%s': %s"),
- decoration->name, err.buf);
+ result->updates[i].refname, err.buf);
goto out;
}
}
@@ -435,11 +437,33 @@ static int handle_reference_updates(struct rev_info *revs,
out:
ref_transaction_free(transaction);
- replay_result_release(&result);
strbuf_release(&err);
return ret;
}
+static int handle_reference_updates(struct rev_info *revs,
+ enum ref_action action,
+ struct commit *original,
+ struct commit *rewritten,
+ const char *reflog_msg,
+ int dry_run,
+ enum replay_empty_commit_action empty)
+{
+ struct replay_result result = { 0 };
+ int ret;
+
+ ret = compute_pending_ref_updates(revs, action, original, rewritten,
+ empty, &result);
+ if (ret)
+ goto out;
+
+ ret = apply_pending_ref_updates(revs->repo, &result, reflog_msg, dry_run);
+
+out:
+ replay_result_release(&result);
+ return ret;
+}
+
static int commit_became_empty(struct repository *repo,
struct commit *original,
struct tree *result)
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v7 11/11] builtin/history: implement "drop" subcommand
From: Patrick Steinhardt @ 2026-06-29 7:34 UTC (permalink / raw)
To: git
Cc: Pablo Sabater, Junio C Hamano, Kristoffer Haugsbakk, Phillip Wood,
Christian Couder
In-Reply-To: <20260629-b4-pks-history-drop-v7-0-6e9392a957d8@pks.im>
A common operation when editing the commit history is to drop a specific
commit from the history entirely, but this operation is not currently
covered by git-history(1).
A couple of noteworthy bits:
- This is the first git-history(1) command that will ultimately result
in changes to both the index and the working tree. We thus have to
add logic to merge resulting changes into those.
- It is still not possible to replay merge commits, so this limitation
is inherited for the new "drop" command.
- For now we refuse to drop root commits. While we _can_ indeed drop
root commits in the general case, there are edge cases where the
resulting history would become completely empty. This is thus left
to a subsequent patch series.
Other than that, most of the logic is rather straight-forward as we can
continue to build on the preexisting logic in git-history(1) for most of
the part.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Documentation/git-history.adoc | 38 ++-
builtin/history.c | 186 ++++++++++++++
t/meson.build | 1 +
t/t3454-history-drop.sh | 537 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 761 insertions(+), 1 deletion(-)
diff --git a/Documentation/git-history.adoc b/Documentation/git-history.adoc
index 2ba8121795..28b477cd37 100644
--- a/Documentation/git-history.adoc
+++ b/Documentation/git-history.adoc
@@ -8,6 +8,7 @@ git-history - EXPERIMENTAL: Rewrite history
SYNOPSIS
--------
[synopsis]
+git history drop <commit> [--dry-run] [--update-refs=(branches|head)] [--empty=(drop|keep|abort)]
git history fixup <commit> [--dry-run] [--update-refs=(branches|head)] [--reedit-message] [--empty=(drop|keep|abort)]
git history reword <commit> [--dry-run] [--update-refs=(branches|head)]
git history split <commit> [--dry-run] [--update-refs=(branches|head)] [--] [<pathspec>...]
@@ -51,13 +52,28 @@ be stateful operations. The limitation can be lifted once (if) Git learns about
first-class conflicts.
When using `fixup` with `--empty=drop`, dropping the root commit is not yet
-supported.
+supported. Likewise, `drop` cannot remove the root commit or a merge commit.
COMMANDS
--------
The following commands are available to rewrite history in different ways:
+`drop <commit>`::
+ Remove the specified commit from the history. All descendants of the
+ commit are replayed directly onto its parent.
++
+The root commit cannot be dropped as that may lead to edge cases where refs
+end up with no commits anymore. Merge commits cannot be dropped either; see
+LIMITATIONS.
++
+If `HEAD` points at a commit that is to be rewritten, the index and working
+tree are updated to match the new `HEAD`. The command aborts before any
+references are updated in case local modifications would be overwritten.
++
+If replaying any descendant would result in a conflict, the command aborts
+with an error.
+
`fixup <commit>`::
Apply the currently staged changes to the specified commit. This is
similar in nature to `git commit --fixup=<commit>` followed by `git
@@ -170,6 +186,26 @@ The staged addition of `unrelated.txt` has been incorporated into the `first`
commit. All descendant commits have been replayed on top of the rewritten
history.
+Drop a commit
+~~~~~~~~~~~~~
+
+----------
+$ git log --oneline
+abc1234 (HEAD -> main) third
+def5678 second
+ghi9012 first
+
+$ git history drop 'main^{/second}'
+
+$ git log --oneline
+jkl3456 (HEAD -> main) third
+ghi9012 first
+----------
+
+The `second` commit has been removed from the history, and `third` has been
+replayed directly on top of `first`. All branches that pointed at the dropped
+commit have been moved to its parent.
+
Split a commit
~~~~~~~~~~~~~~
diff --git a/builtin/history.c b/builtin/history.c
index 22b9fcb4a4..7944207f38 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -17,13 +17,17 @@
#include "read-cache.h"
#include "refs.h"
#include "replay.h"
+#include "reset.h"
#include "revision.h"
#include "sequencer.h"
#include "strvec.h"
#include "tree.h"
+#include "tree-walk.h"
#include "unpack-trees.h"
#include "wt-status.h"
+#define GIT_HISTORY_DROP_USAGE \
+ N_("git history drop <commit> [--dry-run] [--update-refs=(branches|head)] [--empty=(drop|keep|abort)]")
#define GIT_HISTORY_FIXUP_USAGE \
N_("git history fixup <commit> [--dry-run] [--update-refs=(branches|head)] [--reedit-message] [--empty=(drop|keep|abort)]")
#define GIT_HISTORY_REWORD_USAGE \
@@ -999,12 +1003,193 @@ static int cmd_history_split(int argc,
return ret;
}
+static int update_worktree(struct repository *repo,
+ const struct commit *old_head,
+ const struct commit *new_head,
+ bool dry_run)
+{
+ struct reset_working_tree_options opts = {
+ .oid_from = &old_head->object.oid,
+ .oid = &new_head->object.oid,
+ };
+ if (dry_run)
+ opts.flags |= RESET_WORKING_TREE_DRY_RUN;
+ return reset_working_tree(repo, &opts);
+}
+
+static int find_head_tree_change(struct repository *repo,
+ const struct replay_result *result,
+ struct commit **old_head,
+ struct commit **new_head,
+ bool *changed)
+{
+ const struct replay_ref_update *head_update = NULL;
+ struct commit *old_head_commit, *new_head_commit;
+ struct tree *old_head_tree, *new_head_tree;
+ const char *head_target;
+ int head_flags;
+
+ *changed = false;
+
+ head_target = refs_resolve_ref_unsafe(get_main_ref_store(repo),
+ "HEAD", RESOLVE_REF_NO_RECURSE,
+ NULL, &head_flags);
+ if (!head_target)
+ return error(_("cannot look up HEAD"));
+ if (!(head_flags & REF_ISSYMREF))
+ head_target = "HEAD";
+
+ for (size_t i = 0; i < result->updates_nr; i++) {
+ if (!strcmp(result->updates[i].refname, head_target)) {
+ head_update = &result->updates[i];
+ break;
+ }
+ }
+
+ if (!head_update)
+ return 0;
+
+ old_head_commit = lookup_commit_reference(repo, &head_update->old_oid);
+ new_head_commit = lookup_commit_reference(repo, &head_update->new_oid);
+ if (!old_head_commit || !new_head_commit)
+ return error(_("cannot resolve HEAD commit"));
+
+ old_head_tree = repo_get_commit_tree(repo, old_head_commit);
+ new_head_tree = repo_get_commit_tree(repo, new_head_commit);
+ if (!old_head_tree || !new_head_tree)
+ return error(_("cannot resolve tree for HEAD"));
+
+ if (oideq(&old_head_tree->object.oid, &new_head_tree->object.oid))
+ return 0;
+
+ *old_head = old_head_commit;
+ *new_head = new_head_commit;
+ *changed = true;
+
+ return 0;
+}
+
+static int cmd_history_drop(int argc,
+ const char **argv,
+ const char *prefix,
+ struct repository *repo)
+{
+ const char * const usage[] = {
+ GIT_HISTORY_DROP_USAGE,
+ NULL,
+ };
+ enum replay_empty_commit_action empty = REPLAY_EMPTY_COMMIT_DROP;
+ enum ref_action action = REF_ACTION_DEFAULT;
+ int dry_run = 0;
+ struct option options[] = {
+ OPT_CALLBACK_F(0, "update-refs", &action, "(branches|head)",
+ N_("control which refs should be updated"),
+ PARSE_OPT_NONEG, parse_ref_action),
+ OPT_BOOL('n', "dry-run", &dry_run,
+ N_("perform a dry-run without updating any refs")),
+ OPT_CALLBACK_F(0, "empty", &empty, "(drop|keep|abort)",
+ N_("how to handle descendants that become empty"),
+ PARSE_OPT_NONEG, parse_opt_empty),
+ OPT_END(),
+ };
+ struct strbuf reflog_msg = STRBUF_INIT;
+ struct commit *original, *rewritten;
+ struct rev_info revs = { 0 };
+ struct replay_result result = { 0 };
+ struct commit *old_head, *new_head;
+ bool head_moves = false;
+ int ret;
+
+ argc = parse_options(argc, argv, prefix, options, usage, 0);
+ if (argc != 1) {
+ ret = error(_("command expects a single revision"));
+ goto out;
+ }
+ repo_config(repo, git_default_config, NULL);
+
+ if (action == REF_ACTION_DEFAULT)
+ action = REF_ACTION_BRANCHES;
+
+ original = lookup_commit_reference_by_name(argv[0]);
+ if (!original) {
+ ret = error(_("commit cannot be found: %s"), argv[0]);
+ goto out;
+ }
+
+ if (!original->parents) {
+ ret = error(_("cannot drop root commit %s: "
+ "it has no parent to replay onto"),
+ argv[0]);
+ goto out;
+ } else if (original->parents->next) {
+ ret = error(_("cannot drop merge commit: %s"), argv[0]);
+ goto out;
+ }
+
+ ret = setup_revwalk(repo, action, original, &revs);
+ if (ret)
+ goto out;
+
+ rewritten = original->parents->item;
+
+ ret = compute_pending_ref_updates(&revs, action, original, rewritten,
+ empty, &result);
+ if (ret) {
+ ret = error(_("failed replaying descendants"));
+ goto out;
+ }
+
+ /*
+ * If HEAD will move as a result of the rewrite then we'll have to
+ * merge in the changes into the worktree and index. This merge can of
+ * course conflict, which will cause the whole operation to abort.
+ *
+ * If we had already updated the refs at that point then we'd have an
+ * inconsistent repository state. So we first perform a dry-run merge
+ * here before updating refs.
+ */
+ if (!is_bare_repository()) {
+ ret = find_head_tree_change(repo, &result, &old_head,
+ &new_head, &head_moves);
+ if (ret < 0)
+ goto out;
+
+ if (head_moves && update_worktree(repo, old_head, new_head, true) < 0) {
+ ret = error(_("dropping this commit would "
+ "overwrite local changes; aborting"));
+ goto out;
+ }
+ }
+
+ strbuf_addf(&reflog_msg, "drop: dropping %s", argv[0]);
+ ret = apply_pending_ref_updates(repo, &result, reflog_msg.buf, dry_run);
+ if (ret < 0) {
+ ret = error(_("failed to update references"));
+ goto out;
+ }
+
+ if (!dry_run && head_moves && update_worktree(repo, old_head, new_head, false) < 0) {
+ ret = error(_("could not update working tree to new commit %s"),
+ oid_to_hex(&new_head->object.oid));
+ goto out;
+ }
+
+ ret = 0;
+
+out:
+ replay_result_release(&result);
+ strbuf_release(&reflog_msg);
+ release_revisions(&revs);
+ return ret;
+}
+
int cmd_history(int argc,
const char **argv,
const char *prefix,
struct repository *repo)
{
const char * const usage[] = {
+ GIT_HISTORY_DROP_USAGE,
GIT_HISTORY_FIXUP_USAGE,
GIT_HISTORY_REWORD_USAGE,
GIT_HISTORY_SPLIT_USAGE,
@@ -1012,6 +1197,7 @@ int cmd_history(int argc,
};
parse_opt_subcommand_fn *fn = NULL;
struct option options[] = {
+ OPT_SUBCOMMAND("drop", &fn, cmd_history_drop),
OPT_SUBCOMMAND("fixup", &fn, cmd_history_fixup),
OPT_SUBCOMMAND("reword", &fn, cmd_history_reword),
OPT_SUBCOMMAND("split", &fn, cmd_history_split),
diff --git a/t/meson.build b/t/meson.build
index 2af8d01279..d5e71056b2 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -399,6 +399,7 @@ integration_tests = [
't3451-history-reword.sh',
't3452-history-split.sh',
't3453-history-fixup.sh',
+ 't3454-history-drop.sh',
't3500-cherry.sh',
't3501-revert-cherry-pick.sh',
't3502-cherry-pick-merge.sh',
diff --git a/t/t3454-history-drop.sh b/t/t3454-history-drop.sh
new file mode 100755
index 0000000000..0f33247212
--- /dev/null
+++ b/t/t3454-history-drop.sh
@@ -0,0 +1,537 @@
+#!/bin/sh
+
+test_description='tests for git-history drop subcommand'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-log-graph.sh"
+
+expect_graph () {
+ cat >expect &&
+ lib_test_cmp_graph --format=%s "$@"
+}
+
+expect_log () {
+ git log --format="%s" "$@" >actual &&
+ cat >expect &&
+ test_cmp expect actual
+}
+
+test_expect_success 'errors on missing commit argument' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit initial &&
+ test_must_fail git history drop 2>err &&
+ test_grep "command expects a single revision" err
+ )
+'
+
+test_expect_success 'errors on too many arguments' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit initial &&
+ test_must_fail git history drop HEAD HEAD 2>err &&
+ test_grep "command expects a single revision" err
+ )
+'
+
+test_expect_success 'errors on unknown revision' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit initial &&
+ test_must_fail git history drop does-not-exist 2>err &&
+ test_grep "commit cannot be found: does-not-exist" err
+ )
+'
+
+test_expect_success 'errors with invalid --empty= value' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit initial &&
+ test_commit second &&
+ test_must_fail git history drop --empty=bogus HEAD 2>err &&
+ test_grep "unrecognized.*--empty.*bogus" err
+ )
+'
+
+test_expect_success 'drops a commit in the middle and replays descendants' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+ test_commit third &&
+
+ git symbolic-ref HEAD >expect &&
+ git history drop HEAD~ &&
+ git symbolic-ref HEAD >actual &&
+ test_cmp expect actual &&
+
+ expect_log <<-\EOF &&
+ third
+ first
+ EOF
+
+ test_must_fail git show HEAD:second.t &&
+ test_path_is_missing second.t &&
+
+ git reflog >reflog &&
+ test_grep "drop: dropping HEAD~" reflog
+ )
+'
+
+test_expect_success 'drops the HEAD commit' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+
+ git history drop HEAD &&
+
+ expect_log <<-\EOF
+ first
+ EOF
+ )
+'
+
+test_expect_success 'drops a commit on detached HEAD' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+ test_commit third &&
+ git checkout --detach HEAD &&
+
+ git history drop HEAD~ &&
+
+ expect_log <<-\EOF
+ third
+ first
+ EOF
+ )
+'
+
+# Note: in this case it would actually be fine to drop the root commit, as we
+# do have a descendant commit, and no reference points to the root commit
+# directly. So this is something that we may relax eventually.
+test_expect_success 'refuses to drop the root commit' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+
+ test_must_fail git history drop HEAD~ 2>err &&
+ test_grep "cannot drop root commit" err
+ )
+'
+
+# In contrast to the above case, we actually don't want to drop the root commit
+# here as that would cause us to end up with an empty commit graph.
+test_expect_success 'refuses to drop the root commit when branch becomes empty' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+
+ test_must_fail git history drop HEAD 2>err &&
+ test_grep "cannot drop root commit" err
+ )
+'
+
+test_expect_success 'refuses to drop a merge commit' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit base &&
+ git branch branch &&
+ test_commit ours &&
+ git switch branch &&
+ test_commit theirs &&
+ git switch - &&
+ git merge theirs &&
+
+ test_must_fail git history drop HEAD 2>err &&
+ test_grep "cannot drop merge commit" err
+ )
+'
+
+test_expect_success 'refuses when descendants contain a merge commit' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit base &&
+ test_commit middle &&
+ git branch branch &&
+ test_commit ours &&
+ git switch branch &&
+ test_commit theirs &&
+ git switch - &&
+ git merge theirs &&
+
+ test_must_fail git history drop middle 2>err &&
+ test_grep "replaying merge commits is not supported yet" err
+ )
+'
+
+test_expect_success 'works in a bare repository' '
+ test_when_finished "rm -rf repo repo.git" &&
+
+ git init repo &&
+ test_commit -C repo first &&
+ test_commit -C repo second &&
+ test_commit -C repo third &&
+
+ git clone --bare repo repo.git &&
+ (
+ cd repo.git &&
+
+ git history drop HEAD~ &&
+ expect_log <<-\EOF
+ third
+ first
+ EOF
+ )
+'
+
+test_expect_success 'updates branches on other lines of descent' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit base &&
+ test_commit target &&
+ git branch theirs &&
+ test_commit ours &&
+ git switch theirs &&
+ test_commit theirs &&
+
+ expect_graph --branches <<-\EOF &&
+ * theirs
+ | * ours
+ |/
+ * target
+ * base
+ EOF
+
+ git history drop target &&
+
+ expect_graph --branches <<-\EOF
+ * ours
+ | * theirs
+ |/
+ * base
+ EOF
+ )
+'
+
+test_expect_success 'moves branch pointing at dropped commit to its parent' '
+ test_when_finished "rm -rf repo" &&
+ git init repo --initial-branch=main &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+ git branch points-at-second &&
+ test_commit third &&
+
+ git rev-parse first >expect &&
+ git history drop second &&
+ git rev-parse points-at-second >actual &&
+ test_cmp expect actual &&
+
+ expect_log --format="%s %D" --branches <<-\EOF
+ third HEAD -> main
+ first tag: first, points-at-second
+ EOF
+ )
+'
+
+test_expect_success '--dry-run prints ref updates without modifying repo' '
+ test_when_finished "rm -rf repo" &&
+ git init repo --initial-branch=main &&
+ (
+ cd repo &&
+ test_commit base &&
+ git branch branch &&
+ test_commit middle &&
+ test_commit ours &&
+ git switch branch &&
+ test_commit theirs &&
+
+ git refs list >refs-expect &&
+ git history drop --dry-run main~ >updates &&
+ git refs list >refs-actual &&
+ test_cmp refs-expect refs-actual &&
+ test_grep "update refs/heads/main" updates &&
+
+ git update-ref --stdin <updates &&
+ expect_log main <<-\EOF
+ ours
+ base
+ EOF
+ )
+'
+
+test_expect_success '--dry-run detects conflicts with modified working tree' '
+ test_when_finished "rm -rf repo" &&
+ git init repo --initial-branch=main &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second modify-me &&
+ echo modified >modify-me &&
+
+ git refs list >refs-expect &&
+ git diff >diff-expect &&
+ test_must_fail git history drop --dry-run HEAD 2>err &&
+ test_grep "dropping this commit would overwrite local changes" err &&
+ git diff >diff-actual &&
+ git refs list >refs-actual &&
+
+ test_cmp diff-expect diff-actual &&
+ test_cmp refs-expect refs-actual
+ )
+'
+
+test_expect_success '--update-refs=head updates only HEAD' '
+ test_when_finished "rm -rf repo" &&
+ git init repo --initial-branch=main &&
+ (
+ cd repo &&
+ test_commit base &&
+ test_commit target &&
+ git branch theirs &&
+ test_commit ours &&
+ git switch theirs &&
+ test_commit theirs &&
+
+ # When told to update HEAD only, the command refuses to
+ # rewrite commits that are not an ancestor of HEAD.
+ test_must_fail git history drop --update-refs=head main 2>err &&
+ test_grep "rewritten commit must be an ancestor of HEAD" err &&
+
+ expect_graph --branches <<-\EOF &&
+ * theirs
+ | * ours
+ |/
+ * target
+ * base
+ EOF
+
+ git switch main &&
+ git history drop --update-refs=head target &&
+
+ expect_graph --branches <<-\EOF
+ * ours
+ | * theirs
+ | * target
+ |/
+ * base
+ EOF
+ )
+'
+
+test_expect_success 'conflict with replayed commit aborts cleanly' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit base &&
+ test_commit conflict-a file &&
+ test_commit conflict-b file &&
+
+ git refs list >refs-expect &&
+ test_must_fail git history drop HEAD~ 2>err &&
+ test_grep "failed replaying descendants" err &&
+ git refs list >refs-actual &&
+ test_cmp refs-expect refs-actual
+ )
+'
+
+# Build a history where a descendant of the drop target reverts the change
+# introduced by the drop target. After dropping, the descendant's diff applies
+# against a tree that already lacks the change, so it becomes empty.
+setup_empty_descendant_repo () {
+ git init "$1" &&
+ (
+ cd "$1" &&
+ echo C1 >file &&
+ git add file &&
+ git commit -m "base" &&
+ git tag base &&
+ echo C2 >file &&
+ git add file &&
+ git commit -m "drop-me" &&
+ git tag drop-me &&
+ test_commit middle &&
+ echo C1 >file &&
+ git add file &&
+ git commit -m "revert-drop-me" &&
+ git tag revert-drop-me
+ )
+}
+
+test_expect_success '--empty=drop drops descendants that become empty' '
+ test_when_finished "rm -rf repo" &&
+ setup_empty_descendant_repo repo &&
+ (
+ cd repo &&
+
+ git history drop --empty=drop drop-me &&
+
+ expect_log <<-\EOF
+ middle
+ base
+ EOF
+ )
+'
+
+test_expect_success '--empty=keep keeps descendants that become empty' '
+ test_when_finished "rm -rf repo" &&
+ setup_empty_descendant_repo repo &&
+ (
+ cd repo &&
+
+ git history drop --empty=keep drop-me &&
+
+ expect_log <<-\EOF &&
+ revert-drop-me
+ middle
+ base
+ EOF
+ git diff HEAD~ HEAD >diff &&
+ test_must_be_empty diff
+ )
+'
+
+test_expect_success '--empty=abort errors out when a descendant becomes empty' '
+ test_when_finished "rm -rf repo" &&
+ setup_empty_descendant_repo repo &&
+ (
+ cd repo &&
+
+ test_must_fail git history drop --empty=abort drop-me 2>err &&
+ test_grep "became empty after replay" err
+ )
+'
+
+test_expect_success 'updates index and worktree when HEAD moves' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+ test_commit third &&
+
+ git history drop second &&
+
+ # Worktree should no longer contain second.t.
+ test_path_is_missing second.t &&
+ test_path_is_file first.t &&
+ test_path_is_file third.t &&
+
+ # Index and worktree should both match the new HEAD.
+ git status --porcelain --untracked-files=no >status &&
+ test_must_be_empty status
+ )
+'
+
+test_expect_success 'updates worktree when dropping HEAD itself' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ test_commit second &&
+
+ git history drop HEAD &&
+
+ test_path_is_missing second.t &&
+ test_path_is_file first.t &&
+
+ git status --porcelain --untracked-files=no >status &&
+ test_must_be_empty status
+ )
+'
+
+test_expect_success 'preserves unrelated unstaged modifications' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ echo first-content >unrelated.txt &&
+ git add unrelated.txt &&
+ git commit -m "add unrelated" &&
+ test_commit second &&
+ test_commit third &&
+
+ echo locally-modified >unrelated.txt &&
+
+ git diff >diff-expect &&
+ git history drop second &&
+ git diff >diff-actual &&
+ test_cmp diff-expect diff-actual &&
+ test_path_is_missing second.t
+ )
+'
+
+test_expect_success 'preserves unrelated staged changes' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit first &&
+ echo first-content >unrelated.txt &&
+ git add unrelated.txt &&
+ git commit -m "add unrelated" &&
+ test_commit second &&
+ test_commit third &&
+
+ echo staged-change >unrelated.txt &&
+ git add unrelated.txt &&
+
+ git diff --cached >diff-expect &&
+ git history drop second &&
+ git diff --cached >diff-actual &&
+ test_cmp diff-expect diff-actual &&
+ test_path_is_missing second.t
+ )
+'
+
+test_expect_success 'aborts when local modifications would be overwritten' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ test_commit base &&
+ test_commit conflict &&
+
+ echo local-edit >conflict.t &&
+ git diff >diff-expect &&
+ test_must_fail git history drop HEAD 2>err &&
+ test_grep "would overwrite local changes" err &&
+ git diff >diff-actual &&
+ test_cmp diff-expect diff-actual
+ )
+'
+
+test_done
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* Re: [PATCH v4 3/3] replay: offer an option to linearize the commit topology
From: Patrick Steinhardt @ 2026-06-29 8:04 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Elijah Newren, Johannes Schindelin
In-Reply-To: <87qzltyiao.fsf@emacs.iotcl.com>
On Fri, Jun 26, 2026 at 07:36:31AM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > git-rebase(1) essentially knows about three different modes:
> >
> > - "--no-rebase-merges", which is the default and maps to your
> > "--linearize".
> >
> > - "--rebase-merges", which by default doesn't rebase cousins by using
> > "--ancestry-path" internally.
> >
> > - "--rebase-merges=rebase-cousins", which doesn't pass the above
> > option.
> >
> > So it's not a simple boolean there, which makes me wonder whether we
> > should mirror the same interface so that all of git-rebase(1)'s modes
> > can be represented, as well.
>
> That's a valid question, although I don't know a good answer to that.
>
> Basically you're asking for what the command line options will look
> like? Allow me to think out loud.
>
> In this series I'm adding --linearize to git-replay(1). As mentioned, I
> don't think it makes sense to add it to git-history(1) as well. Without
> this option, the process aborts when it encounters a merge.
>
> Dscho sent a patch series to properly replay (2-way) merges. I think
> this should become the default for both git-replay(1) and
> git-history(1).
>
> But then, do we want to have an option that brings back the current
> behavior of aborting at merges? Maybe with --no-merges?
I think that would be a sensible option to have.
> Then there's the option of rebasing cousins left. That's something that
> isn't covered by Dscho's series yet. Maybe --replay-cousins?
>
> To reiterate what the final design could look like:
>
> * <nothing>: replay merges preserving topology.
> * "--linearize": flattens merges (only git-replay(1)).
> * "--no-merges": dies when the process tries to replay a merge.
> * "--replay-cousins": does what --rebase-merges=rebase-cousins does.
Right. And if we tried to be consistent with git-rebase(1), then this
could be done as:
- "--rebase-merges" to replay merges preserving topology, which is the
default once we support replaying them.
- "--no-rebase-merges" to flatten commits.
- "--rebase-merges=abort" to explicitly die when seeing merges.
- "--rebase-merges=rebase-cousins"
> Now, all these options are (I think) mutually exclusive, so we could
> consider an option "--replay-merges=<mode>", but personally I find
> "--<option>=<value>" arguments harder to use than specifying separate
> options.
>
> I think I'm avoiding your question, because the design of the command
> line parameters doesn't need tot 1-on-1 correlate to the internal
> datastructure. And I agree the mode isn't a boolean, but does that mean
> we want to use an enum internally? Well, I don't know. And I also don't
> think that matters right now. Code is easy to change, I think the
> command line options should be designed with the future in mind, which I
> believe we do with "--linearize".
>
> Sorry for this long-winded rambling, but bottom line I think it's fine
> to add --linearize and in the future add more options and see how the
> code should evolve to support those.
Hm, I dunno. You basically reasoned that we potentially want to have all
of the same options that git-rebase(1)'s "--rebase-merges=" already
supports. So that begs the question why we need to reinvent the wheel
then and not just use the same syntax.
Note that I'm not arguing that we should support all of these options
now. I'm merely arguing that we should try to be consistent, unless
there is a good argument not to do that. I'm fine with the interface if
there indeed is a good argument, but if so we should document why we
think that the current interface in git-rebase(1) is not a good fit for
this command.
Thanks!
Patrick
^ permalink raw reply
* [PATCH v2 00/12] reftable: harden against corrupted tables
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im>
Hi,
this patch series addresses a bunch of errors that may happen when
trying to read corrupted tables. These errors include out-of-bounds
writes, out-of-bounds reads and the ability to hit abort(3p) calls.
The out-of-bounds write was originally reported by awo on the security
mailing list. As we never transfer reftables over the protocol it would
require local disk access to create such corrupted reftables, so there
isn't really an easy way to exploit these.
In any case, I took that chance and wrote a fuzzer for parsing the
tables, which surfaced a bunch of issues. At the end of this series
though the fuzzer can now run for an extended amount of time (2hrs+)
without surfacing any new issues.
Changes in v2:
- Introduce a test helper that writes a reftable block.
- Link to v1: https://patch.msgid.link/20260624-pks-reftable-hardening-v1-0-66e4ce87c6b9@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (12):
meson: support building fuzzers with libFuzzer
oss-fuzz: add fuzzer for parsing reftables
reftable/basics: fix OOB read on binary search of empty range
reftable/record: don't abort when decoding invalid ref value type
t/unit-tests: introduce test helper to write reftable blocks
reftable/block: fix OOB write with bogus inflated log size
reftable/block: fix OOB read with bogus block size
reftable/block: fix OOB read with bogus restart count
reftable/block: fix use of uninitialized memory when binsearch fails
reftable/block: fix OOB read with bogus restart offset
reftable/table: fix NULL pointer access when seeking to bogus offsets
reftable/table: fix OOB read on truncated table
Makefile | 1 +
ci/run-build-and-minimal-fuzzers.sh | 1 +
meson.build | 15 +++
meson_options.txt | 2 +
oss-fuzz/.gitignore | 1 +
oss-fuzz/fuzz-reftable.c | 74 +++++++++++++++
oss-fuzz/meson.build | 2 +
reftable/basics.c | 3 +
reftable/block.c | 39 +++++++-
reftable/record.c | 6 +-
reftable/table.c | 7 ++
t/unit-tests/u-reftable-basics.c | 11 +++
t/unit-tests/u-reftable-block.c | 184 ++++++++++++++++++++++++++++++++----
t/unit-tests/u-reftable-record.c | 24 +++++
t/unit-tests/u-reftable-table.c | 91 ++++++++++++++++++
15 files changed, 435 insertions(+), 26 deletions(-)
Range-diff versus v1:
1: 82275a5448 = 1: 5bb58da117 meson: support building fuzzers with libFuzzer
2: 8b234b5dc6 = 2: 8d11b15082 oss-fuzz: add fuzzer for parsing reftables
3: f265bcf6f4 = 3: 21186da3f1 reftable/basics: fix OOB read on binary search of empty range
4: a56c6cb50c = 4: 3c327bacc2 reftable/record: don't abort when decoding invalid ref value type
-: ---------- > 5: 4125c76a97 t/unit-tests: introduce test helper to write reftable blocks
5: 9074372e30 ! 6: e923c23518 reftable/block: fix OOB write with bogus inflated log size
@@ reftable/block.c: int reftable_block_init(struct reftable_block *block,
## t/unit-tests/u-reftable-block.c ##
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__iterator(void)
- block_writer_release(&writer);
+ reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_log_block_size(void)
+{
+ struct reftable_block_source source = { 0 };
-+ struct block_writer writer = {
-+ .last_key = REFTABLE_BUF_INIT,
-+ };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_LOG,
+ .u.log = {
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__iterator(void)
+ },
+ };
+ struct reftable_block block = { 0 };
-+ struct reftable_buf data;
-+
-+ data.len = 1024;
-+ REFTABLE_CALLOC_ARRAY(data.buf, data.len);
-+ cl_assert(data.buf != NULL);
++ struct reftable_buf data = REFTABLE_BUF_INIT;
+
-+ cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_LOG,
-+ (uint8_t *) data.buf, data.len,
-+ 0, hash_size(REFTABLE_HASH_SHA1)));
-+ cl_must_pass(block_writer_add(&writer, &rec));
-+ cl_assert(block_writer_finish(&writer) > 0);
++ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_LOG, &rec, 1);
+
+ /*
+ * Log blocks store their inflated size as a big-endian 24-bit integer
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__iterator(void)
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
-+ block_writer_release(&writer);
+ reftable_buf_release(&data);
+}
6: 6877f58485 ! 7: 16c2904a96 reftable/block: fix OOB read with bogus block size
@@ reftable/block.c: int reftable_block_init(struct reftable_block *block,
## t/unit-tests/u-reftable-block.c ##
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_log_block_size(void)
- block_writer_release(&writer);
+ reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_block_size(void)
+{
+ struct reftable_block_source source = { 0 };
-+ struct block_writer writer = {
-+ .last_key = REFTABLE_BUF_INIT,
-+ };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_log_block_siz
+ },
+ };
+ struct reftable_block block = { 0 };
-+ struct reftable_buf data;
-+
-+ data.len = 1024;
-+ REFTABLE_CALLOC_ARRAY(data.buf, data.len);
-+ cl_assert(data.buf != NULL);
++ struct reftable_buf data = REFTABLE_BUF_INIT;
+
-+ cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
-+ (uint8_t *) data.buf, data.len,
-+ 0, hash_size(REFTABLE_HASH_SHA1)));
-+ cl_must_pass(block_writer_add(&writer, &rec));
-+ cl_assert(block_writer_finish(&writer) > 0);
++ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ /*
+ * The block size is stored as a big-endian 24-bit integer right after
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_log_block_siz
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
-+ block_writer_release(&writer);
+ reftable_buf_release(&data);
+}
7: 3c022a4f97 ! 8: 872eca67bb reftable/block: fix OOB read with bogus restart count
@@ reftable/block.c: int reftable_block_init(struct reftable_block *block,
## t/unit-tests/u-reftable-block.c ##
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_block_size(void)
- block_writer_release(&writer);
+ reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_restart_count(void)
+{
+ struct reftable_block_source source = { 0 };
-+ struct block_writer writer = {
-+ .last_key = REFTABLE_BUF_INIT,
-+ };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_block_size(vo
+ },
+ };
+ struct reftable_block block = { 0 };
-+ struct reftable_buf data;
++ struct reftable_buf data = REFTABLE_BUF_INIT;
+ int block_size;
+
-+ data.len = 1024;
-+ REFTABLE_CALLOC_ARRAY(data.buf, data.len);
-+ cl_assert(data.buf != NULL);
-+
-+ cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
-+ (uint8_t *) data.buf, data.len,
-+ 0, hash_size(REFTABLE_HASH_SHA1)));
-+ cl_must_pass(block_writer_add(&writer, &rec));
-+ block_size = block_writer_finish(&writer);
-+ cl_assert(block_size > 0);
++ block_size = cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ /*
+ * Corrupt the restart count to claim a bogus number of restart points.
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_block_size(vo
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
-+ block_writer_release(&writer);
+ reftable_buf_release(&data);
+}
8: af5697b85b = 9: c82d51c163 reftable/block: fix use of uninitialized memory when binsearch fails
9: e9d4eca613 ! 10: 16e1087a66 reftable/block: fix OOB read with bogus restart offset
@@ reftable/block.c: static int restart_needle_less(size_t idx, void *_args)
## t/unit-tests/u-reftable-block.c ##
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_restart_count(void)
- block_writer_release(&writer);
+ reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_restart_offset(void)
+{
+ struct reftable_block_source source = { 0 };
-+ struct block_writer writer = {
-+ .last_key = REFTABLE_BUF_INIT,
-+ };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_restart_count
+ struct reftable_block block = { 0 };
+ struct block_iter it = BLOCK_ITER_INIT;
+ struct reftable_buf want = REFTABLE_BUF_INIT;
-+ struct reftable_buf data;
-+
-+ data.len = 1024;
-+ REFTABLE_CALLOC_ARRAY(data.buf, data.len);
-+ cl_assert(data.buf != NULL);
++ struct reftable_buf data = REFTABLE_BUF_INIT;
+
-+ cl_must_pass(block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
-+ (uint8_t *) data.buf, data.len,
-+ 0, hash_size(REFTABLE_HASH_SHA1)));
-+ cl_must_pass(block_writer_add(&writer, &rec));
-+ cl_assert(block_writer_finish(&writer) > 0);
++ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ block_source_from_buf(&source, &data);
+ cl_must_pass(reftable_block_init(&block, &source, 0, 0, data.len,
@@ t/unit-tests/u-reftable-block.c: void test_reftable_block__corrupt_restart_count
+ reftable_buf_release(&want);
+ block_iter_close(&it);
+ reftable_block_release(&block);
-+ block_writer_release(&writer);
+ reftable_buf_release(&data);
+}
10: 4bb729aeb0 = 11: 63dd98f908 reftable/table: fix NULL pointer access when seeking to bogus offsets
11: e3bca6af6e = 12: 32696a01bc reftable/table: fix OOB read on truncated table
---
base-commit: ab776a62a78576513ee121424adb19597fbb7613
change-id: 20260623-pks-reftable-hardening-f54de69fea63
^ permalink raw reply
* [PATCH v2 01/12] meson: support building fuzzers with libFuzzer
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
To support fuzzing via libFuzzer one has to pass a couple of compiler
options:
- It is mandatory to enable the "fuzzer-no-link" sanitizer for
coverage feedback.
- It is recommended to enable at least one more sanitizer to catch
issues, like the "address" sanitizer.
- The fuzzing executables need to be linked with "-fsanitize=fuzzer"
to wire up libFuzzer itself.
The first two items can already be achieved via the "-Db_sanitize="
option. But the last item cannot easily be achieved, as we can only
configure global link arguments.
Introduce a new "-Dfuzzers_link_args=" build option to plug this gap.
Add documentation so that users know how to set up libFuzzer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
meson.build | 15 +++++++++++++++
meson_options.txt | 2 ++
oss-fuzz/meson.build | 1 +
3 files changed, 18 insertions(+)
diff --git a/meson.build b/meson.build
index 3247697f74..9df6fbb0a5 100644
--- a/meson.build
+++ b/meson.build
@@ -161,6 +161,21 @@
# These machine files can be passed to `meson setup` via the `--native-file`
# option.
#
+# Fuzzing
+# =======
+#
+# Meson supports building the fuzzing targets by setting `-Dfuzzers=true`. By
+# default, the targets will be built without libFuzzer and thus won't be usable
+# for fuzzing. You have to configure a couple of options to properly wire up
+# libFuzzer:
+#
+# $ meson setup build-fuzzers \
+# -Db_sanitize=address,fuzzer-no-link \
+# -Dfuzzers=true \
+# -Dfuzzers_link_args=-fsanitize=fuzzer
+# $ meson compile -C build-fuzzers
+# $ ./build-fuzzers/oss-fuzz/fuzz-config <args>
+#
# Cross compilation
# =================
#
diff --git a/meson_options.txt b/meson_options.txt
index d936ada098..dc88f130d7 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -131,3 +131,5 @@ option('test_utf8_locale', type: 'string',
description: 'Name of a UTF-8 locale used for testing.')
option('fuzzers', type: 'boolean', value: false,
description: 'Enable building fuzzers.')
+option('fuzzers_link_args', type: 'array', value: [],
+ description: 'Linker arguments used to link fuzzers. Use -fsanitize=fuzzer for fuzzing.')
diff --git a/oss-fuzz/meson.build b/oss-fuzz/meson.build
index 878afd8426..10bcac2f6d 100644
--- a/oss-fuzz/meson.build
+++ b/oss-fuzz/meson.build
@@ -16,5 +16,6 @@ foreach fuzz_program : fuzz_programs
fuzz_program,
],
dependencies: [libgit_commonmain],
+ link_args: get_option('fuzzers_link_args'),
)
endforeach
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 02/12] oss-fuzz: add fuzzer for parsing reftables
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
Add a new fuzzer that exercises our parsing of reftables. Fallout from
this fuzzer will be fixed over subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Makefile | 1 +
ci/run-build-and-minimal-fuzzers.sh | 1 +
oss-fuzz/.gitignore | 1 +
oss-fuzz/fuzz-reftable.c | 74 +++++++++++++++++++++++++++++++++++++
oss-fuzz/meson.build | 1 +
5 files changed, 78 insertions(+)
diff --git a/Makefile b/Makefile
index 1cec251f43..89d3edd5ea 100644
--- a/Makefile
+++ b/Makefile
@@ -2599,6 +2599,7 @@ FUZZ_OBJS += oss-fuzz/fuzz-date.o
FUZZ_OBJS += oss-fuzz/fuzz-pack-headers.o
FUZZ_OBJS += oss-fuzz/fuzz-pack-idx.o
FUZZ_OBJS += oss-fuzz/fuzz-parse-attr-line.o
+FUZZ_OBJS += oss-fuzz/fuzz-reftable.o
FUZZ_OBJS += oss-fuzz/fuzz-url-decode-mem.o
.PHONY: fuzz-objs
fuzz-objs: $(FUZZ_OBJS)
diff --git a/ci/run-build-and-minimal-fuzzers.sh b/ci/run-build-and-minimal-fuzzers.sh
index e7b97952e7..37b24b092d 100755
--- a/ci/run-build-and-minimal-fuzzers.sh
+++ b/ci/run-build-and-minimal-fuzzers.sh
@@ -21,6 +21,7 @@ date
pack-headers
pack-idx
parse-attr-line
+reftable
url-decode-mem
"
diff --git a/oss-fuzz/.gitignore b/oss-fuzz/.gitignore
index f2d74de457..dc7a127a62 100644
--- a/oss-fuzz/.gitignore
+++ b/oss-fuzz/.gitignore
@@ -5,4 +5,5 @@ fuzz-date
fuzz-pack-headers
fuzz-pack-idx
fuzz-parse-attr-line
+fuzz-reftable
fuzz-url-decode-mem
diff --git a/oss-fuzz/fuzz-reftable.c b/oss-fuzz/fuzz-reftable.c
new file mode 100644
index 0000000000..c46eac2c6b
--- /dev/null
+++ b/oss-fuzz/fuzz-reftable.c
@@ -0,0 +1,74 @@
+#include "git-compat-util.h"
+#include "reftable/basics.h"
+#include "reftable/blocksource.h"
+#include "reftable/reftable-blocksource.h"
+#include "reftable/reftable-error.h"
+#include "reftable/reftable-iterator.h"
+#include "reftable/reftable-record.h"
+#include "reftable/reftable-table.h"
+#include "reftable/reftable-writer.h"
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
+
+int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
+{
+ struct reftable_block_source source = { 0 };
+ struct reftable_buf buf = REFTABLE_BUF_INIT;
+ struct reftable_table *table = NULL;
+ int err;
+
+ if (reftable_buf_add(&buf, (const char *)data, size) < 0)
+ goto out;
+ block_source_from_buf(&source, &buf);
+
+ err = reftable_table_new(&table, &source, "fuzz-input");
+ if (err < 0)
+ goto out;
+
+ /*
+ * Exercise the ref, log and raw block iterators so that we cover as
+ * much of the parsing code as possible.
+ */
+ {
+ struct reftable_ref_record ref = { 0 };
+ struct reftable_iterator it = { 0 };
+
+ reftable_table_init_ref_iterator(table, &it);
+ if (!reftable_iterator_seek_ref(&it, ""))
+ while (!reftable_iterator_next_ref(&it, &ref))
+ ;
+
+ reftable_ref_record_release(&ref);
+ reftable_iterator_destroy(&it);
+ }
+
+ {
+ struct reftable_log_record log = { 0 };
+ struct reftable_iterator it = { 0 };
+
+ reftable_table_init_log_iterator(table, &it);
+ if (!reftable_iterator_seek_log(&it, ""))
+ while (!reftable_iterator_next_log(&it, &log))
+ ;
+
+ reftable_log_record_release(&log);
+ reftable_iterator_destroy(&it);
+ }
+
+ {
+ struct reftable_table_iterator it = { 0 };
+ const struct reftable_block *block;
+
+ if (!reftable_table_iterator_init(&it, table))
+ while (!reftable_table_iterator_next(&it, &block))
+ ;
+
+ reftable_table_iterator_release(&it);
+ }
+
+out:
+ if (table)
+ reftable_table_decref(table);
+ reftable_buf_release(&buf);
+ return 0;
+}
diff --git a/oss-fuzz/meson.build b/oss-fuzz/meson.build
index 10bcac2f6d..5a3854256b 100644
--- a/oss-fuzz/meson.build
+++ b/oss-fuzz/meson.build
@@ -6,6 +6,7 @@ fuzz_programs = [
'fuzz-pack-headers.c',
'fuzz-pack-idx.c',
'fuzz-parse-attr-line.c',
+ 'fuzz-reftable.c',
'fuzz-url-decode-mem.c',
]
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 03/12] reftable/basics: fix OOB read on binary search of empty range
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
`binsearch()` performs a binary search over a range of `sz` elements by
repeatedly calling the comparison function with indices into that range.
When the range is empty though, there is no valid index to call the
comparison function with. We still end up executing the comparison
function though with an index of 0, which of course will cause an
out-of-bounds read.
Return early when the range is empty.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/basics.c | 3 +++
t/unit-tests/u-reftable-basics.c | 11 +++++++++++
2 files changed, 14 insertions(+)
diff --git a/reftable/basics.c b/reftable/basics.c
index e969927b61..f0442a46cf 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -152,6 +152,9 @@ size_t binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
size_t lo = 0;
size_t hi = sz;
+ if (!sz)
+ return 0;
+
/* Invariants:
*
* (hi == sz) || f(hi) == true
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index 73566ed0eb..c5d83b6714 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -60,6 +60,17 @@ void test_reftable_basics__binsearch(void)
}
}
+static int unreachable_lesseq(size_t i UNUSED, void *args UNUSED)
+{
+ cl_fail("comparison function called for empty range");
+ return 0;
+}
+
+void test_reftable_basics__binsearch_empty(void)
+{
+ cl_assert_equal_i(binsearch(0, &unreachable_lesseq, NULL), 0);
+}
+
void test_reftable_basics__names_length(void)
{
const char *a[] = { "a", "b", NULL };
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 04/12] reftable/record: don't abort when decoding invalid ref value type
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
When decoding a ref record we read its value type from the block. In
case the type itself is invalid we call `abort()`. This is rather
heavy-handed though: the data we're reading is untrusted, so we should
treat the issue as a normal and not as a programming error.
Fix this by handling the error gracefully. Note that this also requires
us to set the value type later, as otherwise we might store an invalid
type in the record.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/record.c | 6 +++---
t/unit-tests/u-reftable-record.c | 24 ++++++++++++++++++++++++
2 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/reftable/record.c b/reftable/record.c
index fcd387ba5d..1fce441930 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -388,7 +388,6 @@ static int reftable_ref_record_decode(void *rec, struct reftable_buf key,
r->refname[key.len] = 0;
r->update_index = update_index;
- r->value_type = val_type;
switch (val_type) {
case REFTABLE_REF_VAL1:
if (in.len < hash_size) {
@@ -426,9 +425,10 @@ static int reftable_ref_record_decode(void *rec, struct reftable_buf key,
case REFTABLE_REF_DELETION:
break;
default:
- abort();
- break;
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
}
+ r->value_type = val_type;
return start.len - in.len;
diff --git a/t/unit-tests/u-reftable-record.c b/t/unit-tests/u-reftable-record.c
index 1bf2e170dc..9c95083ef4 100644
--- a/t/unit-tests/u-reftable-record.c
+++ b/t/unit-tests/u-reftable-record.c
@@ -11,6 +11,7 @@
#include "reftable/basics.h"
#include "reftable/constants.h"
#include "reftable/record.h"
+#include "reftable/reftable-error.h"
static void t_copy(struct reftable_record *rec)
{
@@ -202,6 +203,29 @@ void test_reftable_record__ref_record_roundtrip(void)
reftable_buf_release(&scratch);
}
+void test_reftable_record__ref_record_decode_invalid_value_type(void)
+{
+ struct reftable_buf scratch = REFTABLE_BUF_INIT;
+ struct reftable_record out = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ };
+ struct reftable_buf key = REFTABLE_BUF_INIT;
+ uint8_t buffer[1024] = { 0 };
+ struct string_view dest = {
+ .buf = buffer,
+ .len = sizeof(buffer),
+ };
+
+ cl_must_pass(reftable_buf_addstr(&key, "refs/heads/master"));
+ cl_assert_equal_i(reftable_record_decode(&out, key, REFTABLE_NR_REF_VALUETYPES,
+ dest, REFTABLE_HASH_SIZE_SHA1, &scratch),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_record_release(&out);
+ reftable_buf_release(&key);
+ reftable_buf_release(&scratch);
+}
+
void test_reftable_record__log_record_comparison(void)
{
struct reftable_record in[3] = {
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 05/12] t/unit-tests: introduce test helper to write reftable blocks
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
Introduce a new test helper that allows us to write reftable blocks.
This helper will be used by subsequent commits.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
t/unit-tests/u-reftable-block.c | 47 ++++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 19 deletions(-)
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index f4bded7d26..f4e926ce3a 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -14,6 +14,31 @@ license that can be found in the LICENSE file or at
#include "reftable/reftable-error.h"
#include "strbuf.h"
+static int cl_reftable_write_block(struct reftable_buf *buf,
+ uint8_t block_type,
+ struct reftable_record *recs,
+ size_t nrecs)
+{
+ struct block_writer writer = {
+ .last_key = REFTABLE_BUF_INIT,
+ };
+ uint8_t block[1024];
+ int block_end;
+
+ cl_must_pass(block_writer_init(&writer, block_type, block, 1024,
+ 0, hash_size(REFTABLE_HASH_SHA1)));
+ for (size_t i = 0; i < nrecs; i++)
+ cl_must_pass(block_writer_add(&writer, &recs[i]));
+
+ block_end = block_writer_finish(&writer);
+ cl_assert(block_end > 0);
+
+ cl_must_pass(reftable_buf_add(buf, block, block_end));
+
+ block_writer_release(&writer);
+ return block_end;
+}
+
void test_reftable_block__read_write(void)
{
const int header_off = 21; /* random */
@@ -381,25 +406,13 @@ void test_reftable_block__ref_read_write(void)
void test_reftable_block__iterator(void)
{
struct reftable_block_source source = { 0 };
- struct block_writer writer = {
- .last_key = REFTABLE_BUF_INIT,
- };
struct reftable_record expected_refs[20];
struct reftable_ref_record ref = { 0 };
struct reftable_iterator it = { 0 };
struct reftable_block block = { 0 };
- struct reftable_buf data;
+ struct reftable_buf data = REFTABLE_BUF_INIT;
int err;
- data.len = 1024;
- REFTABLE_CALLOC_ARRAY(data.buf, data.len);
- cl_assert(data.buf != NULL);
-
- err = block_writer_init(&writer, REFTABLE_BLOCK_TYPE_REF,
- (uint8_t *) data.buf, data.len,
- 0, hash_size(REFTABLE_HASH_SHA1));
- cl_assert(!err);
-
for (size_t i = 0; i < ARRAY_SIZE(expected_refs); i++) {
expected_refs[i] = (struct reftable_record) {
.type = REFTABLE_BLOCK_TYPE_REF,
@@ -409,13 +422,10 @@ void test_reftable_block__iterator(void)
},
};
memset(expected_refs[i].u.ref.value.val1, i, REFTABLE_HASH_SIZE_SHA1);
-
- err = block_writer_add(&writer, &expected_refs[i]);
- cl_assert_equal_i(err, 0);
}
- err = block_writer_finish(&writer);
- cl_assert(err > 0);
+ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF,
+ expected_refs, ARRAY_SIZE(expected_refs));
block_source_from_buf(&source, &data);
reftable_block_init(&block, &source, 0, 0, data.len,
@@ -453,6 +463,5 @@ void test_reftable_block__iterator(void)
reftable_ref_record_release(&ref);
reftable_iterator_destroy(&it);
reftable_block_release(&block);
- block_writer_release(&writer);
reftable_buf_release(&data);
}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 06/12] reftable/block: fix OOB write with bogus inflated log size
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
The "log" reftable block stores reflog information. This information is
compressed using zlib. The inflated size is stored in the block header
so that callers can easily learn ahead of time how large of a buffer
they have to allocate to inflate the data in a single pass. So to
reconstruct the full inflated block we:
- Copy over the header as-is, as it's not deflated.
- Append the inflated data to the buffer.
The inflated block size stored in the header also includes the length of
the header itself. So to figure out the bytes that should be inflated by
zlib we need to subtract the header size, which is trusted data, from
the block size, which is untrusted data derived from the block header.
While we do verify that we were able to inflate all data as expected, we
don't verify ahead of time that the encoded block length is larger than
the header length. This can lead to an underflow, which makes zlib
assume that it can write more data into the target buffer than we have
allocated. The result is an out-of-bounds write:
==1422297==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c1ff6de5231 at pc 0x55555579a628 bp 0x7fffffff4f10 sp 0x7fffffff46d0
WRITE of size 4 at 0x7c1ff6de5231 thread T0
#0 0x55555579a627 in __asan_memcpy (./build/t/unit-tests+0x246627)
#1 0x55555598b093 in reftable_block_init ./build/../reftable/block.c:277:3
#2 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
#3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
#4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
#5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
#6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
#7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
#8 0x55555584af4a in main ./build/../common-main.c:9:11
#9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)
0x7c1ff6de5231 is located 0 bytes after 1-byte region [0x7c1ff6de5230,0x7c1ff6de5231)
allocated by thread T0 here:
#0 0x55555579db1b in realloc.part.0 asan_malloc_linux.cpp.o
#1 0x5555559868d7 in reftable_realloc ./build/../reftable/basics.c:36:9
#2 0x55555598a98f in reftable_alloc_grow ./build/../reftable/basics.h:229:10
#3 0x55555598ae58 in reftable_block_init ./build/../reftable/block.c:269:3
#4 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
#5 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
#6 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
#7 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
#8 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
#9 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
#10 0x55555584af4a in main ./build/../common-main.c:9:11
#11 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#12 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#13 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)
SUMMARY: AddressSanitizer: heap-buffer-overflow (./build/t/unit-tests+0x246627) in __asan_memcpy
Shadow bytes around the buggy address:
0x7c1ff6de4f80: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
0x7c1ff6de5000: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
0x7c1ff6de5080: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
0x7c1ff6de5100: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
0x7c1ff6de5180: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
=>0x7c1ff6de5200: fa fa 04 fa fa fa[01]fa fa fa fa fa fa fa fa fa
0x7c1ff6de5280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7c1ff6de5300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7c1ff6de5380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7c1ff6de5400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7c1ff6de5480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Fix the bug by adding a sanity check and add a unit test.
Reported-by: oxsignal <awo@kakao.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block.c | 9 +++++++++
t/unit-tests/u-reftable-block.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/reftable/block.c b/reftable/block.c
index 920b3f4486..b86cb9ec5a 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -260,6 +260,15 @@ int reftable_block_init(struct reftable_block *block,
goto done;
}
+ /*
+ * Verify that the block size covers at least the table header, block
+ * header and the 2 byte restart counter.
+ */
+ if (block_size < header_size + 4 + 2) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
if (block_type == REFTABLE_BLOCK_TYPE_LOG) {
uint32_t block_header_skip = 4 + header_size;
uLong dst_len = block_size - block_header_skip;
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index f4e926ce3a..088162483e 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -465,3 +465,35 @@ void test_reftable_block__iterator(void)
reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_log_block_size(void)
+{
+ struct reftable_block_source source = { 0 };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_LOG,
+ .u.log = {
+ .refname = (char *) "refs/heads/main",
+ .update_index = 1,
+ .value_type = REFTABLE_LOG_UPDATE,
+ },
+ };
+ struct reftable_block block = { 0 };
+ struct reftable_buf data = REFTABLE_BUF_INIT;
+
+ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_LOG, &rec, 1);
+
+ /*
+ * Log blocks store their inflated size as a big-endian 24-bit integer
+ * right after the one-byte block type. Rewrite it to claim a size that
+ * is smaller than the block header.
+ */
+ reftable_put_be24((uint8_t *) data.buf + 1, 1);
+
+ block_source_from_buf(&source, &data);
+ cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+ REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_LOG),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
+ reftable_buf_release(&data);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 07/12] reftable/block: fix OOB read with bogus block size
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
The block size is read from the block header, which is untrusted data.
We use it without verification to access the restart count at the end of
the block as well as to compute the restart table offset. With a bogus
block size that exceeds the data we have actually read this can lead to
an out-of-bounds read:
==1458284==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de4b7d (pc 0x55555598c339 bp 0x7fffffff4ef0 sp 0x7fffffff4eb0 T0)
==1458284==The signal is caused by a READ memory access.
#0 0x55555598c339 in reftable_get_be16 ./build/../reftable/basics.h:118:9
#1 0x55555598bee2 in reftable_block_init ./build/../reftable/block.c:344:18
#2 0x555555813e0e in test_reftable_block__corrupt_block_size ./build/../t/unit-tests/u-reftable-block.c:540:8
#3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
#4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
#5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
#6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
#7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
#8 0x55555584b55a in main ./build/../common-main.c:9:11
#9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)
==1458284==Register values:
rax = 0x00007d8ff7de4b7d rbx = 0x00007fffffff4f00 rcx = 0x0000000000000006 rdx = 0x0000000000000010
rdi = 0x00007d8ff7de4b7d rsi = 0x00007bfff5cf0420 rbp = 0x00007fffffff4ef0 rsp = 0x00007fffffff4eb0
r8 = 0x00000f807eb960b8 r9 = 0x0000000000000001 r10 = 0x00007bfff5cf05e7 r11 = 0x000000000000000f
r12 = 0x00007fffffff58f8 r13 = 0x0000000000000001 r14 = 0x0000555555ee8160 r15 = 0x0000000000000000
AddressSanitizer can not provide additional info.
Verify that the claimed block size fits into the block data before using
it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block.c | 9 +++++++++
t/unit-tests/u-reftable-block.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+)
diff --git a/reftable/block.c b/reftable/block.c
index b86cb9ec5a..4d6b11c2e7 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -340,6 +340,15 @@ int reftable_block_init(struct reftable_block *block,
full_block_size = block_size;
}
+ /*
+ * Ensure that we have sufficient data available now to satisfy the
+ * claimed block size.
+ */
+ if (block_size > block->block_data.len) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
restart_count = reftable_get_be16(block->block_data.data + block_size - 2);
restart_off = block_size - 2 - 3 * restart_count;
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index 088162483e..43b9d5fb59 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -497,3 +497,36 @@ void test_reftable_block__corrupt_log_block_size(void)
reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_block_size(void)
+{
+ struct reftable_block_source source = { 0 };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
+ .value_type = REFTABLE_REF_VAL1,
+ .refname = (char *) "refs/heads/main",
+ },
+ };
+ struct reftable_block block = { 0 };
+ struct reftable_buf data = REFTABLE_BUF_INIT;
+
+ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ /*
+ * The block size is stored as a big-endian 24-bit integer right after
+ * the one-byte block type at the start of the block. Corrupt it to
+ * claim a size that is larger than the data we actually have. Reading
+ * the restart count and restart table relative to such a bogus block
+ * size must not access out-of-bounds memory.
+ */
+ reftable_put_be24((uint8_t *) data.buf + 1, 0xffffff);
+
+ block_source_from_buf(&source, &data);
+ cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+ REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
+ reftable_buf_release(&data);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 08/12] reftable/block: fix OOB read with bogus restart count
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
The restart count is stored in the last two bytes of a block. We use it
without verification to compute the offset of the restart table. With a
bogus restart count that is large enough this computation underflows,
and the subsequent reads via the restart table access out-of-bounds
memory:
==129439==ERROR: AddressSanitizer: SEGV on unknown address 0x7d90f6dcd0ad (pc 0x55555598ce89 bp 0x7fffffff4ed0 sp 0x7fffffff4e80 T0)
==129439==The signal is caused by a READ memory access.
#0 0x55555598ce89 in reftable_get_be24 ./git/build/../reftable/basics.h:125:9
#1 0x55555598eabf in block_restart_offset ./git/build/../reftable/block.c:407:9
#2 0x55555598e5d5 in restart_needle_less ./git/build/../reftable/block.c:431:17
#3 0x5555559887e2 in binsearch ./git/build/../reftable/basics.c:165:13
#4 0x55555598dfec in block_iter_seek_key ./git/build/../reftable/block.c:529:6
#5 0x555555814517 in test_reftable_block__corrupt_restart_count ./git/build/../t/unit-tests/u-reftable-block.c:593:15
#6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
#7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
#8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
#9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
#10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
#11 0x55555584c12a in main ./git/build/../common-main.c:9:11
#12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)
==129439==Register values:
rax = 0x00007d90f6dcd0ad rbx = 0x00007fffffff4f20 rcx = 0xf2f2f2f8f2f2f2f8 rdx = 0x0000000000000000
rdi = 0x00007d90f6dcd0ad rsi = 0x0000000000007fff rbp = 0x00007fffffff4ed0 rsp = 0x00007fffffff4e80
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x0000000000000017
r12 = 0x00007fffffff58e8 r13 = 0x0000000000000001 r14 = 0x00007ffff7ffd000 r15 = 0x00005555560550b0
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/basics.h:125:9 in reftable_get_be24
Verify that the restart table actually fits into the block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block.c | 4 ++++
t/unit-tests/u-reftable-block.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/reftable/block.c b/reftable/block.c
index 4d6b11c2e7..4d285aefd7 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -351,6 +351,10 @@ int reftable_block_init(struct reftable_block *block,
restart_count = reftable_get_be16(block->block_data.data + block_size - 2);
restart_off = block_size - 2 - 3 * restart_count;
+ if (restart_off < header_size + 4 || restart_off > block_size - 2) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
block->block_type = block_type;
block->hash_size = hash_size;
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index 43b9d5fb59..a0fbfb247f 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -530,3 +530,36 @@ void test_reftable_block__corrupt_block_size(void)
reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_restart_count(void)
+{
+ struct reftable_block_source source = { 0 };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
+ .value_type = REFTABLE_REF_VAL1,
+ .refname = (char *) "refs/heads/main",
+ },
+ };
+ struct reftable_block block = { 0 };
+ struct reftable_buf data = REFTABLE_BUF_INIT;
+ int block_size;
+
+ block_size = cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ /*
+ * Corrupt the restart count to claim a bogus number of restart points.
+ * Note that this would only cause us to perform an out-of-bounds
+ * access when seeking into the block, but we want to refuse such a
+ * block outright.
+ */
+ reftable_put_be16((uint8_t *) data.buf + block_size - 2, 0xffff);
+
+ block_source_from_buf(&source, &data);
+ cl_assert_equal_i(reftable_block_init(&block, &source, 0, 0, data.len,
+ REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_block_release(&block);
+ reftable_buf_release(&data);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 09/12] reftable/block: fix use of uninitialized memory when binsearch fails
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
When doing the binary search through our restart offsets we may hit an
error in case `restart_needle_less()` fails to decode the record at the
given offset. While we correctly detect this case and error out, it will
cause us to call `reftable_record_release()` on the yet-uninitialized
record.
Fix this by initializing the record earlier.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/reftable/block.c b/reftable/block.c
index 4d285aefd7..89efce8751 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -517,6 +517,10 @@ int block_iter_seek_key(struct block_iter *it, struct reftable_buf *want)
int err = 0;
size_t i;
+ err = reftable_record_init(&rec, reftable_block_type(it->block));
+ if (err < 0)
+ goto done;
+
/*
* Perform a binary search over the block's restart points, which
* avoids doing a linear scan over the whole block. Like this, we
@@ -558,10 +562,6 @@ int block_iter_seek_key(struct block_iter *it, struct reftable_buf *want)
else
it->next_off = it->block->header_off + 4;
- err = reftable_record_init(&rec, reftable_block_type(it->block));
- if (err < 0)
- goto done;
-
/*
* We're looking for the last entry less than the wanted key so that
* the next call to `block_reader_next()` would yield the wanted
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 10/12] reftable/block: fix OOB read with bogus restart offset
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
Restart points encode records in a given block that do not use prefix
compression and that can thus immediately be seeked to. These offsets
are encoded in the restart table, where each offset needs to point at
one of the records of the block. We do not verify this though, so a
bogus restart offset may cause an out-of-bounds read:
==1472280==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de5f7f (pc 0x55555599502b bp 0x7fffffff4df0 sp 0x7fffffff4d40 T0)
==1472280==The signal is caused by a READ memory access.
#0 0x55555599502b in get_var_int ./git/build/../reftable/record.c:30:6
#1 0x555555995c2a in reftable_decode_keylen ./git/build/../reftable/record.c:177:6
#2 0x55555598e85c in restart_needle_less ./git/build/../reftable/block.c:455:6
#3 0x55555598895f in binsearch ./git/build/../reftable/basics.c:175:9
#4 0x55555598e189 in block_iter_seek_key ./git/build/../reftable/block.c:543:6
#5 0x555555814aee in test_reftable_block__corrupt_restart_offset ./git/build/../t/unit-tests/u-reftable-block.c:636:20
#6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
#7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
#8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
#9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
#10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
#11 0x55555584c25a in main ./git/build/../common-main.c:9:11
#12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)
==1472280==Register values:
rax = 0x00007d8ff7de5f7f rbx = 0x00007fffffff4e00 rcx = 0x00007d8ff7de5f80 rdx = 0x00007bfff5b6af60
rdi = 0x00007bfff5b6af40 rsi = 0x00007bfff592dfa0 rbp = 0x00007fffffff4df0 rsp = 0x00007fffffff4d40
r8 = 0x00000000ff00002b r9 = 0x00007d8ff7de5f7f r10 = 0x00000f7ffeb25bf0 r11 = 0xf3f30000f1f1f1f1
r12 = 0x00007fffffff58f8 r13 = 0x0000000000000001 r14 = 0x00007ffff7ffd000 r15 = 0x0000555556055fd0
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/record.c:30:6 in get_var_int
Guard against such restart offsets and signal an error to the caller via
`args.error`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block.c | 9 +++++++++
t/unit-tests/u-reftable-block.c | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+)
diff --git a/reftable/block.c b/reftable/block.c
index 89efce8751..1fa81405d2 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -440,6 +440,15 @@ static int restart_needle_less(size_t idx, void *_args)
uint8_t extra;
int n;
+ /*
+ * The restart offset must point to a record, which is stored before
+ * the restart table. Verify that this is the case.
+ */
+ if (off >= args->block->restart_off) {
+ args->error = 1;
+ return -1;
+ }
+
/*
* Records at restart points are stored without prefix compression, so
* there is no need to fully decode the record key here. This removes
diff --git a/t/unit-tests/u-reftable-block.c b/t/unit-tests/u-reftable-block.c
index a0fbfb247f..b8bb9a23e4 100644
--- a/t/unit-tests/u-reftable-block.c
+++ b/t/unit-tests/u-reftable-block.c
@@ -563,3 +563,42 @@ void test_reftable_block__corrupt_restart_count(void)
reftable_block_release(&block);
reftable_buf_release(&data);
}
+
+void test_reftable_block__corrupt_restart_offset(void)
+{
+ struct reftable_block_source source = { 0 };
+ struct reftable_record rec = {
+ .type = REFTABLE_BLOCK_TYPE_REF,
+ .u.ref = {
+ .value_type = REFTABLE_REF_VAL1,
+ .refname = (char *) "refs/heads/main",
+ },
+ };
+ struct reftable_block block = { 0 };
+ struct block_iter it = BLOCK_ITER_INIT;
+ struct reftable_buf want = REFTABLE_BUF_INIT;
+ struct reftable_buf data = REFTABLE_BUF_INIT;
+
+ cl_reftable_write_block(&data, REFTABLE_BLOCK_TYPE_REF, &rec, 1);
+
+ block_source_from_buf(&source, &data);
+ cl_must_pass(reftable_block_init(&block, &source, 0, 0, data.len,
+ REFTABLE_HASH_SIZE_SHA1, REFTABLE_BLOCK_TYPE_REF));
+
+ /*
+ * Corrupt the first restart offset, stored as a big-endian 24-bit
+ * integer at the start of the restart table, to point past the end of
+ * the records section. Seeking such a block must fail gracefully.
+ */
+ reftable_put_be24((uint8_t *) block.block_data.data + block.restart_off,
+ 0xffffff);
+
+ block_iter_init(&it, &block);
+ cl_must_pass(reftable_buf_addstr(&want, "refs/heads/main"));
+ cl_assert_equal_i(block_iter_seek_key(&it, &want), REFTABLE_FORMAT_ERROR);
+
+ reftable_buf_release(&want);
+ block_iter_close(&it);
+ reftable_block_release(&block);
+ reftable_buf_release(&data);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 11/12] reftable/table: fix NULL pointer access when seeking to bogus offsets
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
When seeking an iterator to an arbitrary offset we may return a positive
value in case the offset points beyond the block. This makes sense when
iterating through multiple blocks of the same section, as the positive
value indicates to us that we're at the end of the table.
But when the offset originates from a section or index offset it is
supposed to point at a valid block, so an out-of-bounds value means that
the table is corrupt. Treating it as a normal end-of-iteration causes us
to silently report an empty section instead of surfacing the corruption,
and we are left with a partially-initialized block. This may later on
cause a NULL pointer exception:
==1486841==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555598e02c bp 0x7fffffff4eb0 sp 0x7fffffff4e70 T0)
==1486841==The signal is caused by a READ memory access.
==1486841==Hint: address points to the zero page.
#0 0x55555598e02c in reftable_block_type ./git/build/../reftable/block.c:392:9
#1 0x55555598ee6e in block_iter_seek_key ./git/build/../reftable/block.c:536:35
#2 0x5555559ae553 in table_iter_seek_linear ./git/build/../reftable/table.c:344:8
#3 0x5555559adbff in table_iter_seek ./git/build/../reftable/table.c:450:9
#4 0x5555559ada9c in table_iter_seek_void ./git/build/../reftable/table.c:460:9
#5 0x555555992872 in reftable_iterator_seek_log_at ./git/build/../reftable/iter.c:281:9
#6 0x555555992953 in reftable_iterator_seek_log ./git/build/../reftable/iter.c:287:9
#7 0x55555583aa78 in test_reftable_table__seek_invalid_log_offset ./git/build/../t/unit-tests/u-reftable-table.c:257:20
#8 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
#9 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
#10 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
#11 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
#12 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
#13 0x55555584cffa in main ./git/build/../common-main.c:9:11
#14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
#16 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)
==1486841==Register values:
rax = 0x0000000000000000 rbx = 0x00007fffffff4ec0 rcx = 0x0000000000000000 rdx = 0x00007cfff6e2bd58
rdi = 0x00007cfff6e2bd58 rsi = 0x00007bfff5da1020 rbp = 0x00007fffffff4eb0 rsp = 0x00007fffffff4e70
r8 = 0x0000000000000000 r9 = 0x0000000000000002 r10 = 0x0000000000000000 r11 = 0x0000000000000017
r12 = 0x00007fffffff5908 r13 = 0x0000000000000001 r14 = 0x00007ffff7ffd000 r15 = 0x0000555556056e90
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/block.c:392:9 in reftable_block_type
==1486841==ABORTING
Fix this by returning a proper error in `table_iter_seek_to()` when the
offset ranges beyond the block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/table.c | 2 ++
t/unit-tests/u-reftable-table.c | 63 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/reftable/table.c b/reftable/table.c
index 56362df0ed..f4bc86a29d 100644
--- a/reftable/table.c
+++ b/reftable/table.c
@@ -242,6 +242,8 @@ static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
int err;
err = table_init_block(ti->table, &ti->block, off, typ);
+ if (err > 0)
+ return REFTABLE_FORMAT_ERROR;
if (err != 0)
return err;
diff --git a/t/unit-tests/u-reftable-table.c b/t/unit-tests/u-reftable-table.c
index 14fae8b199..c7dca45e70 100644
--- a/t/unit-tests/u-reftable-table.c
+++ b/t/unit-tests/u-reftable-table.c
@@ -1,8 +1,11 @@
#include "unit-test.h"
#include "lib-reftable.h"
+#include "reftable/basics.h"
+#include "reftable/block.h"
#include "reftable/blocksource.h"
#include "reftable/constants.h"
#include "reftable/iter.h"
+#include "reftable/reftable-error.h"
#include "reftable/table.h"
#include "strbuf.h"
@@ -199,3 +202,63 @@ void test_reftable_table__block_iterator(void)
reftable_buf_release(&buf);
reftable_free(records);
}
+
+void test_reftable_table__seek_invalid_log_offset(void)
+{
+ struct reftable_ref_record refs[] = {
+ {
+ .refname = (char *) "refs/heads/main",
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = { 42 },
+ },
+ };
+ struct reftable_log_record logs[] = {
+ {
+ .refname = (char *) "refs/heads/main",
+ .update_index = 1,
+ .value_type = REFTABLE_LOG_UPDATE,
+ .value.update = {
+ .name = (char *) "user",
+ .email = (char *) "user@example.com",
+ .message = (char *) "message\n",
+ },
+ },
+ };
+ struct reftable_block_source source = { 0 };
+ struct reftable_log_record log = { 0 };
+ struct reftable_iterator it = { 0 };
+ struct reftable_table *table;
+ struct reftable_buf buf = REFTABLE_BUF_INIT;
+ size_t fsize = footer_size(1);
+ uint8_t *footer;
+
+ cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs),
+ logs, ARRAY_SIZE(logs), NULL);
+
+ /*
+ * Corrupt the log section offset stored in the footer so that it points
+ * past the end of the table. The footer is checksummed, so we also have
+ * to recompute and rewrite the CRC.
+ */
+ footer = (uint8_t *) buf.buf + buf.len - fsize;
+ reftable_put_be64(footer + header_size(1) + 24, UINT64_MAX);
+ reftable_put_be32(footer + fsize - 4, crc32(0, footer, fsize - 4));
+
+ block_source_from_buf(&source, &buf);
+ cl_must_pass(reftable_table_new(&table, &source, "name"));
+
+ /*
+ * Seeking the log iterator must not crash even though the log section
+ * offset is bogus. As the offset points past the end of the table we
+ * know that the table is corrupt, so the seek must report a format
+ * error instead of pretending that the section is empty.
+ */
+ reftable_table_init_log_iterator(table, &it);
+ cl_assert_equal_i(reftable_iterator_seek_log(&it, ""),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_log_record_release(&log);
+ reftable_iterator_destroy(&it);
+ reftable_table_decref(table);
+ reftable_buf_release(&buf);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* [PATCH v2 12/12] reftable/table: fix OOB read on truncated table
From: Patrick Steinhardt @ 2026-06-29 9:02 UTC (permalink / raw)
To: git; +Cc: oxsignal, Christian Couder
In-Reply-To: <20260629-pks-reftable-hardening-v2-0-b0228e7d908d@pks.im>
When opening a table we compute the size of its data section by
subtracting the footer size from the file size. We do not verify that
the file is actually large enough to contain both the header and the
footer though. With a truncated table the subtraction can thus
underflow, causing us to read the footer out of bounds:
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/pks/Development/git/build/t/unit-tests+0x2479a4) in __asan_memcpy
Shadow bytes around the buggy address:
0x7ccff6e0de80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x7ccff6e0df00: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
0x7ccff6e0df80: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x7ccff6e0e000: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
0x7ccff6e0e080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
=>0x7ccff6e0e100: fa fa fa fa fa[fa]00 00 00 00 00 00 00 00 00 00
0x7ccff6e0e180: 00 00 00 00 00 00 00 04 fa fa fa fa fa fa fa fa
0x7ccff6e0e200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x7ccff6e0e280: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7ccff6e0e300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x7ccff6e0e380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1500371==ABORTING
Verify that the file is large enough to contain both the header and the
footer before computing the table size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/table.c | 5 +++++
t/unit-tests/u-reftable-table.c | 28 ++++++++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/reftable/table.c b/reftable/table.c
index f4bc86a29d..b4d3f9e211 100644
--- a/reftable/table.c
+++ b/reftable/table.c
@@ -562,6 +562,11 @@ int reftable_table_new(struct reftable_table **out,
goto done;
}
+ if (file_size < header_size(t->version) + footer_size(t->version)) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
t->size = file_size - footer_size(t->version);
t->source = *source;
t->name = reftable_strdup(name);
diff --git a/t/unit-tests/u-reftable-table.c b/t/unit-tests/u-reftable-table.c
index c7dca45e70..28b0ef5258 100644
--- a/t/unit-tests/u-reftable-table.c
+++ b/t/unit-tests/u-reftable-table.c
@@ -262,3 +262,31 @@ void test_reftable_table__seek_invalid_log_offset(void)
reftable_table_decref(table);
reftable_buf_release(&buf);
}
+
+void test_reftable_table__new_with_truncated_table(void)
+{
+ struct reftable_ref_record refs[] = {
+ {
+ .refname = (char *) "refs/heads/main",
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = { 42 },
+ },
+ };
+ struct reftable_block_source source = { 0 };
+ struct reftable_table *table;
+ struct reftable_buf buf = REFTABLE_BUF_INIT;
+
+ cl_reftable_write_to_buf(&buf, refs, ARRAY_SIZE(refs), NULL, 0, NULL);
+
+ /*
+ * Truncate the table so that it is large enough to read the header, but
+ * too small to also contain the footer.
+ */
+ buf.len = footer_size(1) - 1;
+ block_source_from_buf(&source, &buf);
+
+ cl_assert_equal_i(reftable_table_new(&table, &source, "name"),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_buf_release(&buf);
+}
--
2.55.0.rc2.803.g1fd1e6609c.dirty
^ permalink raw reply related
* Re: [PATCH v4 6/8] commit-reach: remove unused nonstale_queue dedup wrappers
From: Kristofer Karlsson @ 2026-06-29 10:09 UTC (permalink / raw)
To: SZEDER Gábor
Cc: Kristofer Karlsson via GitGitGadget, git, Derrick Stolee,
Elijah Newren
In-Reply-To: <akIBvWT7nIWntCNT@szeder.dev>
On Mon, 29 Jun 2026 at 07:25, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Sun, Jun 28, 2026 at 12:25:44PM +0000, Kristofer Karlsson via GitGitGadget wrote:
> > From: Kristofer Karlsson <krka@spotify.com>
> >
> > nonstale_queue_put_dedup() and nonstale_queue_get_dedup() became
> > unused after the previous commit. The core nonstale_queue functions
> > remain in use by ahead_behind().
>
> Please squash this patch into the previous one. Since the last
> callers of these static functions went away in that commit, it can't
> be built with DEVELOPER=1:
>
> commit-reach.c:91:23: warning: ‘nonstale_queue_get_dedup’ defined but not used [-Wunused-function]
> 91 | static struct commit *nonstale_queue_get_dedup(struct nonstale_queue *queue)
> | ^~~~~~~~~~~~~~~~~~~~~~~~
> commit-reach.c:82:13: warning: ‘nonstale_queue_put_dedup’ defined but not used [-Wunused-function]
> 82 | static void nonstale_queue_put_dedup(struct nonstale_queue *queue,
> | ^~~~~~~~~~~~~~~~~~~~~~~~
>
Thanks, will squash for v5! It's unfortunate that this means the commit itself
becomes less clean, but I don't have any other good solution
-- and having each commit compile cleanly is more important.
- Kristofer
^ permalink raw reply
* Re: [GSoC Blog] Week 5 : Improve Disk Space Recovery for Partial Clones
From: Siddharth Shrimali @ 2026-06-29 11:49 UTC (permalink / raw)
To: git; +Cc: Christian Couder, Siddharth Asthana
In-Reply-To: <CAGWgyh_WJ2mAgbJ2agp9UQm8iyR=eq0xWjdYT59CC9fZTnAbzA@mail.gmail.com>
Hello everyone,
My latest blog post, covering week 5, is now live:
https://siddharth.shrimali.info/#post/7
Please feel free to review my work and share your feedback.
Always open to discussions! :)
Regards,
Siddharth Shrimali
^ permalink raw reply
* Re: [PATCH v4 0/8] commit-reach: terminate merge-base walk when one side is exhausted
From: Kristofer Karlsson @ 2026-06-29 12:11 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <48bfdb11-2624-4aa6-8fbd-d3f894c33bcc@gmail.com>
On Sun, 28 Jun 2026 at 17:16, Derrick Stolee <stolee@gmail.com> wrote:
>
> I reviewed the v3 discussion, the range-diff, and reread patch 8. I think
> that this version is good to go.
Thanks for all your reviews and feedback. However, I found one more
problem that needs to be resolved before this is good to go.
paint_down_to_common() has this fallback:
if (!min_generation && !corrected_commit_dates_enabled(r))
queue.pq.compare = compare_commits_by_commit_date;
When this fires, the queue uses commit-date ordering instead of
generation ordering. The side-exhaustion optimization and my older
patch for !FIND_ALL early exit both check for reaching the finite
generation, but with date ordering, that check is wrong --
a commit can have a finite topo level (it is in a v1 commit graph)
while the queue is not ordered by generation. This unfortunately
means there is a regression for the !FIND_ALL optimization that
I should fix before 2.55 is final. I will send a small patch for
that separately: add tests that demonstrate the problem, and disable
the !FIND_ALL early exit when generation ordering is not active.
I traced the history of this fallback. The queue was switched from
date ordering to generation ordering in 3afc679b (2018-05). Then in
091f4cf3 (2018-08) you added the date fallback after finding that v1
topo levels caused "git merge-base v4.8 v4.9" on the Linux kernel to
walk 636k commits instead of 167k -- a side branch with a low topo
level stayed in the queue behind a long chain, preventing early STALE
propagation. Later, 8d00d7c3 (2021-01) tightened the fallback to
only fire without corrected commit dates, since v2 does not have the
regression.
The problem that 091f4cf3 addresses looks closely related to what
side-exhaustion solves: the walk goes deep into a subgraph where
only one paint side has presence. With side-exhaustion, the walk
terminates as soon as one paint side is exhausted from the queue,
so the deep walk never happens regardless of queue ordering.
I benchmarked "git merge-base --all v4.8 v4.9" on the Linux kernel
(the same case from 091f4cf3) with three configurations:
master (--all) side-exhaust (--all, gen ordering)
no graph: 3212 ms 3268 ms
v1 graph: 188 ms 17 ms
v2 graph: 227 ms 17 ms
With side-exhaustion, the v1 case no longer shows a regression
compared to the date fallback -- if anything, it is slightly faster
since the walk terminates earlier. This suggests that the workaround
from 091f4cf3 may no longer be needed when side-exhaustion is
present.
It is also worth noting that commitGraph.generationVersion has
defaulted to 2 since 2021, so the v1 fallback path is rarely
exercised in practice. Any commit-graph rewrite produces v2 data,
and only repos that have not rewritten their commit graph in over
four years would still have v1-only data.
If that reasoning holds, the fix for v5 would be to remove the date
fallback entirely, always using compare_commits_by_gen_then_commit_date.
This would:
1. Fix the bug (finite generation always means generation-ordered
queue).
2. Remove corrected_commit_dates_enabled() which has no other
callers.
The alternative would be to keep the fallback and disable the
optimizations that depend on ordering (via a flag like
paint_state.gen_ordered).
Do you see any cases I might be missing where removing the fallback
could cause problems?
Thanks,
Kristofer
^ permalink raw reply
* Re: [PATCH v4 0/8] commit-reach: terminate merge-base walk when one side is exhausted
From: Derrick Stolee @ 2026-06-29 12:40 UTC (permalink / raw)
To: Kristofer Karlsson
Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <CAL71e4O8gTLm4WUcPF-ZbOYTuEzuNSVh0Qjf8ys1w4LVF9Hi8Q@mail.gmail.com>
On 6/29/2026 8:11 AM, Kristofer Karlsson wrote:
> On Sun, 28 Jun 2026 at 17:16, Derrick Stolee <stolee@gmail.com> wrote:
>>
>> I reviewed the v3 discussion, the range-diff, and reread patch 8. I think
>> that this version is good to go.
>
> Thanks for all your reviews and feedback. However, I found one more
> problem that needs to be resolved before this is good to go.
>
> paint_down_to_common() has this fallback:
>
> if (!min_generation && !corrected_commit_dates_enabled(r))
> queue.pq.compare = compare_commits_by_commit_date;
...> I traced the history of this fallback.
...> The problem that 091f4cf3 addresses looks closely related to what
> side-exhaustion solves: the walk goes deep into a subgraph where
> only one paint side has presence. With side-exhaustion, the walk
> terminates as soon as one paint side is exhausted from the queue,
> so the deep walk never happens regardless of queue ordering.
>
> I benchmarked "git merge-base --all v4.8 v4.9" on the Linux kernel
> (the same case from 091f4cf3) with three configurations:
>
> master (--all) side-exhaust (--all, gen ordering)
> no graph: 3212 ms 3268 ms
> v1 graph: 188 ms 17 ms
> v2 graph: 227 ms 17 ms
>
> With side-exhaustion, the v1 case no longer shows a regression
> compared to the date fallback -- if anything, it is slightly faster
> since the walk terminates earlier. This suggests that the workaround
> from 091f4cf3 may no longer be needed when side-exhaustion is
> present.
Thanks for digging into the history of this fallback and catching it
during review!
> If that reasoning holds, the fix for v5 would be to remove the date
> fallback entirely, always using compare_commits_by_gen_then_commit_date.
> This would:
>
> 1. Fix the bug (finite generation always means generation-ordered
> queue).
> 2. Remove corrected_commit_dates_enabled() which has no other
> callers.
I agree with your reasoning, data-backed discovery, and the course of
action to fix this. I'm happy that you're able to close the loop on
this long-standing performance issue even with v1 generation numbers.
> Do you see any cases I might be missing where removing the fallback
> could cause problems?
I don't see any other concerns here. You're right that if we were to
have a different mode that changes the priority-queue ordering, then
the side-exhaustion optimization cannot be trusted, but you will
remove this possibility.
It _may_ be worth mentioning this with a comment when initializing
the queue order for the paint_queue, because the use of the queue
requires topological ordering.
Thanks,
-Stolee
^ permalink raw reply
* Re: [PATCH v4 0/8] commit-reach: terminate merge-base walk when one side is exhausted
From: Kristofer Karlsson @ 2026-06-29 12:59 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Kristofer Karlsson via GitGitGadget, git, Elijah Newren
In-Reply-To: <5ef694a3-9164-4ab4-8835-136439f6d267@gmail.com>
On Mon, 29 Jun 2026 at 14:40, Derrick Stolee <stolee@gmail.com> wrote:
>
> I agree with your reasoning, data-backed discovery, and the course of
> action to fix this. I'm happy that you're able to close the loop on
> this long-standing performance issue even with v1 generation numbers.
Sounds good, then I can continue with the approach of removing some code
(even though it will likely be a net addition in the end).
> > Do you see any cases I might be missing where removing the fallback
> > could cause problems?
> I don't see any other concerns here. You're right that if we were to
> have a different mode that changes the priority-queue ordering, then
> the side-exhaustion optimization cannot be trusted, but you will
> remove this possibility.
>
> It _may_ be worth mentioning this with a comment when initializing
> the queue order for the paint_queue, because the use of the queue
> requires topological ordering.
Yes my plan is to rewrite v5 in a few ways:
- update original documentation to note that infinite -> finite
generation does not always hold
- add a test (or more than one) for this problem
- don't introduce the bug at any point
- add a commit to replace the disabled optimization with
removal of the commit-date based ordering (+ doc update)
Thanks for helping with this,
Kristofer
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox