From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org, jltobler@gmail.com, shejialuo@gmail.com
Subject: Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
Date: Wed, 3 Sep 2025 10:07:13 +0200 [thread overview]
Message-ID: <aLf3MaKHZSQfnBlT@pks.im> (raw)
In-Reply-To: <20250902-228-reftable-introduce-consistency-checks-v2-2-4f96b3834779@gmail.com>
On Tue, Sep 02, 2025 at 09:05:22AM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
>
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The callee provides the function with
> callbacks to handle issue and information reporting.
>
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git' fsck errors.
>
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
>
> While here, remove a unused header `#include "../lockfile.h"` from
> 'refs/reftable-backend.c'.
It's quite a bunch of changes overall that could've been reasonably
split up into multiple commits. E.g. one to introduce the reftable-side
logic, one to start calling it in Git, and one to drop the superfluous
header.
> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
> REFTABLE_OBJS += reftable/block.o
> REFTABLE_OBJS += reftable/blocksource.o
> REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o
"f" is before "i" in the alphabet I'm accustomed to :) So let's retain
lexicographic ordering here.
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..c38c6422f8 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
> return ret;
> }
>
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> - struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> + struct fsck_options *o = cb_data;
> +
> + if (o->verbose)
> + fprintf_ln(stderr, "%s", _(msg));
> +}
Is this `_()` marker correct here? There isn't really any reasonable way
for somebody to translate a variable with unknown contents. So shouldn't
it only be the caller of `reftable_fsck_verbose_handler()` that should
mark the string as translatable?
> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
> + void *cb_data)
> +{
> + struct fsck_ref_report report = { .path = info->path };
> + struct fsck_options *o = cb_data;
> + enum fsck_msg_id msg_id;
> +
> + switch (info->error) {
> + case REFTABLE_FSCK_ERROR_TABLE_NAME:
> + msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> + break;
> + default:
> + BUG("unknown fsck error: %d", info->error);
> + }
> +
> + return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
> +}
I think this function will become a bit unwieldy over time. We might
instead want to have an array that maps from reftable-specific to
fsck-specific error code:
static const fsck_msg_id[] = {
[REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
};
So in that case, all we'd have to do is to perform bounds checking in
the above function. And maybe verify that the developer didn't forget to
fill in a new msg ID by checking that the derived message ID is non-zero.
> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
> struct worktree *wt UNUSED)
> {
> - return 0;
> + struct reftable_ref_store *refs;
> + struct strmap_entry *entry;
> + struct hashmap_iter iter;
> + int ret = 0;
> +
> + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> + if (o->verbose)
> + fprintf_ln(stderr, _("Checking references consistency"));
This line is duplicate across both backends, right? Maybe it's something
that we can do in the generic logic?
> + ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> +
> + strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> + struct reftable_backend *b = (struct reftable_backend *)entry->value;
> + ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> + }
> +
> + return ret;
> }
>
> struct ref_storage_be refs_be_reftable = {
Looks good.
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..4282b1413e
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,53 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn,
> + void *cb_data)
> +{
> +
> + char **names = NULL;
> + uint64_t min, max;
> + int err = 0;
> +
> + if (stack == NULL)
> + goto out;
> +
> + err = read_lines(stack->list_file, &names);
> + if (err < 0)
> + goto out;
> +
> + verbose_fn("Checking reftable table names", cb_data);
> +
> + for (size_t i = 0; names[i]; i++) {
> + struct reftable_fsck_info info = {
> + .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> + .path = names[i],
> + };
> + uint32_t rnd;
> + /*
> + * We want to match the tail '.ref'. One extra byte to ensure
> + * that there is no unexpected extra character and one byte for
> + * the null terminator added by sscanf.
> + */
> + char tail[6];
> +
> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> + &min, &max, &rnd, tail) != 4) {
> + info.msg = "invalid reftable table name";
This here is where the string should be translated.
> + err = report_fn(&info, cb_data);
> + continue;
> + }
I think sscanf is quite frowned-upon in the Git codebase. Maybe we
should manually parse through the string instead?
Furthermore, I think we should move every single check into a separate
function, similar to how the files backend does it. This ensures that
checks are self-contained and that it's way easier to add new checks
over time.
Another angle: did you verify that reftables written by JGit follow this
format?
> + if (strcmp(tail, ".ref")) {
> + info.msg = "invalid reftable table extension";
Same here, this should be translated.
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..4cf0053234
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> + /* Invalid table name */
> + REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};
Wouldn't it be more natural to give these positive numbers?
> +/* Represents an individual error encountered during the FSCK checks. */
> +struct reftable_fsck_info {
> + enum reftable_fsck_error error;
> + const char *msg;
> + const char *path;
> +};
I wonder whether it should be the reftable library that decides on the
severity of each generated finding.
> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
> + void *cb_data);
> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.
s/FSCK check/consistency checks/
> + *
> + * If an issue is encountered, the issue is reported to the callee via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * continue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.
Same here.
> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..81d30df2d7
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
Tests shouldn't define these variables, but should dynamically figure
out what the default branch name is as required, e.g. by using
git-symbolic-ref(1).
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
You can drop the cat(1) invocation and directly say `head -n1 file`.
> + sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> + mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> + mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
No need for the curly braces around TABLE_NAME here and further down. It
would be nice to quote these strings though.
> +
> + test_must_fail git refs verify 2>err &&
> + cat >expect <<-EOF &&
> + error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
> + EOF
> + test_cmp expect err
> + )
> +'
> +
> +test_expect_success 'table name should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
Same here wrt the extra invocation of cat(1).
Patrick
next prev parent reply other threads:[~2025-09-03 8:07 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-08-26 16:21 ` shejialuo
2025-09-01 13:33 ` Karthik Nayak
2025-09-03 13:39 ` shejialuo
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-08-26 16:33 ` shejialuo
2025-09-01 13:40 ` Karthik Nayak
2025-08-26 16:44 ` shejialuo
2025-09-01 13:52 ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
2025-09-01 13:52 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-02 22:25 ` Junio C Hamano
2025-09-08 13:00 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-03 8:07 ` Patrick Steinhardt [this message]
2025-09-03 16:51 ` shejialuo
2025-09-09 13:49 ` Karthik Nayak
2025-09-09 8:42 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-09-03 8:07 ` Patrick Steinhardt
2025-09-15 9:27 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-09-02 22:38 ` Junio C Hamano
2025-09-03 8:07 ` Patrick Steinhardt
2025-09-02 7:05 ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-09-02 22:42 ` Junio C Hamano
2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 2/8] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-09-18 15:36 ` Junio C Hamano
2025-09-23 15:42 ` Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 10:02 ` Karthik Nayak
2025-09-24 7:24 ` Kristoffer Haugsbakk
2025-09-24 11:06 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 11:20 ` Karthik Nayak
2025-09-24 18:04 ` Junio C Hamano
2025-09-24 20:13 ` Karthik Nayak
2025-09-25 6:12 ` Patrick Steinhardt
2025-09-25 16:22 ` Junio C Hamano
2025-09-18 8:11 ` [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 18:40 ` Karthik Nayak
2025-09-25 6:14 ` Patrick Steinhardt
2025-09-18 8:11 ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 18:44 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-10-06 12:02 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-09-26 7:25 ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-10-06 12:05 ` Karthik Nayak
2025-09-26 21:08 ` [PATCH v4 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-06 14:22 ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07 2:32 ` Jeff King
2025-10-07 8:45 ` Karthik Nayak
2025-10-06 22:08 ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-07 8:47 ` Karthik Nayak
2025-10-07 15:11 ` Junio C Hamano
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07 13:26 ` [PATCH v6 0/7] refs/reftable: add consistency checks Patrick Steinhardt
2025-10-07 16:25 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLf3MaKHZSQfnBlT@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=shejialuo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).