git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org, jltobler@gmail.com, shejialuo@gmail.com
Subject: Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
Date: Wed, 3 Sep 2025 10:07:13 +0200	[thread overview]
Message-ID: <aLf3MaKHZSQfnBlT@pks.im> (raw)
In-Reply-To: <20250902-228-reftable-introduce-consistency-checks-v2-2-4f96b3834779@gmail.com>

On Tue, Sep 02, 2025 at 09:05:22AM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
> 
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The callee provides the function with
> callbacks to handle issue and information reporting.
> 
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git' fsck errors.
> 
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
> 
> While here, remove a unused header `#include "../lockfile.h"` from
> 'refs/reftable-backend.c'.

It's quite a bunch of changes overall that could've been reasonably
split up into multiple commits. E.g. one to introduce the reftable-side
logic, one to start calling it in Git, and one to drop the superfluous
header.

> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>  REFTABLE_OBJS += reftable/block.o
>  REFTABLE_OBJS += reftable/blocksource.o
>  REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o

"f" is before "i" in the alphabet I'm accustomed to :) So let's retain
lexicographic ordering here.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..c38c6422f8 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>  	return ret;
>  }
>  
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> -			    struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> +	struct fsck_options *o = cb_data;
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, "%s", _(msg));
> +}

Is this `_()` marker correct here? There isn't really any reasonable way
for somebody to translate a variable with unknown contents. So shouldn't
it only be the caller of `reftable_fsck_verbose_handler()` that should
mark the string as translatable?

> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
> +				       void *cb_data)
> +{
> +	struct fsck_ref_report report = { .path = info->path };
> +	struct fsck_options *o = cb_data;
> +	enum fsck_msg_id msg_id;
> +
> +	switch (info->error) {
> +	case REFTABLE_FSCK_ERROR_TABLE_NAME:
> +		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> +		break;
> +	default:
> +		BUG("unknown fsck error: %d", info->error);
> +	}
> +
> +	return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
> +}

I think this function will become a bit unwieldy over time. We might
instead want to have an array that maps from reftable-specific to
fsck-specific error code:

    static const fsck_msg_id[] = {
        [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
    };

So in that case, all we'd have to do is to perform bounds checking in
the above function. And maybe verify that the developer didn't forget to
fill in a new msg ID by checking that the derived message ID is non-zero.

> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>  			    struct worktree *wt UNUSED)
>  {
> -	return 0;
> +	struct reftable_ref_store *refs;
> +	struct strmap_entry *entry;
> +	struct hashmap_iter iter;
> +	int ret = 0;
> +
> +	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, _("Checking references consistency"));

This line is duplicate across both backends, right? Maybe it's something
that we can do in the generic logic?

> +	ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> +				  reftable_fsck_verbose_handler, o);
> +
> +	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> +		struct reftable_backend *b = (struct reftable_backend *)entry->value;
> +		ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> +					  reftable_fsck_verbose_handler, o);
> +	}
> +
> +	return ret;
>  }
>  
>  struct ref_storage_be refs_be_reftable = {

Looks good.

> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..4282b1413e
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,53 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			reftable_fsck_verbose_fn verbose_fn,
> +			void *cb_data)
> +{
> +
> +	char **names = NULL;
> +	uint64_t min, max;
> +	int err = 0;
> +
> +	if (stack == NULL)
> +		goto out;
> +
> +	err = read_lines(stack->list_file, &names);
> +	if (err < 0)
> +		goto out;
> +
> +	verbose_fn("Checking reftable table names", cb_data);
> +
> +	for (size_t i = 0; names[i]; i++) {
> +		struct reftable_fsck_info info = {
> +			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> +			.path = names[i],
> +		};
> +		uint32_t rnd;
> +		/*
> +		 * We want to match the tail '.ref'. One extra byte to ensure
> +		 * that there is no unexpected extra character and one byte for
> +		 * the null terminator added by sscanf.
> +		 */
> +		char tail[6];
> +
> +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> +			   &min, &max, &rnd, tail) != 4) {
> +			info.msg = "invalid reftable table name";

This here is where the string should be translated.

> +			err = report_fn(&info, cb_data);
> +			continue;
> +		}

I think sscanf is quite frowned-upon in the Git codebase. Maybe we
should manually parse through the string instead?

Furthermore, I think we should move every single check into a separate
function, similar to how the files backend does it. This ensures that
checks are self-contained and that it's way easier to add new checks
over time.

Another angle: did you verify that reftables written by JGit follow this
format?

> +		if (strcmp(tail, ".ref")) {
> +			info.msg = "invalid reftable table extension";

Same here, this should be translated.

> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..4cf0053234
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> +	/* Invalid table name */
> +	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};

Wouldn't it be more natural to give these positive numbers?

> +/* Represents an individual error encountered during the FSCK checks. */
> +struct reftable_fsck_info {
> +	enum reftable_fsck_error error;
> +	const char *msg;
> +	const char *path;
> +};

I wonder whether it should be the reftable library that decides on the
severity of each generated finding.

> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
> +				    void *cb_data);
> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.

s/FSCK check/consistency checks/

> + *
> + * If an issue is encountered, the issue is reported to the callee via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * continue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.

Same here.

> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..81d30df2d7
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

Tests shouldn't define these variables, but should dynamically figure
out what the default branch name is as required, e.g. by using
git-symbolic-ref(1).

> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&

You can drop the cat(1) invocation and directly say `head -n1 file`.

> +		sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> +		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> +		mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&

No need for the curly braces around TABLE_NAME here and further down. It
would be nice to quote these strings though.

> +
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
> +		EOF
> +		test_cmp expect err
> +	)
> +'
> +
> +test_expect_success 'table name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&

Same here wrt the extra invocation of cat(1).

Patrick

  reply	other threads:[~2025-09-03  8:07 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-08-26 16:21   ` shejialuo
2025-09-01 13:33     ` Karthik Nayak
2025-09-03 13:39       ` shejialuo
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-08-26 16:33   ` shejialuo
2025-09-01 13:40     ` Karthik Nayak
2025-08-26 16:44   ` shejialuo
2025-09-01 13:52     ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
2025-09-01 13:52   ` Karthik Nayak
2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-02 22:25     ` Junio C Hamano
2025-09-08 13:00       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt [this message]
2025-09-03 16:51       ` shejialuo
2025-09-09 13:49         ` Karthik Nayak
2025-09-09  8:42       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt
2025-09-15  9:27       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-09-02 22:38     ` Junio C Hamano
2025-09-03  8:07       ` Patrick Steinhardt
2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-09-02 22:42     ` Junio C Hamano
2025-09-18  8:11       ` Karthik Nayak
2025-09-18  8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 2/8] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-09-18 15:36     ` Junio C Hamano
2025-09-23 15:42       ` Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 10:02       ` Karthik Nayak
2025-09-24  7:24     ` Kristoffer Haugsbakk
2025-09-24 11:06       ` Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 11:20       ` Karthik Nayak
2025-09-24 18:04         ` Junio C Hamano
2025-09-24 20:13           ` Karthik Nayak
2025-09-25  6:12             ` Patrick Steinhardt
2025-09-25 16:22               ` Junio C Hamano
2025-09-18  8:11   ` [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 18:40       ` Karthik Nayak
2025-09-25  6:14         ` Patrick Steinhardt
2025-09-18  8:11   ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 18:44       ` Karthik Nayak
2025-09-26  7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-10-06 12:02       ` Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-09-26  7:25   ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-10-06 12:05       ` Karthik Nayak
2025-09-26 21:08   ` [PATCH v4 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-06 14:22   ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07  2:32     ` Jeff King
2025-10-07  8:45       ` Karthik Nayak
2025-10-06 22:08   ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-07  8:47     ` Karthik Nayak
2025-10-07 15:11       ` Junio C Hamano
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07 13:26   ` [PATCH v6 0/7] refs/reftable: add consistency checks Patrick Steinhardt
2025-10-07 16:25     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aLf3MaKHZSQfnBlT@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=shejialuo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).