* [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
` (9 subsequent siblings)
10 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak
The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering. Also fix a clang-format issue
where the escaped newlines are not aligned.
While here, remove a duplicate instance of 'gitmodulesLarge' in the
'fsck-msgids' documentation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
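Not part of the patch: ordering like this can be checked mechanically.
Below is a minimal sketch, assuming plain strcmp() (byte-wise ASCII)
ordering, which is consistent with 'BAD_REFERENT_NAME' sorting ahead of
'BAD_REF_CONTENT' in the new list. 'FOREACH_DEMO_MSG_ID', 'demo_names'
and 'first_unsorted' are hypothetical names, and the list is a shortened
excerpt of one sub-section rather than the real one from fsck.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened excerpt of one sub-section of the list. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(BAD_DATE_OVERFLOW, ERROR) \
	FUNC(BAD_REFERENT_NAME, ERROR) \
	FUNC(BAD_REF_CONTENT, ERROR) \
	FUNC(BAD_REF_FILETYPE, ERROR) \
	FUNC(BAD_REF_NAME, ERROR) \
	FUNC(BAD_TIMEZONE, ERROR)

/* Expand every entry to its identifier as a string literal. */
#define DEMO_NAME(id, msg_type) #id,
static const char *demo_names[] = { FOREACH_DEMO_MSG_ID(DEMO_NAME) };

/* Return the index of the first out-of-order entry, or -1 if sorted. */
static int first_unsorted(const char **names, size_t nr)
{
	assert(names);
	for (size_t i = 1; i < nr; i++)
		if (strcmp(names[i - 1], names[i]) > 0)
			return (int)i;
	return -1;
}
```

Since the real list is only sorted within each severity sub-section, a
check like this would have to run once per sub-section.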
Documentation/fsck-msgids.adoc | 3 -
fsck.h | 150 ++++++++++++++++++++---------------------
2 files changed, 75 insertions(+), 78 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..559ad57807 100644
--- a/fsck.h
+++ b/fsck.h
@@ -20,82 +20,82 @@ enum fsck_msg_type {
* two in sync.
*/
-#define FOREACH_FSCK_MSG_ID(FUNC) \
- /* fatal errors */ \
- FUNC(NUL_IN_HEADER, FATAL) \
- FUNC(UNTERMINATED_HEADER, FATAL) \
- /* errors */ \
- FUNC(BAD_DATE, ERROR) \
- FUNC(BAD_DATE_OVERFLOW, ERROR) \
- FUNC(BAD_EMAIL, ERROR) \
- FUNC(BAD_NAME, ERROR) \
- FUNC(BAD_OBJECT_SHA1, ERROR) \
- FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
- FUNC(BAD_PACKED_REF_HEADER, ERROR) \
- FUNC(BAD_PARENT_SHA1, ERROR) \
- FUNC(BAD_REF_CONTENT, ERROR) \
- FUNC(BAD_REF_FILETYPE, ERROR) \
- FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
- FUNC(BAD_TIMEZONE, ERROR) \
- FUNC(BAD_TREE, ERROR) \
- FUNC(BAD_TREE_SHA1, ERROR) \
- FUNC(BAD_TYPE, ERROR) \
- FUNC(DUPLICATE_ENTRIES, ERROR) \
- FUNC(MISSING_AUTHOR, ERROR) \
- FUNC(MISSING_COMMITTER, ERROR) \
- FUNC(MISSING_EMAIL, ERROR) \
- FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
- FUNC(MISSING_OBJECT, ERROR) \
- FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
- FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
- FUNC(MISSING_TAG, ERROR) \
- FUNC(MISSING_TAG_ENTRY, ERROR) \
- FUNC(MISSING_TREE, ERROR) \
- FUNC(MISSING_TYPE, ERROR) \
- FUNC(MISSING_TYPE_ENTRY, ERROR) \
- FUNC(MULTIPLE_AUTHORS, ERROR) \
- FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
- FUNC(PACKED_REF_UNSORTED, ERROR) \
- FUNC(TREE_NOT_SORTED, ERROR) \
- FUNC(UNKNOWN_TYPE, ERROR) \
- FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
- /* warnings */ \
- FUNC(EMPTY_NAME, WARN) \
- FUNC(FULL_PATHNAME, WARN) \
- FUNC(HAS_DOT, WARN) \
- FUNC(HAS_DOTDOT, WARN) \
- FUNC(HAS_DOTGIT, WARN) \
- FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
- FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+ /* fatal errors */ \
+ FUNC(NUL_IN_HEADER, FATAL) \
+ FUNC(UNTERMINATED_HEADER, FATAL) \
+ /* errors */ \
+ FUNC(BAD_DATE, ERROR) \
+ FUNC(BAD_DATE_OVERFLOW, ERROR) \
+ FUNC(BAD_EMAIL, ERROR) \
+ FUNC(BAD_NAME, ERROR) \
+ FUNC(BAD_OBJECT_SHA1, ERROR) \
+ FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
+ FUNC(BAD_PACKED_REF_HEADER, ERROR) \
+ FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REF_CONTENT, ERROR) \
+ FUNC(BAD_REF_FILETYPE, ERROR) \
+ FUNC(BAD_REF_NAME, ERROR) \
+ FUNC(BAD_TIMEZONE, ERROR) \
+ FUNC(BAD_TREE, ERROR) \
+ FUNC(BAD_TREE_SHA1, ERROR) \
+ FUNC(BAD_TYPE, ERROR) \
+ FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
+ FUNC(MISSING_AUTHOR, ERROR) \
+ FUNC(MISSING_COMMITTER, ERROR) \
+ FUNC(MISSING_EMAIL, ERROR) \
+ FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+ FUNC(MISSING_OBJECT, ERROR) \
+ FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+ FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+ FUNC(MISSING_TAG, ERROR) \
+ FUNC(MISSING_TAG_ENTRY, ERROR) \
+ FUNC(MISSING_TREE, ERROR) \
+ FUNC(MISSING_TYPE, ERROR) \
+ FUNC(MISSING_TYPE_ENTRY, ERROR) \
+ FUNC(MULTIPLE_AUTHORS, ERROR) \
+ FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
+ FUNC(PACKED_REF_UNSORTED, ERROR) \
+ FUNC(TREE_NOT_SORTED, ERROR) \
+ FUNC(UNKNOWN_TYPE, ERROR) \
+ FUNC(ZERO_PADDED_DATE, ERROR) \
+ /* warnings */ \
+ FUNC(EMPTY_NAME, WARN) \
+ FUNC(FULL_PATHNAME, WARN) \
+ FUNC(HAS_DOT, WARN) \
+ FUNC(HAS_DOTDOT, WARN) \
+ FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(NULL_SHA1, WARN) \
+ FUNC(NUL_IN_COMMIT, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
- FUNC(BAD_FILEMODE, INFO) \
- FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
- FUNC(GITATTRIBUTES_SYMLINK, INFO) \
- FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
- FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
- FUNC(REF_MISSING_NEWLINE, INFO) \
- FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
- FUNC(TRAILING_REF_CONTENT, INFO) \
- /* ignored (elevated when requested) */ \
+ FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
+ FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
+ FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
+ FUNC(MAILMAP_SYMLINK, INFO) \
+ FUNC(MISSING_TAGGER_ENTRY, INFO) \
+ FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
+ FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
+ FUNC(TRAILING_REF_CONTENT, INFO) \
+ /* ignored (elevated when requested) */ \
FUNC(EXTRA_HEADER_ENTRY, IGNORE)
#define MSG_ID(id, msg_type) FSCK_MSG_##id,
--
2.50.1
* [PATCH 2/5] refs/reftable: add fsck check for checking the table name
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak
The `git refs verify` command is used to run fsck checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the table names in the 'tables.list' of reftables.
For the infrastructure, since the reftable library is treated as an
independent library we should ensure that the library code works
independently without knowledge about Git's internals. To do this,
add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h', which
provide an entry point 'reftable_fsck_check' for running fsck checks
over a provided reftable stack. The caller provides the function with
callbacks to handle issue and information reporting.
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. Add
a test to check for this behavior.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
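Not part of the patch: to illustrate the name shape the new check
targets, here is a minimal standalone sketch. It assumes exactly 12, 12
and 8 lowercase hex digits followed by '.ref', mirroring the widths in
the sscanf() pattern used in 'reftable/fsck.c'. That is stricter than
sscanf() itself, which treats the widths as maximums. The names
'skip_hex', 'skip_lit' and 'demo_table_name_is_valid' are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Advance *p past exactly `n` lowercase hex digits; false otherwise. */
static bool skip_hex(const char **p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		char c = (*p)[i];
		/* A NUL byte fails here, so we never read past the end. */
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return false;
	}
	*p += n;
	return true;
}

/* Advance *p past the literal `lit`; false if it does not match. */
static bool skip_lit(const char **p, const char *lit)
{
	size_t len = strlen(lit);

	if (strncmp(*p, lit, len))
		return false;
	*p += len;
	return true;
}

/* True only for "0x<12 hex>-0x<12 hex>-<8 hex>.ref" with nothing after. */
static bool demo_table_name_is_valid(const char *name)
{
	const char *p = name;

	assert(name);
	return skip_lit(&p, "0x") && skip_hex(&p, 12) &&
	       skip_lit(&p, "-0x") && skip_hex(&p, 12) &&
	       skip_lit(&p, "-") && skip_hex(&p, 8) &&
	       skip_lit(&p, ".ref") && *p == '\0';
}
```

With this shape, a name such as the one the test produces by appending
'extra' is rejected by the final end-of-string check.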
Documentation/fsck-msgids.adoc | 3 +++
Makefile | 1 +
fsck.h | 1 +
meson.build | 1 +
refs/reftable-backend.c | 61 +++++++++++++++++++++++++++++++++++++-----
reftable/fsck.c | 50 ++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 38 ++++++++++++++++++++++++++
t/meson.build | 3 ++-
t/t0614-reftable-fsck.sh | 35 ++++++++++++++++++++++++
9 files changed, 186 insertions(+), 7 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..784ddc0df5 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableTableName`::
+ (ERROR) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/Makefile b/Makefile
index e11340c1ae..f2ddcc8d7c 100644
--- a/Makefile
+++ b/Makefile
@@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
REFTABLE_OBJS += reftable/record.o
diff --git a/fsck.h b/fsck.h
index 559ad57807..5901f944a1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
diff --git a/meson.build b/meson.build
index 5dd299b496..82879fbfaa 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8dae1e1112..ccd12052f2 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,20 +6,21 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", _(msg));
+}
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info info,
+ void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+ struct fsck_ref_report report = { .path = info.path };
+ enum fsck_msg_id msg_id;
+
+ switch (info.error) {
+ case REFTABLE_FSCK_ERROR_TABLE_NAME:
+ msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
+ break;
+ default:
+ BUG("unknown fsck error: %d", info.error);
+ }
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
+ ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ if (!ret)
+ return ret;
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ if (!ret)
+ return ret;
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..22ec3c26e9
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,50 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "stack.h"
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+ char **names = NULL;
+ uint64_t min, max;
+ int err = 0;
+
+ if (stack == NULL)
+ goto out;
+
+ err = read_lines(stack->list_file, &names);
+ if (err < 0)
+ goto out;
+
+ verbose_fn("Checking reftable table names", cb_data);
+
+ for (size_t i = 0; names[i]; i++) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
+ .path = names[i],
+ .msg = "invalid reftable name"
+ };
+ uint32_t rnd;
+ /*
+ * We want to match the tail '.ref'. One extra byte to ensure
+ * that there is no unexpected extra character and one byte for
+ * the null terminator added by sscanf.
+ */
+ char tail[6];
+
+ if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
+ &min, &max, &rnd, tail) != 4) {
+ err = report_fn(info, cb_data);
+ }
+
+ if (strcmp(tail, ".ref")) {
+ err = report_fn(info, cb_data);
+ }
+ }
+
+out:
+ free_names(names);
+ return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..087430d979
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,38 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform FSCK check on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the caller via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the FSCK checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
diff --git a/t/meson.build b/t/meson.build
index bbeba1a8d5..a8eb44eb30 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -145,6 +145,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
@@ -1214,4 +1215,4 @@ if perl.found() and time.found()
timeout: 0,
)
endforeach
-endif
\ No newline at end of file
+endif
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..0d11871b1c
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'table name should be checked' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+ sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
From: shejialuo @ 2025-08-26 16:21 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
>
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h', which
A design question here: the source file is named "fsck.c" but the
header is named "reftable-fsck.h", which is a little strange. Why not
just "fsck.h" instead of "reftable-fsck.h"?
> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The caller provides the function with
> callbacks to handle issue and information reporting.
>
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git's fsck errors.
>
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
> Documentation/fsck-msgids.adoc | 3 +++
> Makefile | 1 +
> fsck.h | 1 +
> meson.build | 1 +
> refs/reftable-backend.c | 61 +++++++++++++++++++++++++++++++++++++-----
> reftable/fsck.c | 50 ++++++++++++++++++++++++++++++++++
> reftable/reftable-fsck.h | 38 ++++++++++++++++++++++++++
> t/meson.build | 3 ++-
> t/t0614-reftable-fsck.sh | 35 ++++++++++++++++++++++++
> 9 files changed, 186 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
> index 1c912615f9..784ddc0df5 100644
> --- a/Documentation/fsck-msgids.adoc
> +++ b/Documentation/fsck-msgids.adoc
> @@ -38,6 +38,9 @@
> `badReferentName`::
> (ERROR) The referent name of a symref is invalid.
>
> +`badReftableTableName`::
> + (ERROR) A reftable table has an invalid name.
> +
This reads a little strangely to me. `Reftable` already indicates that
it is a table. Should we simply say the following?
    A reftable has an invalid table name
> `badTagName`::
> (INFO) A tag has an invalid format.
>
> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
> REFTABLE_OBJS += reftable/block.o
> REFTABLE_OBJS += reftable/blocksource.o
> REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o
> REFTABLE_OBJS += reftable/merged.o
> REFTABLE_OBJS += reftable/pq.o
> REFTABLE_OBJS += reftable/record.o
> diff --git a/fsck.h b/fsck.h
> index 559ad57807..5901f944a1 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
> FUNC(BAD_PACKED_REF_HEADER, ERROR) \
> FUNC(BAD_PARENT_SHA1, ERROR) \
> FUNC(BAD_REFERENT_NAME, ERROR) \
> + FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
> FUNC(BAD_REF_CONTENT, ERROR) \
> FUNC(BAD_REF_FILETYPE, ERROR) \
> FUNC(BAD_REF_NAME, ERROR) \
> diff --git a/meson.build b/meson.build
> index 5dd299b496..82879fbfaa 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -452,6 +452,7 @@ libgit_sources = [
> 'reftable/error.c',
> 'reftable/block.c',
> 'reftable/blocksource.c',
> + 'reftable/fsck.c',
> 'reftable/iter.c',
> 'reftable/merged.c',
> 'reftable/pq.c',
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..ccd12052f2 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -6,20 +6,21 @@
> #include "../config.h"
> #include "../dir.h"
> #include "../environment.h"
> +#include "../fsck.h"
> #include "../gettext.h"
> #include "../hash.h"
> #include "../hex.h"
> #include "../iterator.h"
> #include "../ident.h"
> -#include "../lockfile.h"
Here we drop this include. Is that because the header is no longer
needed?
> #include "../object.h"
> #include "../path.h"
> #include "../refs.h"
> #include "../reftable/reftable-basics.h"
> -#include "../reftable/reftable-stack.h"
> -#include "../reftable/reftable-record.h"
> #include "../reftable/reftable-error.h"
> +#include "../reftable/reftable-fsck.h"
> #include "../reftable/reftable-iterator.h"
> +#include "../reftable/reftable-record.h"
> +#include "../reftable/reftable-stack.h"
> #include "../repo-settings.h"
> #include "../setup.h"
> #include "../strmap.h"
> @@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
> return ret;
> }
>
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> - struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> + struct fsck_options *o = cb_data;
> +
> + if (o->verbose)
> + fprintf_ln(stderr, "%s", _(msg));
> +}
> +
> +static int reftable_fsck_error_handler(struct reftable_fsck_info info,
A design question: why do we pass "info" by value instead of by
pointer?
> + void *cb_data)
> +{
> + struct fsck_options *o = cb_data;
> + struct fsck_ref_report report = { .path = info.path };
Let's use reverse-Christmas-tree ordering for these declarations.
> + enum fsck_msg_id msg_id;
> +
> + switch (info.error) {
> + case REFTABLE_FSCK_ERROR_TABLE_NAME:
> + msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> + break;
> + default:
> + BUG("unknown fsck error: %d", info.error);
> + }
> +
> + return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
> +}
> +
> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
> struct worktree *wt UNUSED)
> {
> - return 0;
> + struct reftable_ref_store *refs;
> + struct strmap_entry *entry;
> + struct hashmap_iter iter;
> + int ret = 0;
> +
> + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> + if (o->verbose)
> + fprintf_ln(stderr, _("Checking references consistency"));
> +
> + ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> + if (!ret)
> + return ret;
> +
From my understanding, if we find that there is any trouble in the main
worktree reftable backend, we would just abort the check. Should we
continue to check the linked worktrees?
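For what it's worth, continuing is cheap to do. A rough sketch, with
hypothetical names ('demo_stack', 'demo_check_one', 'demo_check_all')
standing in for the real backend types and for reftable_fsck_check(),
that ORs the results together instead of returning early:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reftable stack. */
struct demo_stack {
	int broken;  /* non-zero if the check should fail */
	int visited; /* how many times the stack was checked */
};

/* Hypothetical stand-in for a per-stack check. */
typedef int demo_check_fn(void *stack);

static int demo_check_one(void *stack)
{
	struct demo_stack *s = stack;

	s->visited++;
	return s->broken ? -1 : 0;
}

/*
 * Run `check` on every stack and keep going after a failure, so that
 * problems in later stacks are still reported. The result is non-zero
 * if any single check failed.
 */
static int demo_check_all(void **stacks, size_t nr, demo_check_fn *check)
{
	int ret = 0;

	assert(check);
	for (size_t i = 0; i < nr; i++)
		ret |= check(stacks[i]);
	return ret;
}
```

The main stack and each worktree stack would then go through the same
loop, and the user sees every problem in one run.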
> + strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> + struct reftable_backend *b = (struct reftable_backend *)entry->value;
> + ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> + if (!ret)
> + return ret;
> + }
> +
> + return ret;
> }
>
> struct ref_storage_be refs_be_reftable = {
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..22ec3c26e9
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,50 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn,
> + void *cb_data)
> +{
> + char **names = NULL;
> + uint64_t min, max;
> + int err = 0;
> +
> + if (stack == NULL)
> + goto out;
> +
> + err = read_lines(stack->list_file, &names);
> + if (err < 0)
> + goto out;
> +
> + verbose_fn("Checking reftable table names", cb_data);
> +
> + for (size_t i = 0; names[i]; i++) {
> + struct reftable_fsck_info info = {
> + .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> + .path = names[i],
> + .msg = "invalid reftable name"
> + };
Should we define this structure outside of the loop? There is no need
to re-initialize it on every iteration, as we could set ".path" and
".msg" inside the loop.
> + uint32_t rnd;
> + /*
> + * We want to match the tail '.ref'. One extra byte to ensure
> + * that there is no unexpected extra character and one byte for
> + * the null terminator added by sscanf.
> + */
> + char tail[6];
> +
> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> + &min, &max, &rnd, tail) != 4) {
> + err = report_fn(info, cb_data);
I think we could just pass a pointer to avoid unnecessary copies.
Besides that, we report two different kinds of problems here, but we
always give the user the same message, `invalid reftable name`. This is
too vague.
I think we'd better set different messages for different problems.
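Something along these lines, as a rough standalone sketch only. The
types and names ('demo_fsck_info', 'demo_report_fn', 'demo_check_name',
'demo_remember') and the message strings are made up for illustration,
and the "0x" prefix test is a crude stand-in for the full format check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct reftable_fsck_info. */
struct demo_fsck_info {
	const char *msg;
	const char *path;
};

/* The callback takes a pointer, so the struct is not copied per call. */
typedef int demo_report_fn(const struct demo_fsck_info *info, void *cb_data);

/* Return non-zero when `name` ends with `suffix`. */
static int demo_ends_with(const char *name, const char *suffix)
{
	size_t nlen = strlen(name), slen = strlen(suffix);

	return nlen >= slen && !strcmp(name + nlen - slen, suffix);
}

/* Report at most one problem per name, saying which part is wrong. */
static int demo_check_name(const char *name, demo_report_fn *report_fn,
			   void *cb_data)
{
	struct demo_fsck_info info = { .path = name };

	assert(name && report_fn);
	if (strncmp(name, "0x", 2))
		info.msg = "table name does not start with an update index";
	else if (!demo_ends_with(name, ".ref"))
		info.msg = "table name does not end with '.ref'";
	else
		return 0;

	return report_fn(&info, cb_data);
}

/* Example callback: remember the last message it was handed. */
static int demo_remember(const struct demo_fsck_info *info, void *cb_data)
{
	*(const char **)cb_data = info->msg;
	return -1;
}
```

Using 'else if' also means a single bad name is reported only once.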
> + }
> +
> + if (strcmp(tail, ".ref")) {
> + err = report_fn(info, cb_data);
> + }
> + }
> +
> +out:
> + free_names(names);
> + return err;
> +}
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..087430d979
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> + /* Invalid table name */
> + REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};
> +
> +/* Represents an individual error encountered during the FSCK checks. */
> +struct reftable_fsck_info {
> + enum reftable_fsck_error error;
> + const char *msg;
> + const char *path;
> +};
> +
> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
> + void *cb_data);
As I have explained above, we should use `struct reftable_fsck_info
*info` instead of `struct reftable_fsck_info info`.
> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.
> + *
> + * If an issue is encountered, the issue is reported to the caller via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * continue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.
> + */
> +int reftable_fsck_check(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn,
> + void *cb_data);
> +
> +#endif /* REFTABLE_FSCK_H */
> diff --git a/t/meson.build b/t/meson.build
> index bbeba1a8d5..a8eb44eb30 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -145,6 +145,7 @@ integration_tests = [
> 't0611-reftable-httpd.sh',
> 't0612-reftable-jgit-compatibility.sh',
> 't0613-reftable-write-options.sh',
> + 't0614-reftable-fsck.sh',
> 't1000-read-tree-m-3way.sh',
> 't1001-read-tree-m-2way.sh',
> 't1002-read-tree-m-u-2way.sh',
> @@ -1214,4 +1215,4 @@ if perl.found() and time.found()
> timeout: 0,
> )
> endforeach
> -endif
> \ No newline at end of file
> +endif
> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..0d11871b1c
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,35 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
> + sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> + mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> + mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
> +
> + test_must_fail git refs verify 2>err &&
> + cat >expect <<-EOF &&
> + error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
> + EOF
> + test_cmp expect err
> + )
> +'
We check two kinds of errors, so should we add two tests instead of
only this one?
> +
> +test_done
>
> --
> 2.50.1
>
Thanks,
Jialuo
* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
From: Karthik Nayak @ 2025-09-01 13:33 UTC (permalink / raw)
To: shejialuo; +Cc: git, ps
shejialuo <shejialuo@gmail.com> writes:
> On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
>> The `git refs verify` command is used to run fsck checks on the
>> reference backends. This command is also invoked when users run 'git
>> fsck'. While the files-backend has some fsck checks added, the reftable
>> backend lacks such checks. Let's add the required infrastructure and a
>> check to test for the table names in the 'tables.list' of reftables.
>>
>> For the infrastructure, since the reftable library is treated as an
>> independent library we should ensure that the library code works
>> independently without knowledge about Git's internals. To do this,
>> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
>
> A design question here, we name the "fsck.c" for the source code but for
> the header, we use "reftable-fsck.h", it is a little strange. Why not
> just "fsck.h" instead of "reftable-fsck.h".
>
Since the reftable code is treated as an external library, all
'reftable-.*.h' headers are treated as headers which expose APIs for the
library's users. We would have defined 'reftable/fsck.h' if there were
internal users of the 'fsck.c' code, but there are none.
>> diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
>> index 1c912615f9..784ddc0df5 100644
>> --- a/Documentation/fsck-msgids.adoc
>> +++ b/Documentation/fsck-msgids.adoc
>> @@ -38,6 +38,9 @@
>> `badReferentName`::
>> (ERROR) The referent name of a symref is invalid.
>>
>> +`badReftableTableName`::
>> + (ERROR) A reftable table has an invalid name.
>> +
>
> When reading this, I feel a little strange. `Reftable` already indicates
> it is a table. Should we simply say like the following:
>
> A reftable has an invalid table name
>
I'm not sure about this, since 'reftable' refers to the reference
backend and the 'table' refers to an individual table within the
'reftable' format. I would say both are important.
CC'ing Patrick here for a second opinion.
>> `badTagName`::
>> (INFO) A tag has an invalid format.
>>
>> diff --git a/Makefile b/Makefile
>> index e11340c1ae..f2ddcc8d7c 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>> REFTABLE_OBJS += reftable/block.o
>> REFTABLE_OBJS += reftable/blocksource.o
>> REFTABLE_OBJS += reftable/iter.o
>> +REFTABLE_OBJS += reftable/fsck.o
>> REFTABLE_OBJS += reftable/merged.o
>> REFTABLE_OBJS += reftable/pq.o
>> REFTABLE_OBJS += reftable/record.o
>> diff --git a/fsck.h b/fsck.h
>> index 559ad57807..5901f944a1 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>> FUNC(BAD_PACKED_REF_HEADER, ERROR) \
>> FUNC(BAD_PARENT_SHA1, ERROR) \
>> FUNC(BAD_REFERENT_NAME, ERROR) \
>> + FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
>> FUNC(BAD_REF_CONTENT, ERROR) \
>> FUNC(BAD_REF_FILETYPE, ERROR) \
>> FUNC(BAD_REF_NAME, ERROR) \
>> diff --git a/meson.build b/meson.build
>> index 5dd299b496..82879fbfaa 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -452,6 +452,7 @@ libgit_sources = [
>> 'reftable/error.c',
>> 'reftable/block.c',
>> 'reftable/blocksource.c',
>> + 'reftable/fsck.c',
>> 'reftable/iter.c',
>> 'reftable/merged.c',
>> 'reftable/pq.c',
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index 8dae1e1112..ccd12052f2 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -6,20 +6,21 @@
>> #include "../config.h"
>> #include "../dir.h"
>> #include "../environment.h"
>> +#include "../fsck.h"
>> #include "../gettext.h"
>> #include "../hash.h"
>> #include "../hex.h"
>> #include "../iterator.h"
>> #include "../ident.h"
>> -#include "../lockfile.h"
>
> Here, we delete this header file. Is the reason that we don't need this
> header file anymore?
>
Yes, it wasn't needed in the first place. Let me add a note about that
to the commit message.
>> #include "../object.h"
>> #include "../path.h"
>> #include "../refs.h"
>> #include "../reftable/reftable-basics.h"
>> -#include "../reftable/reftable-stack.h"
>> -#include "../reftable/reftable-record.h"
>> #include "../reftable/reftable-error.h"
>> +#include "../reftable/reftable-fsck.h"
>> #include "../reftable/reftable-iterator.h"
>> +#include "../reftable/reftable-record.h"
>> +#include "../reftable/reftable-stack.h"
>> #include "../repo-settings.h"
>> #include "../setup.h"
>> #include "../strmap.h"
>> @@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>> return ret;
>> }
>>
>> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
>> - struct fsck_options *o UNUSED,
>> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
>> +{
>> + struct fsck_options *o = cb_data;
>> +
>> + if (o->verbose)
>> + fprintf_ln(stderr, "%s", _(msg));
>> +}
>> +
>> +static int reftable_fsck_error_handler(struct reftable_fsck_info info,
>
> A design question: why do we need to pass the value "info" instead of
> pointer?
>
I didn't see a reason to make it a pointer. But it does make it more
efficient when the struct size increases. Let me change it!
>
>> + void *cb_data)
>> +{
>> + struct fsck_options *o = cb_data;
>> + struct fsck_ref_report report = { .path = info.path };
>
> Let's make it reverse-christmas-tree ordering.
>
Will change!
>> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>> struct worktree *wt UNUSED)
>> {
>> - return 0;
>> + struct reftable_ref_store *refs;
>> + struct strmap_entry *entry;
>> + struct hashmap_iter iter;
>> + int ret = 0;
>> +
>> + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
>> +
>> + if (o->verbose)
>> + fprintf_ln(stderr, _("Checking references consistency"));
>> +
>> + ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
>> + reftable_fsck_verbose_handler, o);
>> + if (!ret)
>> + return ret;
>> +
>
> From my understanding, if we find that there is any trouble in the main
> worktree reftable backend, we would just abort the check. Should we
> continue to check the linked worktrees?
>
I think that makes sense. Let me make that change.
>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> new file mode 100644
>> index 0000000000..22ec3c26e9
>> --- /dev/null
>> +++ b/reftable/fsck.c
>> @@ -0,0 +1,50 @@
>> +#include "basics.h"
>> +#include "reftable-fsck.h"
>> +#include "stack.h"
>> +
>> +int reftable_fsck_check(struct reftable_stack *stack,
>> + reftable_fsck_report_fn report_fn,
>> + reftable_fsck_verbose_fn verbose_fn,
>> + void *cb_data)
>> +{
>> + char **names = NULL;
>> + uint64_t min, max;
>> + int err = 0;
>> +
>> + if (stack == NULL)
>> + goto out;
>> +
>> + err = read_lines(stack->list_file, &names);
>> + if (err < 0)
>> + goto out;
>> +
>> + verbose_fn("Checking reftable table names", cb_data);
>> +
>> + for (size_t i = 0; names[i]; i++) {
>> + struct reftable_fsck_info info = {
>> + .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
>> + .path = names[i],
>> + .msg = "invalid reftable name"
>> + };
>
> Should we define this data structure outside of the loop? It's
> unnecessary here as we could change ".path" and ".msg" dynamically in
> the loop.
>
I don't think it'd make much difference for reftables, since tables are
geometrically packed. But I don't feel strongly, so I'll make the
change.
>> + uint32_t rnd;
>> + /*
>> + * We want to match the tail '.ref'. One extra byte to ensure
>> + * that there is no unexpected extra character and one byte for
>> + * the null terminator added by sscanf.
>> + */
>> + char tail[6];
>> +
>> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
>> + &min, &max, &rnd, tail) != 4) {
>> + err = report_fn(info, cb_data);
>
> I think we could just pass a pointer to avoid unnecessary copy operations.
> Besides that, I think we report two different kinds of problems here, but
> we always give the user the same message, `invalid reftable name`.
> This is too vague.
>
Not sure what you mean by 'unnecessary copy operations', could you
elaborate?
> I think we'd better set different messages for different problems.
>
Fair enough, let me modify that.
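To make the two failure modes concrete, here is a minimal standalone sketch of how a name check could tell them apart. This is only an illustration under assumptions: `check_table_name` and `enum name_status` are hypothetical names that do not appear in the patch, and the format string follows the one quoted above but uses a plain field width instead of the zero-padded one.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
enum name_status {
	NAME_OK = 0,
	NAME_BAD_FORMAT, /* "0x<min>-0x<max>-<rnd><tail>" did not parse */
	NAME_BAD_SUFFIX, /* everything parsed, but the tail is not ".ref" */
};

static enum name_status check_table_name(const char *name)
{
	uint64_t min, max;
	unsigned int rnd;
	/* ".ref" plus one spare byte for a stray character, plus the NUL. */
	char tail[6];

	/* All four fields must convert, otherwise the layout is wrong. */
	if (sscanf(name, "0x%12" SCNx64 "-0x%12" SCNx64 "-%8x%5s",
		   &min, &max, &rnd, tail) != 4)
		return NAME_BAD_FORMAT;

	/* The spare byte in `tail` lets us notice trailing garbage. */
	if (strcmp(tail, ".ref"))
		return NAME_BAD_SUFFIX;

	return NAME_OK;
}
```

With a split like this, each status could map to its own message. Note that the field widths in the format are only upper bounds, so the sketch does not enforce that each index has exactly twelve digits.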
[snip]
>> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
>> new file mode 100755
>> index 0000000000..0d11871b1c
>> --- /dev/null
>> +++ b/t/t0614-reftable-fsck.sh
>> @@ -0,0 +1,35 @@
>> +#!/bin/sh
>> +
>> +test_description='Test reftable backend consistency check'
>> +
>> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
>> +export GIT_TEST_DEFAULT_REF_FORMAT
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'table name should be checked' '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err &&
>> +
>> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
>> + sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
>> + mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
>> + mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
>> +
>> + test_must_fail git refs verify 2>err &&
>> + cat >expect <<-EOF &&
>> + error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
>> + EOF
>> + test_cmp expect err
>> + )
>> +'
>
> We would check two kinds of errors. Should we add two tests instead of
> only this one?
>
Yeah, makes sense, will add!
>> +
>> +test_done
>>
>> --
>> 2.50.1
>>
>
> Thanks,
> Jialuo
Thanks for the review.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
2025-09-01 13:33 ` Karthik Nayak
@ 2025-09-03 13:39 ` shejialuo
0 siblings, 0 replies; 96+ messages in thread
From: shejialuo @ 2025-09-03 13:39 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps
On Mon, Sep 01, 2025 at 06:33:24AM -0700, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
>
> > On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
> >> The `git refs verify` command is used to run fsck checks on the
> >> reference backends. This command is also invoked when users run 'git
> >> fsck'. While the files-backend has some fsck checks added, the reftable
> >> backend lacks such checks. Let's add the required infrastructure and a
> >> check to test for the table names in the 'tables.list' of reftables.
> >>
> >> For the infrastructure, since the reftable library is treated as an
> >> independent library we should ensure that the library code works
> >> independently without knowledge about Git's internals. To do this,
> >> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
> >
> > > A design question here: we name the source file "fsck.c", but for the
> > > header we use "reftable-fsck.h", which is a little strange. Why not
> > > just "fsck.h" instead of "reftable-fsck.h"?
> >
>
> Since the reftable code is treated as an external library, all
> 'reftable-.*.h' headers are treated as headers which expose APIs for the
> library's users. We would have defined 'reftable/fsck.h' if there were
> internal users of the 'fsck.c' code. But there are none.
>
I understand the design. Thanks for the explanation.
[snip]
> >> + uint32_t rnd;
> >> + /*
> >> + * We want to match the tail '.ref'. One extra byte to ensure
> >> + * that there is no unexpected extra character and one byte for
> >> + * the null terminator added by sscanf.
> >> + */
> >> + char tail[6];
> >> +
> >> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> >> + &min, &max, &rnd, tail) != 4) {
> >> + err = report_fn(info, cb_data);
> >
> > > I think we could just pass a pointer to avoid unnecessary copy operations.
> > > Besides that, I think we report two different kinds of problems here, but
> > > we always give the user the same message, `invalid reftable name`.
> > > This is too vague.
> >
>
> Not sure what you mean by 'unnecessary copy operations', could you
> elaborate?
>
In `report_fn`, we would copy the `info` value on each call. That's what
I meant.
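As a small illustration of the difference (a sketch only: `struct fsck_info`, `report_by_value` and `report_by_pointer` are made-up names, not the reftable API), a by-value parameter gives the callee its own copy of the struct on every call, while a `const` pointer lets it read the caller's struct directly:

```c
#include <stddef.h>

/* Simplified stand-in for the struct discussed above. */
struct fsck_info {
	int error;
	const char *path;
	const char *msg;
};

/* By value: `info` is a private copy, so changes stay local. */
static int report_by_value(struct fsck_info info)
{
	info.error = 0; /* only the copy is modified */
	return info.path != NULL;
}

/* By pointer: no copy is made; `const` keeps the callee read-only. */
static int report_by_pointer(const struct fsck_info *info)
{
	return info->path != NULL;
}
```

For a struct of three small fields the cost is likely negligible, but the pointer form keeps the call cost fixed if more fields are added later.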
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH 3/5] refs/reftable: add fsck check for number of tables
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
2025-08-26 16:33 ` shejialuo
2025-08-26 16:44 ` shejialuo
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
` (7 subsequent siblings)
10 siblings, 2 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak
Introduce a reftable fsck check to check that the number of files in the
reftable directory matches the number of files listed in 'tables.list'.
We do this by iterating over the files in the reftable directory and
counting all the files present excluding the 'tables.list'. This is also
exposed over Git's fsck checks as a 'badReftableStackCount' error.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 3 +++
reftable/fsck.c | 34 ++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 20 ++++++++++++++++++++
6 files changed, 63 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 784ddc0df5..707e2fc50a 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableStackCount`::
+ (ERROR) Mismatch in number of tables.
+
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
diff --git a/fsck.h b/fsck.h
index 5901f944a1..256effc4f8 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index ccd12052f2..616f4ee0f3 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2695,6 +2695,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
case REFTABLE_FSCK_ERROR_TABLE_NAME:
msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
break;
+ case REFTABLE_FSCK_ERROR_STACK_COUNT:
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
+ break;
default:
BUG("unknown fsck error: %d", info.error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 22ec3c26e9..e92a630276 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -2,6 +2,28 @@
#include "reftable-fsck.h"
#include "stack.h"
+static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
+{
+ DIR *dir = opendir(st->reftable_dir);
+ struct dirent *d = NULL;
+ unsigned int count = 0;
+
+ if (!dir)
+ return 0;
+
+ while ((d = readdir(dir))) {
+ if (!strcmp(d->d_name, "tables.list"))
+ continue;
+
+ if (d->d_type == DT_REG)
+ count++;
+ }
+
+ closedir(dir);
+
+ return count == st->tables_len;
+}
+
int reftable_fsck_check(struct reftable_stack *stack,
reftable_fsck_report_fn report_fn,
reftable_fsck_verbose_fn verbose_fn,
@@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
}
}
+ verbose_fn("Checking reftable tables count", cb_data);
+
+ if (!reftable_fsck_valid_stack_count(stack)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_COUNT,
+ .path = stack->list_file,
+ .msg = "mismatch in number of tables"
+ };
+
+ err = report_fn(info, cb_data);
+ }
+
out:
free_names(names);
return err;
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 087430d979..888c3968b7 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -6,6 +6,8 @@
enum reftable_fsck_error {
/* Invalid table name */
REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+ /* Incorrect number of tables present */
+ REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
};
/* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 0d11871b1c..a351fed562 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -32,4 +32,24 @@ test_expect_success 'table name should be checked' '
)
'
+test_expect_success 'table count should be checked' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-08-26 16:33 ` shejialuo
2025-09-01 13:40 ` Karthik Nayak
2025-08-26 16:44 ` shejialuo
1 sibling, 1 reply; 96+ messages in thread
From: shejialuo @ 2025-08-26 16:33 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> index 22ec3c26e9..e92a630276 100644
> --- a/reftable/fsck.c
> +++ b/reftable/fsck.c
> @@ -2,6 +2,28 @@
> #include "reftable-fsck.h"
> #include "stack.h"
>
> +static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
> +{
> + DIR *dir = opendir(st->reftable_dir);
> + struct dirent *d = NULL;
> + unsigned int count = 0;
> +
> + if (!dir)
> + return 0;
> +
> + while ((d = readdir(dir))) {
> + if (!strcmp(d->d_name, "tables.list"))
> + continue;
> +
> + if (d->d_type == DT_REG)
> + count++;
> + }
> +
> + closedir(dir);
> +
> + return count == st->tables_len;
> +}
> +
The above logic is clear to understand, but I think we should use our
internal interface in "dir-iterator.h" to implement it.
> int reftable_fsck_check(struct reftable_stack *stack,
> reftable_fsck_report_fn report_fn,
> reftable_fsck_verbose_fn verbose_fn,
> @@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
> }
> }
>
> + verbose_fn("Checking reftable tables count", cb_data);
> +
> + if (!reftable_fsck_valid_stack_count(stack)) {
> + struct reftable_fsck_info info = {
> + .error = REFTABLE_FSCK_ERROR_STACK_COUNT,
> + .path = stack->list_file,
> + .msg = "mismatch in number of tables"
> + };
> +
When reading here, I somehow understand the reason why you define this
data structure in the loop. But I still think we could just define only
one `info`.
BTW, I wonder whether we should define some auxiliary functions for each
check instead of adding logic directly in `reftable_fsck_check`
function?
> + err = report_fn(info, cb_data);
> + }
> +
> out:
> free_names(names);
> return err;
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
2025-08-26 16:33 ` shejialuo
@ 2025-09-01 13:40 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:40 UTC (permalink / raw)
To: shejialuo; +Cc: git
shejialuo <shejialuo@gmail.com> writes:
> On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> index 22ec3c26e9..e92a630276 100644
>> --- a/reftable/fsck.c
>> +++ b/reftable/fsck.c
>> @@ -2,6 +2,28 @@
>> #include "reftable-fsck.h"
>> #include "stack.h"
>>
>> +static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
>> +{
>> + DIR *dir = opendir(st->reftable_dir);
>> + struct dirent *d = NULL;
>> + unsigned int count = 0;
>> +
>> + if (!dir)
>> + return 0;
>> +
>> + while ((d = readdir(dir))) {
>> + if (!strcmp(d->d_name, "tables.list"))
>> + continue;
>> +
>> + if (d->d_type == DT_REG)
>> + count++;
>> + }
>> +
>> + closedir(dir);
>> +
>> + return count == st->tables_len;
>> +}
>> +
>
> The above logic is clear to understand, but I think we should use our
> internal interface in "dir-iterator.h" to implement it.
>
Since the reftable library is treated as an external one, we can't add and
rely on code outside of the library. That's why you'll see some
duplication here and there.
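For illustration, a directory count that depends only on POSIX calls might look like the sketch below. It is not the code from the patch: `count_regular_files` is a hypothetical helper, and it uses `stat()` rather than `d_type` on the assumption that some filesystems report `DT_UNKNOWN` for directory entries.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count regular files in `dir_path`, skipping the entry `skip_name`.
 * Returns -1 if the directory cannot be opened.
 */
static int count_regular_files(const char *dir_path, const char *skip_name)
{
	DIR *dir = opendir(dir_path);
	struct dirent *d;
	int count = 0;

	if (!dir)
		return -1;

	while ((d = readdir(dir))) {
		char path[4096];
		struct stat st;
		int n;

		if (!strcmp(d->d_name, skip_name))
			continue;

		/* Ignore entries whose full path does not fit the buffer. */
		n = snprintf(path, sizeof(path), "%s/%s", dir_path, d->d_name);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;

		/* "." and ".." are directories, so they are not counted. */
		if (!stat(path, &st) && S_ISREG(st.st_mode))
			count++;
	}

	closedir(dir);
	return count;
}
```

A caller could then compare the result against the number of entries read from 'tables.list', treating a negative return as a separate I/O failure rather than as a mismatch.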
>> int reftable_fsck_check(struct reftable_stack *stack,
>> reftable_fsck_report_fn report_fn,
>> reftable_fsck_verbose_fn verbose_fn,
>> @@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
>> }
>> }
>>
>> + verbose_fn("Checking reftable tables count", cb_data);
>> +
>> + if (!reftable_fsck_valid_stack_count(stack)) {
>> + struct reftable_fsck_info info = {
>> + .error = REFTABLE_FSCK_ERROR_STACK_COUNT,
>> + .path = stack->list_file,
>> + .msg = "mismatch in number of tables"
>> + };
>> +
>
> When reading here, I somehow understand the reason why you define this
> data structure in the loop. But I still think we could just define only
> one `info`.
>
I tried to rewrite it like you suggested, but I think it still makes
sense to keep the error definitions separate. They help provide
localized context. Otherwise, we'd define the error at the start, then
set individual fields later on. This causes some confusion.
>
> BTW, I wonder whether we should define some auxiliary functions for each
> check instead of adding logic directly in `reftable_fsck_check`
> function?
>
Post this patch series we'll dive into block and reference checks, which
will be isolated into individual functions.
>> + err = report_fn(info, cb_data);
>> + }
>> +
>> out:
>> free_names(names);
>> return err;
>
> Thanks,
> Jialuo
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-08-26 16:33 ` shejialuo
@ 2025-08-26 16:44 ` shejialuo
2025-09-01 13:52 ` Karthik Nayak
1 sibling, 1 reply; 96+ messages in thread
From: shejialuo @ 2025-08-26 16:44 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
> +test_expect_success 'table count should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
> +
> + test_must_fail git refs verify 2>err &&
> + cat >expect <<-EOF &&
> + error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
This is bad usage; we should just use `reftable/tables.list`. This is a
common pattern: we print the path relative to the ".git" directory.
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
2025-08-26 16:44 ` shejialuo
@ 2025-09-01 13:52 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:52 UTC (permalink / raw)
To: shejialuo; +Cc: git
shejialuo <shejialuo@gmail.com> writes:
> On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
>> +test_expect_success 'table count should be checked' '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err &&
>> +
>> + touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
>> +
>> + test_must_fail git refs verify 2>err &&
>> + cat >expect <<-EOF &&
>> + error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
>
> This is bad usage; we should just use `reftable/tables.list`. This is a
> common pattern: we print the path relative to the ".git" directory.
>
Good point. This can be changed to 'reftable/tables.list'; we don't need
to obtain it from the stack.
> Thanks,
> Jialuo
Thanks
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH 4/5] refs/reftable: add fsck check for trailing newline
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (2 preceding siblings ...)
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
` (6 subsequent siblings)
10 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak
Introduce a fsck check for the reftable backend, which checks whether
'tables.list' ends with a newline. The reftable backend writes a trailing
newline when writing the 'tables.list', but it doesn't check for it when
reading the file. A missing newline however indicates that the file was
manually tampered with, so let's raise this as an error to the user.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 3 +++
reftable/fsck.c | 36 ++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 21 +++++++++++++++++++++
6 files changed, 66 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 707e2fc50a..1432b1de06 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -41,6 +41,9 @@
`badReftableStackCount`::
(ERROR) Mismatch in number of tables.
+`badReftableStackListNewline`::
+ (ERROR) Reftable stack list missing trailing newline.
+
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
diff --git a/fsck.h b/fsck.h
index 256effc4f8..33432bae79 100644
--- a/fsck.h
+++ b/fsck.h
@@ -35,6 +35,7 @@ enum fsck_msg_type {
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
+ FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 616f4ee0f3..0087afa3ac 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2698,6 +2698,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
case REFTABLE_FSCK_ERROR_STACK_COUNT:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
break;
+ case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
+ break;
default:
BUG("unknown fsck error: %d", info.error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index e92a630276..b4898fd2cd 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -1,7 +1,31 @@
#include "basics.h"
+#include "reftable-error.h"
#include "reftable-fsck.h"
#include "stack.h"
+static int reftable_fsck_stack_contains_newline(const char *list_file)
+{
+ FILE *f = fopen(list_file, "r");
+ int c = 0;
+
+ if (f == NULL) {
+ if (errno == ENOENT)
+ return 0;
+ return REFTABLE_IO_ERROR;
+ }
+
+ if (fseek(f, 0, SEEK_END) == 0) {
+ long size = ftell(f);
+ if (size <= 0)
+ return REFTABLE_IO_ERROR;
+ fseek(f, -1, SEEK_END);
+ c = fgetc(f);
+ }
+ fclose(f);
+
+ return c == '\n';
+}
+
static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
{
DIR *dir = opendir(st->reftable_dir);
@@ -66,6 +90,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
}
}
+ verbose_fn("Checking trailing newline in stack list", cb_data);
+
+ if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
+ .path = stack->list_file,
+ .msg = "trailing newline missing in stack list"
+ };
+
+ err = report_fn(info, cb_data);
+ }
+
verbose_fn("Checking reftable tables count", cb_data);
if (!reftable_fsck_valid_stack_count(stack)) {
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 888c3968b7..8e6cb6c7d2 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -8,6 +8,8 @@ enum reftable_fsck_error {
REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
/* Incorrect number of tables present */
REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
+ /* Newline missing at the end of the stack list */
+ REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
};
/* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index a351fed562..937c5dd37a 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -52,4 +52,25 @@ test_expect_success 'table count should be checked' '
)
'
+test_expect_success 'stack list must contain trailing newline' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ printf "%s" "$(cat .git/reftable/tables.list)" >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: $(pwd)/.git/reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH 5/5] refs/reftable: add fsck check for incorrect update index
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (3 preceding siblings ...)
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
` (5 subsequent siblings)
10 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak
Introduce a fsck check for the reftable backend, which checks whether the
tables in 'tables.list' have sequential update indices. The tables in
the reftable backend should have sequential update indices, and this fsck
check ensures that they do.
We must note that the reftable backend itself doesn't check to ensure
this and it also doesn't check to ensure that the index in the table
name matches the index in the header or the table. The latter is not
implemented in this fsck check either and will be added in a future
patch where we add fsck checks for internals of a table.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ++
fsck.h | 1 +
refs/reftable-backend.c | 3 ++
reftable/fsck.c | 14 +++++++++-
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 62 ++++++++++++++++++++++++++++++++++++++++++
6 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1432b1de06..982d51876c 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -47,6 +47,9 @@
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
+`badReftableUpdateIndex`::
+ (ERROR) Incorrect update index found for table.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 33432bae79..60e9b84183 100644
--- a/fsck.h
+++ b/fsck.h
@@ -37,6 +37,7 @@ enum fsck_msg_type {
FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_UPDATE_INDEX, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 0087afa3ac..d5993238db 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2701,6 +2701,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
break;
+ case REFTABLE_FSCK_ERROR_UPDATE_INDEX:
+ msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
+ break;
default:
BUG("unknown fsck error: %d", info.error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index b4898fd2cd..a6551b9a3c 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -53,8 +53,8 @@ int reftable_fsck_check(struct reftable_stack *stack,
reftable_fsck_verbose_fn verbose_fn,
void *cb_data)
{
+ uint64_t min, max, prev_max = 0;
char **names = NULL;
- uint64_t min, max;
int err = 0;
if (stack == NULL)
@@ -85,9 +85,21 @@ int reftable_fsck_check(struct reftable_stack *stack,
err = report_fn(info, cb_data);
}
+ if (min != (prev_max + 1) || max < min) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_UPDATE_INDEX,
+ .path = names[i],
+ .msg = "incorrect update index in table name"
+ };
+
+ err = report_fn(info, cb_data);
+ }
+
if (strcmp(tail, ".ref")) {
err = report_fn(info, cb_data);
}
+
+ prev_max = max;
}
verbose_fn("Checking trailing newline in stack list", cb_data);
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 8e6cb6c7d2..49437280bb 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -10,6 +10,8 @@ enum reftable_fsck_error {
REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
/* Newline missing at the end of the stack list */
REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
+ /* Incorrect update index for table */
+ REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
};
/* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 937c5dd37a..bdcbd65a9f 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -73,4 +73,66 @@ test_expect_success 'stack list must contain trailing newline' '
)
'
+test_expect_success 'table update index should be sequential between tables' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ # Lock the existing table to disable auto-compaction
+ CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+ touch .git/reftable/${CUR_TABLE}.lock &&
+ git update-ref refs/heads/sample @ &&
+ rm .git/reftable/${CUR_TABLE}.lock &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+ NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/0003/0009/g") &&
+
+ sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_expect_success 'table update index should be sequential within a table' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ # Lock the existing table to disable auto-compaction
+ CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+ touch .git/reftable/${CUR_TABLE}.lock &&
+ git update-ref refs/heads/sample @ &&
+ rm .git/reftable/${CUR_TABLE}.lock &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+ NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/\(.*\)0003/\10002/") &&
+
+ sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
* Re: [PATCH 0/5] refs/reftable: add fsck checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (4 preceding siblings ...)
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
@ 2025-08-26 16:39 ` shejialuo
2025-09-01 13:52 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
` (4 subsequent siblings)
10 siblings, 1 reply; 96+ messages in thread
From: shejialuo @ 2025-08-26 16:39 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git
On Tue, Aug 19, 2025 at 02:20:59PM +0200, Karthik Nayak wrote:
> This series adds the required infrastructure and also some fsck checks
> for the reftable backend.
>
> Since the reftable backend is treated as a library within the Git
> codebase, we don't want to spill over our internal fsck implementation
> into the library. At the same time, the fsck checks need to access
> internal structures of the reftable library which aren't exposed outside
> the library.
>
> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
> exposes a checker for the reftable library and returns specific errors
> as defined by the library. We then add glue code within
> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
> implementation would understand. This allows us to separate concerns.
>
> This series then adds some checks on the stack ('reftable/tables.list')
> level of reftable, namely:
> 1. The table name is as per the spec
> 2. The number of tables is consistent
> 3. The tables.list has a newline at the end of file
> 4. The table names follow correct index sequences
>
> I also plan to send follow-up series which will implement further
> checks and go into deeper layers (tables, blocks, references).
>
Thanks for your patches, it's very nice to see that we begin to
implement the consistency checks for reftable backend. And I have left
some comments.
Thanks,
Jialuo
* Re: [PATCH 0/5] refs/reftable: add fsck checks
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
@ 2025-09-01 13:52 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:52 UTC (permalink / raw)
To: shejialuo; +Cc: git
shejialuo <shejialuo@gmail.com> writes:
> On Tue, Aug 19, 2025 at 02:20:59PM +0200, Karthik Nayak wrote:
>> This series adds the required infrastructure and also some fsck checks
>> for the reftable backend.
>>
>> Since the reftable backend is treated as a library within the Git
>> codebase, we don't want to spill over our internal fsck implementation
>> into the library. At the same time, the fsck checks need to access
>> internal structures of the reftable library which aren't exposed outside
>> the library.
>>
>> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
>> exposes a checker for the reftable library and returns specific errors
>> as defined by the library. We then add glue code within
>> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
>> implementation would understand. This allows us to separate concerns.
>>
>> This series then adds some checks on the stack ('reftable/tables.list')
>> level of reftable, namely:
>> 1. The table name is as per the spec
>> 2. The number of tables is consistent
>> 3. The tables.list has a newline at the end of file
>> 4. The table names follow correct index sequences
>>
>> I also plan to send follow-up series which will implement further
>> checks and go into deeper layers (tables, blocks, references).
>>
>
> Thanks for your patches, it's very nice to see that we begin to
> implement the consistency checks for reftable backend. And I have left
> some comments.
>
Thanks for your comments and the review. I'll send in a new version soon.
> Thanks,
> Jialuo
* [PATCH v2 0/5] refs/reftable: add fsck checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (5 preceding siblings ...)
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
` (4 more replies)
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (3 subsequent siblings)
10 siblings, 5 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
This series adds the required infrastructure and also some fsck checks
for the reftable backend.
Since the reftable backend is treated as a library within the Git
codebase, we don't want to spill over our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.
So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.
This series then adds some checks on the stack ('reftable/tables.list')
level of reftable, namely:
1. The table name is as per the spec
2. The number of tables is consistent
3. The tables.list has a newline at the end of file
4. The table names follow correct index sequences
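Checks 1 and 4 can be illustrated with a minimal standalone sketch. It
reuses the sscanf() pattern and the 'min != prev_max + 1 || max < min'
condition from the patches, but 'enum name_check' and
'check_table_name()' are hypothetical names that exist only in this
example, and unlike the series it returns a single result per name
instead of reporting every issue through a callback. It also uses the
SCNx64 scan macro where the patch uses PRIx64.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not the library's enum. */
enum name_check {
	NAME_OK = 0,
	NAME_BAD_FORMAT,
	NAME_BAD_UPDATE_INDEX,
	NAME_BAD_EXTENSION,
};

/*
 * Check one entry of 'tables.list'. '*prev_max' holds the maximum
 * update index of the previous table (0 before the first table) and is
 * advanced whenever the name could be parsed.
 */
static enum name_check check_table_name(const char *name, uint64_t *prev_max)
{
	uint64_t min, max;
	unsigned int rnd;
	char tail[6] = ""; /* room for "%5s" plus the NUL terminator */
	enum name_check res = NAME_OK;

	/* Expected shape: 0x<min>-0x<max>-<random><extension> */
	if (sscanf(name, "0x%012" SCNx64 "-0x%012" SCNx64 "-%08x%5s",
		   &min, &max, &rnd, tail) != 4)
		return NAME_BAD_FORMAT;

	/* Indexes must continue from the previous table and not go backwards. */
	if (min != *prev_max + 1 || max < min)
		res = NAME_BAD_UPDATE_INDEX;
	else if (strcmp(tail, ".ref"))
		res = NAME_BAD_EXTENSION;

	*prev_max = max;
	return res;
}
```

A caller would start with 'prev_max' set to 0 and feed the names in the
order in which they appear in 'tables.list'.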
I also plan to send follow-up series which will implement further
checks and go into deeper layers (tables, blocks, references).
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
pointer, this provides a smaller footprint (pointer size vs struct
size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com
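As a rough illustration of the "run checks for other worktrees too"
item, the following standalone sketch accumulates results with '|='
instead of returning after the first stack, mirroring the 'ret |='
change visible in the range-diff. 'struct demo_stack', 'demo_check()'
and 'check_all()' are made-up names for this example only and are not
part of the series.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reftable stack and its check result. */
struct demo_stack {
	int result;  /* 0 when consistent, non-zero when an issue is found */
	int visited; /* set once the stack has been checked */
};

/* Hypothetical per-stack check: records the visit, returns the result. */
static int demo_check(struct demo_stack *stack)
{
	stack->visited = 1;
	return stack->result;
}

/*
 * Check every stack and OR the results together, so that a failure in
 * one stack does not prevent the remaining stacks from being checked.
 */
static int check_all(struct demo_stack *stacks, size_t nr)
{
	int ret = 0;

	for (size_t i = 0; i < nr; i++)
		ret |= demo_check(&stacks[i]);

	return ret;
}
```

With this shape the return value only says whether any stack failed;
the individual issues still have to be reported from within the check.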
---
Documentation/fsck-msgids.adoc | 15 +++-
Makefile | 1 +
fsck.h | 154 ++++++++++++++++++++-------------------
meson.build | 1 +
refs/reftable-backend.c | 66 +++++++++++++++--
reftable/fsck.c | 134 ++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 44 +++++++++++
t/meson.build | 3 +-
t/t0614-reftable-fsck.sh | 161 +++++++++++++++++++++++++++++++++++++++++
9 files changed, 494 insertions(+), 85 deletions(-)
Karthik Nayak (5):
fsck: order 'fsck_msg_type' alphabetically
refs/reftable: add fsck check for checking the table name
refs/reftable: add fsck check for number of tables
refs/reftable: add fsck check for trailing newline
refs/reftable: add fsck check for incorrect update index
Range-diff versus v1:
1: d1875fbbc7 = 1: c049cd428a fsck: order 'fsck_msg_type' alphabetically
2: b63799aad1 ! 2: 1e46786745 refs/reftable: add fsck check for checking the table name
@@ Commit message
Also add 'badReftableTableName' as a corresponding error within Git. Add
a test to check for this behavior.
+ While here, remove a unused header `#include "../lockfile.h"` from
+ 'refs/reftable-backend.c'.
+
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
## Documentation/fsck-msgids.adoc ##
@@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
+ fprintf_ln(stderr, "%s", _(msg));
+}
+
-+static int reftable_fsck_error_handler(struct reftable_fsck_info info,
++static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
++ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
-+ struct fsck_ref_report report = { .path = info.path };
+ enum fsck_msg_id msg_id;
+
-+ switch (info.error) {
++ switch (info->error) {
+ case REFTABLE_FSCK_ERROR_TABLE_NAME:
+ msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
+ break;
+ default:
-+ BUG("unknown fsck error: %d", info.error);
++ BUG("unknown fsck error: %d", info->error);
+ }
+
-+ return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
++ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
@@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
-+ ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
++ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
-+ if (!ret)
-+ return ret;
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
-+ ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
++ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
-+ if (!ret)
-+ return ret;
+ }
+
+ return ret;
@@ reftable/fsck.c (new)
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
++
+ char **names = NULL;
+ uint64_t min, max;
+ int err = 0;
@@ reftable/fsck.c (new)
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
+ .path = names[i],
-+ .msg = "invalid reftable name"
+ };
+ uint32_t rnd;
+ /*
@@ reftable/fsck.c (new)
+
+ if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
+ &min, &max, &rnd, tail) != 4) {
-+ err = report_fn(info, cb_data);
++ info.msg = "invalid reftable table name";
++ err = report_fn(&info, cb_data);
++ continue;
+ }
+
+ if (strcmp(tail, ".ref")) {
-+ err = report_fn(info, cb_data);
++ info.msg = "invalid reftable table extension";
++ err = report_fn(&info, cb_data);
+ }
+ }
+
@@ reftable/reftable-fsck.h (new)
+ REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+};
+
-+/* Represents an individual error encounctered during the FSCK checks. */
++/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
-+typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
++typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
@@ reftable/reftable-fsck.h (new)
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
-+ * conitnue. If it is recoverable, the flow will continue and further issues
++ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
@@ t/t0614-reftable-fsck.sh (new)
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
++ sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
++ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
++ mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
++
++ test_must_fail git refs verify 2>err &&
++ cat >expect <<-EOF &&
++ error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
++ EOF
++ test_cmp expect err
++ )
++'
++
++test_expect_success 'table name should be checked' '
++ test_when_finished "rm -rf repo" &&
++ git init repo &&
++ (
++ cd repo &&
++ git commit --allow-empty -m initial &&
++
++ git refs verify 2>err &&
++ test_must_be_empty err &&
++
++ TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+ sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
-+ error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
++ error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable table extension
+ EOF
+ test_cmp expect err
+ )
3: 4c6c99ded3 ! 3: 52fc14fdeb refs/reftable: add fsck check for number of tables
@@ fsck.h: enum fsck_msg_type {
FUNC(BAD_REF_FILETYPE, ERROR) \
## refs/reftable-backend.c ##
-@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
+@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_TABLE_NAME:
msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
break;
@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
+ break;
default:
- BUG("unknown fsck error: %d", info.error);
+ BUG("unknown fsck error: %d", info->error);
}
## reftable/fsck.c ##
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
+ if (!reftable_fsck_valid_stack_count(stack)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_COUNT,
-+ .path = stack->list_file,
++ .path = "reftable/tables.list",
+ .msg = "mismatch in number of tables"
+ };
+
-+ err = report_fn(info, cb_data);
++ err = report_fn(&info, cb_data);
+ }
+
out:
@@ reftable/reftable-fsck.h
+ REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
};
- /* Represents an individual error encounctered during the FSCK checks. */
+ /* Represents an individual error encountered during the FSCK checks. */
## t/t0614-reftable-fsck.sh ##
@@ t/t0614-reftable-fsck.sh: test_expect_success 'table name should be checked' '
@@ t/t0614-reftable-fsck.sh: test_expect_success 'table name should be checked' '
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
-+ error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
++ error: reftable/tables.list: badReftableStackCount: mismatch in number of tables
+ EOF
+ test_cmp expect err
+ )
4: 7e8a14c77e ! 4: 4099878ceb refs/reftable: add fsck check for trailing newline
@@ fsck.h: enum fsck_msg_type {
FUNC(BAD_REF_FILETYPE, ERROR) \
## refs/reftable-backend.c ##
-@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
+@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_STACK_COUNT:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
break;
@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
+ break;
default:
- BUG("unknown fsck error: %d", info.error);
+ BUG("unknown fsck error: %d", info->error);
}
## reftable/fsck.c ##
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
+ if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
-+ .path = stack->list_file,
++ .path = "reftable/tables.list",
+ .msg = "trailing newline missing in stack list"
+ };
+
-+ err = report_fn(info, cb_data);
++ err = report_fn(&info, cb_data);
+ }
+
verbose_fn("Checking reftable tables count", cb_data);
@@ reftable/reftable-fsck.h: enum reftable_fsck_error {
+ REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
};
- /* Represents an individual error encounctered during the FSCK checks. */
+ /* Represents an individual error encountered during the FSCK checks. */
## t/t0614-reftable-fsck.sh ##
@@ t/t0614-reftable-fsck.sh: test_expect_success 'table count should be checked' '
@@ t/t0614-reftable-fsck.sh: test_expect_success 'table count should be checked' '
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
-+ error: $(pwd)/.git/reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
++ error: reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
+ EOF
+ test_cmp expect err
+ )
5: 56ee4348d5 ! 5: e33345088b refs/reftable: add fsck check for incorrect update index
@@ fsck.h: enum fsck_msg_type {
FUNC(BAD_REF_NAME, ERROR) \
## refs/reftable-backend.c ##
-@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
+@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
break;
@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
+ msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
+ break;
default:
- BUG("unknown fsck error: %d", info.error);
+ BUG("unknown fsck error: %d", info->error);
}
## reftable/fsck.c ##
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
reftable_fsck_verbose_fn verbose_fn,
void *cb_data)
{
+-
+ uint64_t min, max, prev_max = 0;
char **names = NULL;
- uint64_t min, max;
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
if (stack == NULL)
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
- err = report_fn(info, cb_data);
+ continue;
}
+ if (min != (prev_max + 1) || max < min) {
@@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
+ .msg = "incorrect update index in table name"
+ };
+
-+ err = report_fn(info, cb_data);
++ err = report_fn(&info, cb_data);
+ }
+
if (strcmp(tail, ".ref")) {
- err = report_fn(info, cb_data);
+ info.msg = "invalid reftable table extension";
+ err = report_fn(&info, cb_data);
}
+
+ prev_max = max;
@@ reftable/reftable-fsck.h: enum reftable_fsck_error {
+ REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
};
- /* Represents an individual error encounctered during the FSCK checks. */
+ /* Represents an individual error encountered during the FSCK checks. */
## t/t0614-reftable-fsck.sh ##
@@ t/t0614-reftable-fsck.sh: test_expect_success 'stack list must contain trailing newline' '
base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544
Thanks
- Karthik
* [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-02 22:25 ` Junio C Hamano
2025-09-02 7:05 ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
` (3 subsequent siblings)
4 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering. Also fix a clang-format issue
where the escaped newlines are not aligned.
While here, remove a duplicate instance of 'gitmodulesLarge' in the
'fsck-msgids' documentation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 -
fsck.h | 150 ++++++++++++++++++++---------------------
2 files changed, 75 insertions(+), 78 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..559ad57807 100644
--- a/fsck.h
+++ b/fsck.h
@@ -20,82 +20,82 @@ enum fsck_msg_type {
* two in sync.
*/
-#define FOREACH_FSCK_MSG_ID(FUNC) \
- /* fatal errors */ \
- FUNC(NUL_IN_HEADER, FATAL) \
- FUNC(UNTERMINATED_HEADER, FATAL) \
- /* errors */ \
- FUNC(BAD_DATE, ERROR) \
- FUNC(BAD_DATE_OVERFLOW, ERROR) \
- FUNC(BAD_EMAIL, ERROR) \
- FUNC(BAD_NAME, ERROR) \
- FUNC(BAD_OBJECT_SHA1, ERROR) \
- FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
- FUNC(BAD_PACKED_REF_HEADER, ERROR) \
- FUNC(BAD_PARENT_SHA1, ERROR) \
- FUNC(BAD_REF_CONTENT, ERROR) \
- FUNC(BAD_REF_FILETYPE, ERROR) \
- FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
- FUNC(BAD_TIMEZONE, ERROR) \
- FUNC(BAD_TREE, ERROR) \
- FUNC(BAD_TREE_SHA1, ERROR) \
- FUNC(BAD_TYPE, ERROR) \
- FUNC(DUPLICATE_ENTRIES, ERROR) \
- FUNC(MISSING_AUTHOR, ERROR) \
- FUNC(MISSING_COMMITTER, ERROR) \
- FUNC(MISSING_EMAIL, ERROR) \
- FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
- FUNC(MISSING_OBJECT, ERROR) \
- FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
- FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
- FUNC(MISSING_TAG, ERROR) \
- FUNC(MISSING_TAG_ENTRY, ERROR) \
- FUNC(MISSING_TREE, ERROR) \
- FUNC(MISSING_TYPE, ERROR) \
- FUNC(MISSING_TYPE_ENTRY, ERROR) \
- FUNC(MULTIPLE_AUTHORS, ERROR) \
- FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
- FUNC(PACKED_REF_UNSORTED, ERROR) \
- FUNC(TREE_NOT_SORTED, ERROR) \
- FUNC(UNKNOWN_TYPE, ERROR) \
- FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
- /* warnings */ \
- FUNC(EMPTY_NAME, WARN) \
- FUNC(FULL_PATHNAME, WARN) \
- FUNC(HAS_DOT, WARN) \
- FUNC(HAS_DOTDOT, WARN) \
- FUNC(HAS_DOTGIT, WARN) \
- FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
- FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+ /* fatal errors */ \
+ FUNC(NUL_IN_HEADER, FATAL) \
+ FUNC(UNTERMINATED_HEADER, FATAL) \
+ /* errors */ \
+ FUNC(BAD_DATE, ERROR) \
+ FUNC(BAD_DATE_OVERFLOW, ERROR) \
+ FUNC(BAD_EMAIL, ERROR) \
+ FUNC(BAD_NAME, ERROR) \
+ FUNC(BAD_OBJECT_SHA1, ERROR) \
+ FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
+ FUNC(BAD_PACKED_REF_HEADER, ERROR) \
+ FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REF_CONTENT, ERROR) \
+ FUNC(BAD_REF_FILETYPE, ERROR) \
+ FUNC(BAD_REF_NAME, ERROR) \
+ FUNC(BAD_TIMEZONE, ERROR) \
+ FUNC(BAD_TREE, ERROR) \
+ FUNC(BAD_TREE_SHA1, ERROR) \
+ FUNC(BAD_TYPE, ERROR) \
+ FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
+ FUNC(MISSING_AUTHOR, ERROR) \
+ FUNC(MISSING_COMMITTER, ERROR) \
+ FUNC(MISSING_EMAIL, ERROR) \
+ FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+ FUNC(MISSING_OBJECT, ERROR) \
+ FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+ FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+ FUNC(MISSING_TAG, ERROR) \
+ FUNC(MISSING_TAG_ENTRY, ERROR) \
+ FUNC(MISSING_TREE, ERROR) \
+ FUNC(MISSING_TYPE, ERROR) \
+ FUNC(MISSING_TYPE_ENTRY, ERROR) \
+ FUNC(MULTIPLE_AUTHORS, ERROR) \
+ FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
+ FUNC(PACKED_REF_UNSORTED, ERROR) \
+ FUNC(TREE_NOT_SORTED, ERROR) \
+ FUNC(UNKNOWN_TYPE, ERROR) \
+ FUNC(ZERO_PADDED_DATE, ERROR) \
+ /* warnings */ \
+ FUNC(EMPTY_NAME, WARN) \
+ FUNC(FULL_PATHNAME, WARN) \
+ FUNC(HAS_DOT, WARN) \
+ FUNC(HAS_DOTDOT, WARN) \
+ FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(NULL_SHA1, WARN) \
+ FUNC(NUL_IN_COMMIT, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
- FUNC(BAD_FILEMODE, INFO) \
- FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
- FUNC(GITATTRIBUTES_SYMLINK, INFO) \
- FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
- FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
- FUNC(REF_MISSING_NEWLINE, INFO) \
- FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
- FUNC(TRAILING_REF_CONTENT, INFO) \
- /* ignored (elevated when requested) */ \
+ FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
+ FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
+ FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
+ FUNC(MAILMAP_SYMLINK, INFO) \
+ FUNC(MISSING_TAGGER_ENTRY, INFO) \
+ FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
+ FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
+ FUNC(TRAILING_REF_CONTENT, INFO) \
+ /* ignored (elevated when requested) */ \
FUNC(EXTRA_HEADER_ENTRY, IGNORE)
#define MSG_ID(id, msg_type) FSCK_MSG_##id,
--
2.50.1
* Re: [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically
2025-09-02 7:05 ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-02 22:25 ` Junio C Hamano
2025-09-08 13:00 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:25 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, jltobler, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
> are a few small misses. Fix this by sorting the sub-sections of the
> list to maintain alphabetical ordering. Also fix a clang-format issue
> where the escaped newlines are not aligned.
>
> While here, remove a duplicate instance of 'gitmodulesLarge' in the
> 'fsck-msgids' documentation.
"A few small misses".
> diff --git a/fsck.h b/fsck.h
> index dd7df3d5b3..559ad57807 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -20,82 +20,82 @@ enum fsck_msg_type {
> ...
> -#define FOREACH_FSCK_MSG_ID(FUNC) \
> - /* fatal errors */ \
> - FUNC(NUL_IN_HEADER, FATAL) \
> - FUNC(UNTERMINATED_HEADER, FATAL) \
> ...
> +#define FOREACH_FSCK_MSG_ID(FUNC) \
> + /* fatal errors */ \
> + FUNC(NUL_IN_HEADER, FATAL) \
> + FUNC(UNTERMINATED_HEADER, FATAL) \
> ...
Please undo these "pad by spaces before backslash"; otherwise we
cannot tell which ones are "a few small misses".
Thanks.
* Re: [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically
2025-09-02 22:25 ` Junio C Hamano
@ 2025-09-08 13:00 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-08 13:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jltobler, shejialuo
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
>> are a few small misses. Fix this by sorting the sub-sections of the
>> list to maintain alphabetical ordering. Also fix a clang-format issue
>> where the escaped newlines are not aligned.
>>
>> While here, remove a duplicate instance of 'gitmodulesLarge' in the
>> 'fsck-msgids' documentation.
>
> "A few small misses".
>
>> diff --git a/fsck.h b/fsck.h
>> index dd7df3d5b3..559ad57807 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -20,82 +20,82 @@ enum fsck_msg_type {
>> ...
>> -#define FOREACH_FSCK_MSG_ID(FUNC) \
>> - /* fatal errors */ \
>> - FUNC(NUL_IN_HEADER, FATAL) \
>> - FUNC(UNTERMINATED_HEADER, FATAL) \
>> ...
>> +#define FOREACH_FSCK_MSG_ID(FUNC) \
>> + /* fatal errors */ \
>> + FUNC(NUL_IN_HEADER, FATAL) \
>> + FUNC(UNTERMINATED_HEADER, FATAL) \
>> ...
>
> Please undo these "pad by spaces before backslash"; otherwise we
> cannot tell which ones are "a few small misses".
>
> Thanks.
Yeah, you're right, it's much harder to review this way. Let me add a
commit at the end to do the clang-formatting for this section; that way
we can drop it if it is too much noise.
- Karthik
* [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-03 8:07 ` Patrick Steinhardt
2025-09-02 7:05 ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
` (2 subsequent siblings)
4 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
The `git refs verify` command is used to run fsck checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the table names in the 'tables.list' of reftables.
For the infrastructure, since the reftable library is treated as an
independent library we should ensure that the library code works
independently without knowledge about Git's internals. To do this,
add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h', which
provide an entry point 'reftable_fsck_check' for running fsck checks
over a provided reftable stack. The caller provides the function with
callbacks to handle issue and information reporting.
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. Add
a test to check for this behavior.
While here, remove an unused header `#include "../lockfile.h"` from
'refs/reftable-backend.c'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
Makefile | 1 +
fsck.h | 1 +
meson.build | 1 +
refs/reftable-backend.c | 57 ++++++++++++++++++++++++++++++++++++-----
reftable/fsck.c | 53 ++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 38 +++++++++++++++++++++++++++
t/meson.build | 3 ++-
t/t0614-reftable-fsck.sh | 58 ++++++++++++++++++++++++++++++++++++++++++
9 files changed, 208 insertions(+), 7 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..784ddc0df5 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableTableName`::
+ (ERROR) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/Makefile b/Makefile
index e11340c1ae..f2ddcc8d7c 100644
--- a/Makefile
+++ b/Makefile
@@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
REFTABLE_OBJS += reftable/record.o
diff --git a/fsck.h b/fsck.h
index 559ad57807..5901f944a1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
diff --git a/meson.build b/meson.build
index 5dd299b496..82879fbfaa 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8dae1e1112..c38c6422f8 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,20 +6,21 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", _(msg));
+}
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
+ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
+ enum fsck_msg_id msg_id;
+
+ switch (info->error) {
+ case REFTABLE_FSCK_ERROR_TABLE_NAME:
+ msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
+ break;
+ default:
+ BUG("unknown fsck error: %d", info->error);
+ }
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
+ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..4282b1413e
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,53 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "stack.h"
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+
+ char **names = NULL;
+ uint64_t min, max;
+ int err = 0;
+
+ if (stack == NULL)
+ goto out;
+
+ err = read_lines(stack->list_file, &names);
+ if (err < 0)
+ goto out;
+
+ verbose_fn("Checking reftable table names", cb_data);
+
+ for (size_t i = 0; names[i]; i++) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
+ .path = names[i],
+ };
+ uint32_t rnd;
+ /*
+ * We want to match the tail '.ref'. One extra byte to ensure
+ * that there is no unexpected extra character and one byte for
+ * the null terminator added by sscanf.
+ */
+ char tail[6];
+
+ if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
+ &min, &max, &rnd, tail) != 4) {
+ info.msg = "invalid reftable table name";
+ err = report_fn(&info, cb_data);
+ continue;
+ }
+
+ if (strcmp(tail, ".ref")) {
+ info.msg = "invalid reftable table extension";
+ err = report_fn(&info, cb_data);
+ }
+ }
+
+out:
+ free_names(names);
+ return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..4cf0053234
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,38 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform FSCK check on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the FSCK checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
diff --git a/t/meson.build b/t/meson.build
index bbeba1a8d5..a8eb44eb30 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -145,6 +145,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
@@ -1214,4 +1215,4 @@ if perl.found() and time.found()
timeout: 0,
)
endforeach
-endif
\ No newline at end of file
+endif
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..81d30df2d7
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'table name should be checked' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+ sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_expect_success 'table name should be checked' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+ sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable table extension
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
2025-09-02 7:05 ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-03 8:07 ` Patrick Steinhardt
2025-09-03 16:51 ` shejialuo
2025-09-09 8:42 ` Karthik Nayak
0 siblings, 2 replies; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 8:07 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, jltobler, shejialuo
On Tue, Sep 02, 2025 at 09:05:22AM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
>
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h', which
> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The caller provides the function with
> callbacks to handle issue and information reporting.
>
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git's fsck errors.
>
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
>
> While here, remove an unused header `#include "../lockfile.h"` from
> 'refs/reftable-backend.c'.
It's quite a bunch of changes overall that could've been reasonably
split up into multiple commits. E.g. one to introduce the reftable-side
logic, one to start calling it in Git, and one to drop the superfluous
header.
> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
> REFTABLE_OBJS += reftable/block.o
> REFTABLE_OBJS += reftable/blocksource.o
> REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o
"f" is before "i" in the alphabet I'm accustomed to :) So let's retain
lexicographic ordering here.
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..c38c6422f8 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
> return ret;
> }
>
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> - struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> + struct fsck_options *o = cb_data;
> +
> + if (o->verbose)
> + fprintf_ln(stderr, "%s", _(msg));
> +}
Is this `_()` marker correct here? There isn't really any reasonable way
for somebody to translate a variable with unknown contents. So shouldn't
it only be the caller of `reftable_fsck_verbose_handler()` that should
mark the string as translatable?
> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
> + void *cb_data)
> +{
> + struct fsck_ref_report report = { .path = info->path };
> + struct fsck_options *o = cb_data;
> + enum fsck_msg_id msg_id;
> +
> + switch (info->error) {
> + case REFTABLE_FSCK_ERROR_TABLE_NAME:
> + msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> + break;
> + default:
> + BUG("unknown fsck error: %d", info->error);
> + }
> +
> + return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
> +}
I think this function will become a bit unwieldy over time. We might
instead want to have an array that maps from reftable-specific to
fsck-specific error code:
static const fsck_msg_id[] = {
[REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
};
So in that case, all we'd have to do is to perform bounds checking in
the above function. And maybe verify that the developer didn't forget to
fill in a new msg ID by checking that the derived message ID is non-zero.
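The mapping idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the enums are simplified stand-ins for the real ones, the reftable error codes are assumed to be positive so they can index an array, and `REFTABLE_FSCK_ERROR_MAX`, `FSCK_MSG_UNSET`, `fsck_msg_id_map` and `map_reftable_fsck_error()` are hypothetical names. In particular, the sketch reserves 0 as "no mapping", which may not hold for the real `enum fsck_msg_id`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the enum in reftable-fsck.h. */
enum reftable_fsck_error {
	/* Assumed positive so that it can be used as an array index. */
	REFTABLE_FSCK_ERROR_TABLE_NAME = 1,
	/* Hypothetical sentinel, one past the last valid error code. */
	REFTABLE_FSCK_ERROR_MAX,
};

/* Hypothetical stand-in for Git's enum; 0 is reserved as "unset" here. */
enum fsck_msg_id {
	FSCK_MSG_UNSET = 0,
	FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
};

/* Entries that are not listed are zero-initialized, i.e. "unset". */
static const enum fsck_msg_id fsck_msg_id_map[REFTABLE_FSCK_ERROR_MAX] = {
	[REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
};

/*
 * Translate a reftable error code into an fsck message ID.
 * Returns 0 and sets *out on success. Returns -1 if the code is out
 * of bounds or if nobody filled in a mapping for it.
 */
static int map_reftable_fsck_error(int error, enum fsck_msg_id *out)
{
	if (error < 0 || error >= REFTABLE_FSCK_ERROR_MAX)
		return -1;
	if (!fsck_msg_id_map[error])
		return -1;
	*out = fsck_msg_id_map[error];
	return 0;
}
```

With something like this, the error handler would only need to call the lookup and hit BUG() when it returns -1, instead of growing a new switch case for every check.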
> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
> struct worktree *wt UNUSED)
> {
> - return 0;
> + struct reftable_ref_store *refs;
> + struct strmap_entry *entry;
> + struct hashmap_iter iter;
> + int ret = 0;
> +
> + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> + if (o->verbose)
> + fprintf_ln(stderr, _("Checking references consistency"));
This line is duplicated across both backends, right? Maybe it's something
that we can do in the generic logic?
> + ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> +
> + strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> + struct reftable_backend *b = (struct reftable_backend *)entry->value;
> + ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> + reftable_fsck_verbose_handler, o);
> + }
> +
> + return ret;
> }
>
> struct ref_storage_be refs_be_reftable = {
Looks good.
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..4282b1413e
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,53 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn,
> + void *cb_data)
> +{
> +
> + char **names = NULL;
> + uint64_t min, max;
> + int err = 0;
> +
> + if (stack == NULL)
> + goto out;
> +
> + err = read_lines(stack->list_file, &names);
> + if (err < 0)
> + goto out;
> +
> + verbose_fn("Checking reftable table names", cb_data);
> +
> + for (size_t i = 0; names[i]; i++) {
> + struct reftable_fsck_info info = {
> + .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> + .path = names[i],
> + };
> + uint32_t rnd;
> + /*
> + * We want to match the tail '.ref'. One extra byte to ensure
> + * that there is no unexpected extra character and one byte for
> + * the null terminator added by sscanf.
> + */
> + char tail[6];
> +
> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> + &min, &max, &rnd, tail) != 4) {
> + info.msg = "invalid reftable table name";
This here is where the string should be translated.
> + err = report_fn(&info, cb_data);
> + continue;
> + }
I think sscanf is quite frowned-upon in the Git codebase. Maybe we
should manually parse through the string instead?
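A manual parser could look roughly like the sketch below. It is an assumption-laden illustration, not the eventual implementation: `skip_literal()`, `skip_hex()` and `table_name_is_valid()` are hypothetical helpers, and the sketch requires exactly 12, 12 and 8 hex digits, which is stricter than the sscanf format above (there the widths are only upper bounds). It also accepts uppercase hex digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Consume the literal `lit` at *p; advance *p and return 1 on a match. */
static int skip_literal(const char **p, const char *lit)
{
	size_t len = strlen(lit);

	if (strncmp(*p, lit, len))
		return 0;
	*p += len;
	return 1;
}

/*
 * Consume exactly `n` hex digits at *p; advance *p and return 1 on
 * success. The NUL terminator is not a hex digit, so this never reads
 * past the end of the string.
 */
static int skip_hex(const char **p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!isxdigit((unsigned char)(*p)[i]))
			return 0;
	*p += n;
	return 1;
}

/* Return 1 if `name` looks like "0x<12 hex>-0x<12 hex>-<8 hex>.ref". */
static int table_name_is_valid(const char *name)
{
	const char *p = name;

	return skip_literal(&p, "0x") && skip_hex(&p, 12) &&
	       skip_literal(&p, "-0x") && skip_hex(&p, 12) &&
	       skip_literal(&p, "-") && skip_hex(&p, 8) &&
	       !strcmp(p, ".ref");
}
```

Because the final strcmp() compares the whole remainder, trailing garbage such as ".refextra" is rejected without needing a scratch buffer for the tail.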
Furthermore, I think we should move every single check into a separate
function, similar to how the files backend does it. This ensures that
checks are self-contained and that it's way easier to add new checks
over time.
Another angle: did you verify that reftables written by JGit follow this
format?
> + if (strcmp(tail, ".ref")) {
> + info.msg = "invalid reftable table extension";
Same here, this should be translated.
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..4cf0053234
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> + /* Invalid table name */
> + REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};
Wouldn't it be more natural to give these positive numbers?
> +/* Represents an individual error encountered during the FSCK checks. */
> +struct reftable_fsck_info {
> + enum reftable_fsck_error error;
> + const char *msg;
> + const char *path;
> +};
I wonder whether it should be the reftable library that decides on the
severity of each generated finding.
> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
> + void *cb_data);
> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.
s/FSCK check/consistency checks/
> + *
> + * If an issue is encountered, the issue is reported to the callee via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * continue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.
Same here.
> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..81d30df2d7
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
Tests shouldn't define these variables, but should dynamically figure
out what the default branch name is as required, e.g. by using
git-symbolic-ref(1).
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
You can drop the cat(1) invocation and directly say `head -n1 file`.
> + sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> + mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> + mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
No need for the curly braces around TABLE_NAME here and further down. It
would be nice to quote these strings though.
> +
> + test_must_fail git refs verify 2>err &&
> + cat >expect <<-EOF &&
> + error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
> + EOF
> + test_cmp expect err
> + )
> +'
> +
> +test_expect_success 'table name should be checked' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
Same here wrt the extra invocation of cat(1).
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
2025-09-03 8:07 ` Patrick Steinhardt
@ 2025-09-03 16:51 ` shejialuo
2025-09-09 13:49 ` Karthik Nayak
2025-09-09 8:42 ` Karthik Nayak
1 sibling, 1 reply; 96+ messages in thread
From: shejialuo @ 2025-09-03 16:51 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git, jltobler
On Wed, Sep 03, 2025 at 10:07:13AM +0200, Patrick Steinhardt wrote:
[snip]
> > +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
> > struct worktree *wt UNUSED)
> > {
> > - return 0;
> > + struct reftable_ref_store *refs;
> > + struct strmap_entry *entry;
> > + struct hashmap_iter iter;
> > + int ret = 0;
> > +
> > + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> > +
> > + if (o->verbose)
> > + fprintf_ln(stderr, _("Checking references consistency"));
>
> This line is duplicate across both backends, right? Maybe it's something
> that we can do in the generic logic?
>
That's right, it is duplicated. If we want to remove it, we need to do
so in "builtin/refs.c". But I wonder whether we should do this in
the first place. Should we instead add more detailed information, as
the following code does for the packed backend?
if (o->verbose)
fprintf_ln(stderr, "Checking packed-refs file %s", refs->path);
Instead of just using
Checking references consistency
Could we use
Checking reftable references consistency
However, that also feels strange to me. :)
[snip]
> > +/* Represents an individual error encountered during the FSCK checks. */
> > +struct reftable_fsck_info {
> > + enum reftable_fsck_error error;
> > + const char *msg;
> > + const char *path;
> > +};
>
> I wonder whether it should be the reftable library that decides on the
> severity of each generated finding.
>
That's an interesting question. Let's inspect how Git handles the
severity. When defining an fsck message id, we need to specify its
severity as shown below; this happens at compile time:
FUNC(BAD_REFERENT_NAME, ERROR)
And we could set the configuration "fsck.[message id]=" to change the
fsck message severity.
Now consider what happens if the reftable library decides the severity.
We would need to use an API from the reftable library to update
"fsck_option->msg_type" at runtime. That is bad because the fsck
infrastructure would become tightly coupled to the reftable library.
So, I don't think it's a good idea for the reftable library to choose
the severity. Instead, the reftable library should just provide users
with error types and let the users decide the severity.
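To make the split concrete, here is a minimal sketch in which the library only reports what went wrong and the caller owns the severity table. All names (`lib_fsck_error`, `lib_fsck_info`, `caller_options`, `caller_report()`) are hypothetical and do not mirror the real Git or reftable APIs. The per-error severity array stands in for whatever the caller derives from its defaults and configuration.

```c
#include <stdio.h>

/* Hypothetical library side: says *what* went wrong, never how severe. */
enum lib_fsck_error {
	LIB_FSCK_ERROR_TABLE_NAME = 1,
	LIB_FSCK_ERROR_MAX,
};

struct lib_fsck_info {
	enum lib_fsck_error error;
	const char *msg;
	const char *path;
};

/* Hypothetical caller side: owns the severities and may override them. */
enum severity {
	SEVERITY_IGNORE,
	SEVERITY_WARN,
	SEVERITY_ERROR,
};

struct caller_options {
	enum severity severity[LIB_FSCK_ERROR_MAX];
};

/*
 * Report callback handed to the library. Returns 1 if the finding
 * counts as an error for this caller and 0 otherwise.
 */
static int caller_report(struct lib_fsck_info *info, void *cb_data)
{
	struct caller_options *o = cb_data;
	/* Error kinds the caller does not know about stay errors. */
	enum severity s = SEVERITY_ERROR;

	if (info->error > 0 && info->error < LIB_FSCK_ERROR_MAX)
		s = o->severity[info->error];

	switch (s) {
	case SEVERITY_IGNORE:
		return 0;
	case SEVERITY_WARN:
		fprintf(stderr, "warning: %s: %s\n", info->path, info->msg);
		return 0;
	default:
		fprintf(stderr, "error: %s: %s\n", info->path, info->msg);
		return 1;
	}
}
```

The library never sees `enum severity`, so changing a severity at runtime touches only the caller's options.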
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
2025-09-03 16:51 ` shejialuo
@ 2025-09-09 13:49 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-09 13:49 UTC (permalink / raw)
To: shejialuo, Patrick Steinhardt; +Cc: git, jltobler
[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]
shejialuo <shejialuo@gmail.com> writes:
> On Wed, Sep 03, 2025 at 10:07:13AM +0200, Patrick Steinhardt wrote:
>
> [snip]
>
>> > +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>> > struct worktree *wt UNUSED)
>> > {
>> > - return 0;
>> > + struct reftable_ref_store *refs;
>> > + struct strmap_entry *entry;
>> > + struct hashmap_iter iter;
>> > + int ret = 0;
>> > +
>> > + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
>> > +
>> > + if (o->verbose)
>> > + fprintf_ln(stderr, _("Checking references consistency"));
>>
>> This line is duplicate across both backends, right? Maybe it's something
>> that we can do in the generic logic?
>>
>
> That's right, it is duplicate. If we want to remove this, we need to do
> this in the "builtin/refs.c". But I wonder whether we should do this in
> the first place. Should we rather add more detailed information just
> like the following code for packed backend?
>
> if (o->verbose)
> fprintf_ln(stderr, "Checking packed-refs file %s", refs->path);
>
> Instead of just using
>
> Checking references consistency
>
> Could we use
>
> Checking reftable references consistency
>
> However, I also feel strange about above, :)
>
> [snip]
>
>> > +/* Represents an individual error encountered during the FSCK checks. */
>> > +struct reftable_fsck_info {
>> > + enum reftable_fsck_error error;
>> > + const char *msg;
>> > + const char *path;
>> > +};
>>
>> I wonder whether it should be the reftable library that decides on the
>> severity of each generated finding.
>>
>
I think I rushed into agreeing to this change and didn't realize
its complexity.
> That's an interesting question. Let's inspect how Git handles the
> severity. When defining the fsck message id, we need to specify its
> severity like the following shows, this happens at compile time:
>
> FUNC(BAD_REFERENT_NAME, ERROR)
>
This is used to create the enum of all values, but there is a
complementary structure `msg_id_info` which holds the mapping from each
message id to its error category.
Both of these could be extended at compile time by including the errors
from the reftable header. But to do this in a backend agnostic way, we'd
have to receive and re-expose it via `refs.h`.
> And we could set the configuration "fsck.[message id]=" to change the
> fsck message severity.
>
> Then let's think if reftable library decides the severity. It means that
> we need to use the API from reftable library to update
> "fsck_option->msg_type" at the runtime. And it is bad because the fsck
> infrastructure would be highly coupled with the reftable library.
>
> So, I don't think it's a good idea for reftable library to choose the
> severity. Instead, reftable library should just provide users with error
> types and let the users decide the severity.
>
So while there are ways to do it, it won't be simple/elegant and I'm not
sure it'd be worth it.
> Thanks,
> Jialuo
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
2025-09-03 8:07 ` Patrick Steinhardt
2025-09-03 16:51 ` shejialuo
@ 2025-09-09 8:42 ` Karthik Nayak
1 sibling, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-09 8:42 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, jltobler, shejialuo
[-- Attachment #1: Type: text/plain, Size: 11798 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Sep 02, 2025 at 09:05:22AM +0200, Karthik Nayak wrote:
>> The `git refs verify` command is used to run fsck checks on the
>> reference backends. This command is also invoked when users run 'git
>> fsck'. While the files-backend has some fsck checks added, the reftable
>> backend lacks such checks. Let's add the required infrastructure and a
>> check to test for the table names in the 'tables.list' of reftables.
>>
>> For the infrastructure, since the reftable library is treated as an
>> independent library we should ensure that the library code works
>> independently without knowledge about Git's internals. To do this,
>> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h', which
>> provide an entry point 'reftable_fsck_check' for running fsck checks
>> over a provided reftable stack. The caller provides the function with
>> callbacks to handle issue and information reporting.
>>
>> Add glue code in 'refs/reftable-backend.c' which calls the reftable
>> library to perform the fsck checks. Here we also map the reftable errors
>> to Git's fsck errors.
>>
>> Introduce a check to validate table names for a given reftable stack.
>> Also add 'badReftableTableName' as a corresponding error within Git. Add
>> a test to check for this behavior.
>>
>> While here, remove an unused header `#include "../lockfile.h"` from
>> 'refs/reftable-backend.c'.
>
> It's quite a bunch of changes overall that could've been reasonably
> split up into multiple commits. E.g. one to introduce the reftable-side
> logic, one to start calling it in Git, and one to drop the superfluous
> header.
>
I'm always hesitant to have small commits for some reason. Thanks for
calling it out, I'll split it up.
Sidenote: I used `git history split` for this, and it was just perfect.
>> diff --git a/Makefile b/Makefile
>> index e11340c1ae..f2ddcc8d7c 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>> REFTABLE_OBJS += reftable/block.o
>> REFTABLE_OBJS += reftable/blocksource.o
>> REFTABLE_OBJS += reftable/iter.o
>> +REFTABLE_OBJS += reftable/fsck.o
>
> "f" is before "i" in the alphabet I'm accustomed to :) So let's retain
> lexicographic ordering here.
>
This ordering was already broken, but that's no reason to break it more.
Let me fix it.
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index 8dae1e1112..c38c6422f8 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>> return ret;
>> }
>>
>> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
>> - struct fsck_options *o UNUSED,
>> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
>> +{
>> + struct fsck_options *o = cb_data;
>> +
>> + if (o->verbose)
>> + fprintf_ln(stderr, "%s", _(msg));
>> +}
>
> Is this `_()` marker correct here? There isn't really any reasonable way
> for somebody to translate a variable with unknown contents. So shouldn't
> it only be the caller of `reftable_fsck_verbose_handler()` that should
> mark the string as translatable?
>
True, but this is a callback function called from within the reftable
library. I guess for now I can leave out the translation and we can
think about the best way to fix that later.
We need a more generic way to translate output strings originating from
the reftable library.
>> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
>> + void *cb_data)
>> +{
>> + struct fsck_ref_report report = { .path = info->path };
>> + struct fsck_options *o = cb_data;
>> + enum fsck_msg_id msg_id;
>> +
>> + switch (info->error) {
>> + case REFTABLE_FSCK_ERROR_TABLE_NAME:
>> + msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
>> + break;
>> + default:
>> + BUG("unknown fsck error: %d", info->error);
>> + }
>> +
>> + return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
>> +}
>
> I think this function will become a bit unwieldy over time. We might
> instead want to have an array that maps from reftable-specific to
> fsck-specific error code:
>
> static const fsck_msg_id[] = {
> [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
> };
>
> So in that case, all we'd have to do is to perform bounds checking in
> the above function. And maybe verify that the developer didn't forget to
> fill in a new msg ID by checking that the derived message ID is non-zero.
>
Yeah that sounds like a really good improvement, let me add that.
>> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>> struct worktree *wt UNUSED)
>> {
>> - return 0;
>> + struct reftable_ref_store *refs;
>> + struct strmap_entry *entry;
>> + struct hashmap_iter iter;
>> + int ret = 0;
>> +
>> + refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
>> +
>> + if (o->verbose)
>> + fprintf_ln(stderr, _("Checking references consistency"));
>
> This line is duplicate across both backends, right? Maybe it's something
> that we can do in the generic logic?
>
Yeah, we can. Will do.
>> + ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
>> + reftable_fsck_verbose_handler, o);
>> +
>> + strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
>> + struct reftable_backend *b = (struct reftable_backend *)entry->value;
>> + ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
>> + reftable_fsck_verbose_handler, o);
>> + }
>> +
>> + return ret;
>> }
>>
>> struct ref_storage_be refs_be_reftable = {
>
> Looks good.
>
>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> new file mode 100644
>> index 0000000000..4282b1413e
>> --- /dev/null
>> +++ b/reftable/fsck.c
>> @@ -0,0 +1,53 @@
>> +#include "basics.h"
>> +#include "reftable-fsck.h"
>> +#include "stack.h"
>> +
>> +int reftable_fsck_check(struct reftable_stack *stack,
>> + reftable_fsck_report_fn report_fn,
>> + reftable_fsck_verbose_fn verbose_fn,
>> + void *cb_data)
>> +{
>> +
>> + char **names = NULL;
>> + uint64_t min, max;
>> + int err = 0;
>> +
>> + if (stack == NULL)
>> + goto out;
>> +
>> + err = read_lines(stack->list_file, &names);
>> + if (err < 0)
>> + goto out;
>> +
>> + verbose_fn("Checking reftable table names", cb_data);
>> +
>> + for (size_t i = 0; names[i]; i++) {
>> + struct reftable_fsck_info info = {
>> + .error = REFTABLE_FSCK_ERROR_TABLE_NAME,
>> + .path = names[i],
>> + };
>> + uint32_t rnd;
>> + /*
>> + * We want to match the tail '.ref'. One extra byte to ensure
>> + * that there is no unexpected extra character and one byte for
>> + * the null terminator added by sscanf.
>> + */
>> + char tail[6];
>> +
>> + if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
>> + &min, &max, &rnd, tail) != 4) {
>> + info.msg = "invalid reftable table name";
>
> This here is where the string should be translated.
>
But we don't have translation capabilities within the reftable lib, no?
Or am I mistaken?
>> + err = report_fn(&info, cb_data);
>> + continue;
>> + }
>
> I think sscanf is quite frowned-upon in the Git codebase. Maybe we
> should manually parse through the string instead?
>
That would be cumbersome. This isn't user input data, so I thought this
would be okay. But let me do the change.
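A rough sketch of how the manual parsing could look, to show that it need
not be too cumbersome. The helper names are made up for illustration. Note
two assumptions that make it stricter than the sscanf() version: it requires
exactly 12/12/8 hex digits (sscanf only treats the widths as maximums) and
accepts lowercase hex only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse exactly `len` lowercase hex digits at *p, advancing *p on success. */
static int parse_hex(const char **p, size_t len, uint64_t *out)
{
	uint64_t v = 0;

	for (size_t i = 0; i < len; i++) {
		char c = (*p)[i];
		unsigned d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else
			return -1; /* also catches a premature NUL */
		v = (v << 4) | d;
	}

	*p += len;
	*out = v;
	return 0;
}

/* Consume a literal prefix at *p, advancing *p on success. */
static int skip_literal(const char **p, const char *lit)
{
	size_t n = strlen(lit);

	if (strncmp(*p, lit, n))
		return -1;
	*p += n;
	return 0;
}

/* Expects "0x<12 hex>-0x<12 hex>-<8 hex>.ref" with nothing trailing. */
static int parse_table_name(const char *name, uint64_t *min, uint64_t *max)
{
	const char *p = name;
	uint64_t rnd;

	assert(name && min && max);

	if (skip_literal(&p, "0x") || parse_hex(&p, 12, min) ||
	    skip_literal(&p, "-0x") || parse_hex(&p, 12, max) ||
	    skip_literal(&p, "-") || parse_hex(&p, 8, &rnd))
		return -1;

	return strcmp(p, ".ref") ? -1 : 0;
}
```

Because the suffix is compared with strcmp(), the separate "tail" buffer and
the extra-byte trick from the sscanf() variant are no longer needed.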
> Furthermore, I think we should move every single check into a separate
> function, similar to how the files backend does it. This ensures that
> checks are self-contained and that it's way easier to add new checks
> over time.
>
I think Shejialuo also mentioned this, let me do that.
> Another angle: did you verify that reftables written by JGit follow this
> format?
>
No I haven't.
>> + if (strcmp(tail, ".ref")) {
>> + info.msg = "invalid reftable table extension";
>
> Same here, this should be translated.
>
>> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
>> new file mode 100644
>> index 0000000000..4cf0053234
>> --- /dev/null
>> +++ b/reftable/reftable-fsck.h
>> @@ -0,0 +1,38 @@
>> +#ifndef REFTABLE_FSCK_H
>> +#define REFTABLE_FSCK_H
>> +
>> +#include "reftable-stack.h"
>> +
>> +enum reftable_fsck_error {
>> + /* Invalid table name */
>> + REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
>> +};
>
> Wouldn't it be more natural to give these positive numbers?
>
Yes, that would be better and also fits in with the array suggestion you
made earlier.
>> +/* Represents an individual error encountered during the FSCK checks. */
>> +struct reftable_fsck_info {
>> + enum reftable_fsck_error error;
>> + const char *msg;
>> + const char *path;
>> +};
>
> I wonder whether it should be the reftable library that decides on the
> severity of each generated finding.
>
I think that'd make sense. Let me add that in.
>> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
>> + void *cb_data);
>> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
>> +
>> +/*
>> + * Given a reftable stack, perform FSCK check on the stack.
>
> s/FSCK check/consistency checks/
>
>> + *
>> + * If an issue is encountered, the issue is reported to the callee via the
>> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
>> + * continue. If it is recoverable, the flow will continue and further issues
>> + * will be reported as identified.
>> + *
>> + * The 'verbose_fn' will be invoked to provide verbose information about
>> + * the progress and state of the FSCK checks.
>
> Same here.
>
Thanks, changed both.
>> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
>> new file mode 100755
>> index 0000000000..81d30df2d7
>> --- /dev/null
>> +++ b/t/t0614-reftable-fsck.sh
>> @@ -0,0 +1,58 @@
>> +#!/bin/sh
>> +
>> +test_description='Test reftable backend consistency check'
>> +
>> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> Tests shouldn't define these variables, but should dynamically figure
> out what the default branch name is as required, e.g. by using
> git-symbolic-ref(1).
>
Yeah, makes sense. Will change it.
>> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
>> +export GIT_TEST_DEFAULT_REF_FORMAT
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'table name should be checked' '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err &&
>> +
>> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
>
> You can drop the cat(1) invocation and directly say `head -n1 file`.
>
Indeed, thanks!
>> + sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
>> + mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
>> + mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
>
> No need for the curly braces around TABLE_NAME here and further down. It
> would be nice to quote these strings though.
>
Understandable. I always prefer using them, since they make it much
easier to read, even without the ambiguity issue.
>> +
>> + test_must_fail git refs verify 2>err &&
>> + cat >expect <<-EOF &&
>> + error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
>> + EOF
>> + test_cmp expect err
>> + )
>> +'
>> +
>> +test_expect_success 'table name should be checked' '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err &&
>> +
>> + TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
>
> Same here wrt the extra invocation of cat(1).
>
Will change! Thanks.
> Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v2 3/5] refs/reftable: add fsck check for number of tables
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-03 8:07 ` Patrick Steinhardt
2025-09-02 7:05 ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-09-02 7:05 ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
4 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
Introduce a reftable fsck check to check that the number of files in the
reftable directory matches the number of files listed in 'tables.list'.
We do this by iterating over the files in the reftable directory and
counting all the files present excluding the 'tables.list'. This is also
exposed over Git's fsck checks as a 'badReftableStackCount' error.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 3 +++
reftable/fsck.c | 34 ++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 20 ++++++++++++++++++++
6 files changed, 63 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 784ddc0df5..707e2fc50a 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableStackCount`::
+ (ERROR) Mismatch in number of tables.
+
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
diff --git a/fsck.h b/fsck.h
index 5901f944a1..256effc4f8 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c38c6422f8..59c39f9b52 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2695,6 +2695,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_TABLE_NAME:
msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
break;
+ case REFTABLE_FSCK_ERROR_STACK_COUNT:
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
+ break;
default:
BUG("unknown fsck error: %d", info->error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 4282b1413e..20e6bfb0f1 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -2,6 +2,28 @@
#include "reftable-fsck.h"
#include "stack.h"
+static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
+{
+ DIR *dir = opendir(st->reftable_dir);
+ struct dirent *d = NULL;
+ unsigned int count = 0;
+
+ if (!dir)
+ return 0;
+
+ while ((d = readdir(dir))) {
+ if (!strcmp(d->d_name, "tables.list"))
+ continue;
+
+ if (d->d_type == DT_REG)
+ count++;
+ }
+
+ closedir(dir);
+
+ return count == st->tables_len;
+}
+
int reftable_fsck_check(struct reftable_stack *stack,
reftable_fsck_report_fn report_fn,
reftable_fsck_verbose_fn verbose_fn,
@@ -47,6 +69,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
}
}
+ verbose_fn("Checking reftable tables count", cb_data);
+
+ if (!reftable_fsck_valid_stack_count(stack)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_COUNT,
+ .path = "reftable/tables.list",
+ .msg = "mismatch in number of tables"
+ };
+
+ err = report_fn(&info, cb_data);
+ }
+
out:
free_names(names);
return err;
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 4cf0053234..beba1bdd1a 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -6,6 +6,8 @@
enum reftable_fsck_error {
/* Invalid table name */
REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+ /* Incorrect number of tables present */
+ REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
};
/* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 81d30df2d7..3a34a31890 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -55,4 +55,24 @@ test_expect_success 'table name should be checked' '
)
'
+test_expect_success 'table count should be checked' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: reftable/tables.list: badReftableStackCount: mismatch in number of tables
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v2 3/5] refs/reftable: add fsck check for number of tables
2025-09-02 7:05 ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-09-03 8:07 ` Patrick Steinhardt
2025-09-15 9:27 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 8:07 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, jltobler, shejialuo
On Tue, Sep 02, 2025 at 09:05:23AM +0200, Karthik Nayak wrote:
> Introduce a reftable fsck check to check that the number of files in the
> reftable directory matches the number of files listed in 'tables.list'.
> We do this by iterating over the files in the reftable directory and
> counting all the files present excluding the 'tables.list'. This is also
> exposed over Git's fsck checks as a 'badReftableStackCount' error.
This feels overly strict, as it can always be the case that a concurrent
process is currently updating the stack. Furthermore, it's expected that
on Windows systems deletion of an old table may not work because the
file is still kept open by another process. The reftable library is
prepared to handle this alright and will re-try deleting the table at a
later point in time.
So maybe a better check would be to verify that there are no files with
unexpected names in the directory?
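To sketch what such a name-based check might look like: a predicate that is
applied to every directory entry and that tolerates leftover tables instead
of counting them. The set of accepted names below is an assumption made for
illustration only; a real check would have to match whatever the library
actually writes into the directory (lock files, temporary files and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if `s` ends with `suffix` and has a non-empty stem before it. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t slen = strlen(s), xlen = strlen(suffix);

	return slen > xlen && !strcmp(s + slen - xlen, suffix);
}

/*
 * Returns 1 if `name` is something we expect to see in the reftable
 * directory, 0 otherwise. The accepted set is hypothetical.
 */
static int is_expected_reftable_file(const char *name)
{
	assert(name != NULL);

	/* readdir() also yields the directory itself and its parent. */
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return 1;
	if (!strcmp(name, "tables.list") || !strcmp(name, "tables.list.lock"))
		return 1;

	/* Stale or in-flight tables are fine; only odd names get flagged. */
	return has_suffix(name, ".ref") || has_suffix(name, ".ref.lock");
}
```

With this, a table that could not be deleted yet or that a concurrent writer
just created does not trip the check, while a stray file still does.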
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 3/5] refs/reftable: add fsck check for number of tables
2025-09-03 8:07 ` Patrick Steinhardt
@ 2025-09-15 9:27 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-15 9:27 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, jltobler, shejialuo
Patrick Steinhardt <ps@pks.im> writes:
> On Tue, Sep 02, 2025 at 09:05:23AM +0200, Karthik Nayak wrote:
>> Introduce a reftable fsck check to check that the number of files in the
>> reftable directory matches the number of files listed in 'tables.list'.
>> We do this by iterating over the files in the reftable directory and
>> counting all the files present excluding the 'tables.list'. This is also
>> exposed over Git's fsck checks as a 'badReftableStackCount' error.
>
> This feels overly strict, as it can always be the case that a concurrent
> process is currently updating the stack. Furthermore, it's expected that
> on Windows systems deletion of an old table may not work because the
> file is still kept open by another process. The reftable library is
> prepared to handle this alright and will re-try deleting the table at a
> later point in time.
>
Yeah that makes sense.
> So maybe a better check would be to verify that there are no files with
> unexpected names in the directory?
>
I was hoping to add structured consistency checks in a layered format:
- Stack
  - For each `Table`
    - For each `Block`
      - For each `Ref`
But this wouldn't belong to that, since it isn't part of the stack. So,
I'll keep the above structure and also add this to the stack level. So
we'll have
- Stack
  - For each `Table`
    - For each `Block`
      - For each `Ref`
  - Other Stack level checks
    - Check other files in the repo
> Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
` (2 preceding siblings ...)
2025-09-02 7:05 ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-02 22:38 ` Junio C Hamano
2025-09-02 7:05 ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
4 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
Introduce a fsck check for the reftable backend, which checks if the
'tables.list' contains a newline. The reftable backend writes a trailing
newline when writing the 'tables.list', but it doesn't check for it when
reading the file. A missing newline however indicates that the file was
manually tampered with, so let's raise this as an error to the user.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 3 +++
reftable/fsck.c | 36 ++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 21 +++++++++++++++++++++
6 files changed, 66 insertions(+)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 707e2fc50a..1432b1de06 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -41,6 +41,9 @@
`badReftableStackCount`::
(ERROR) Mismatch in number of tables.
+`badReftableStackListNewline`::
+ (ERROR) Reftable stack list missing trailing newline.
+
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
diff --git a/fsck.h b/fsck.h
index 256effc4f8..33432bae79 100644
--- a/fsck.h
+++ b/fsck.h
@@ -35,6 +35,7 @@ enum fsck_msg_type {
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
+ FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 59c39f9b52..7331513b19 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2698,6 +2698,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_STACK_COUNT:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
break;
+ case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
+ msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
+ break;
default:
BUG("unknown fsck error: %d", info->error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 20e6bfb0f1..9a7f22c56b 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -1,7 +1,31 @@
#include "basics.h"
+#include "reftable-error.h"
#include "reftable-fsck.h"
#include "stack.h"
+static int reftable_fsck_stack_contains_newline(const char *list_file)
+{
+ FILE *f = fopen(list_file, "r");
+ int c = 0;
+
+ if (f == NULL) {
+ if (errno == ENOENT)
+ return 0;
+ return REFTABLE_IO_ERROR;
+ }
+
+ if (fseek(f, 0, SEEK_END) == 0) {
+ long size = ftell(f);
+ if (size <= 0)
+ return REFTABLE_IO_ERROR;
+ fseek(f, -1, SEEK_END);
+ c = fgetc(f);
+ }
+ fclose(f);
+
+ return c == '\n';
+}
+
static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
{
DIR *dir = opendir(st->reftable_dir);
@@ -69,6 +93,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
}
}
+ verbose_fn("Checking trailing newline in stack list", cb_data);
+
+ if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
+ .path = "reftable/tables.list",
+ .msg = "trailing newline missing in stack list"
+ };
+
+ err = report_fn(&info, cb_data);
+ }
+
verbose_fn("Checking reftable tables count", cb_data);
if (!reftable_fsck_valid_stack_count(stack)) {
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index beba1bdd1a..17df661da8 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -8,6 +8,8 @@ enum reftable_fsck_error {
REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
/* Incorrect number of tables present */
REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
+ /* Newline missing at the end of the stack list */
+ REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
};
/* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 3a34a31890..3b119eae62 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -75,4 +75,25 @@ test_expect_success 'table count should be checked' '
)
'
+test_expect_success 'stack list must contain trailing newline' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ printf "%s" "$(cat .git/reftable/tables.list)" >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
2025-09-02 7:05 ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-09-02 22:38 ` Junio C Hamano
2025-09-03 8:07 ` Patrick Steinhardt
0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:38 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, jltobler, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> Introduce a fsck check for the reftable backend, which checks if the
> 'tables.list' contains a newline. The reftable backend writes a trailing
> newline when writing the 'tables.list', but it doesn't check for it when
> reading the file. A missing newline however indicates that the file was
> manually tampered with, so let's raise this as an error to the user.
Hmph, how does the code react to other kinds of "manual tampering"?
For example, if an empty line is inserted between two existing lines
(or at the beginning of the file, for that matter), would the parser
detect it as a corrupt file and die?
If so, it makes me strongly suspect that we are better off enforcing
that the file does not end in an incomplete line at runtime and barf
just the same way, instead of "most of the anomalies that the write
codepath would never produce would cause error on the read codepath,
but only this one that the read codepath is happy with is caught by
the fsck", which does not sound quite right.
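As an illustration of treating these anomalies uniformly, here is a minimal
sketch of a validator that rejects both an empty line and an incomplete last
line in one pass over the file contents. The function name is hypothetical,
and treating an empty buffer as a valid (empty) stack is an assumption. This
is not how the reader currently behaves; it only shows that the two cases
can share one code path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 0 if `buf` consists solely of non-empty, newline-terminated
 * lines, -1 otherwise. An empty buffer is accepted (assumption).
 */
static int validate_tables_list(const char *buf, size_t len)
{
	size_t line_start = 0;

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		if (buf[i] != '\n')
			continue;
		if (i == line_start)
			return -1; /* empty line, leading or in the middle */
		line_start = i + 1;
	}

	/* Anything after the last newline is an incomplete line. */
	return line_start == len ? 0 : -1;
}
```

Whether such a function lives in the read codepath or in fsck is exactly the
question raised above; the logic itself would be the same either way.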
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
2025-09-02 22:38 ` Junio C Hamano
@ 2025-09-03 8:07 ` Patrick Steinhardt
0 siblings, 0 replies; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-03 8:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, git, jltobler, shejialuo
On Tue, Sep 02, 2025 at 03:38:33PM -0700, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > Introduce a fsck check for the reftable backend, which checks if the
> > 'tables.list' contains a newline. The reftable backend writes a trailing
> > newline when writing the 'tables.list', but it doesn't check for it when
> > reading the file. A missing newline however indicates that the file was
> > manually tampered with, so let's raise this as an error to the user.
>
> Hmph, how does the code react to other kinds of "manual tampering"?
> For example, if an empty line is inserted between two existing lines
> (or at the beginning of the file, for that matter), would the parser
> detect it as a corrupt file and die?
>
> If so, it makes me strongly suspect that we are better off enforcing
> that the file does not end in an incomplete line at runtime and barf
> just the same way, instead of "most of the anomalies that the write
> codepath would never produce would cause error on the read codepath,
> but only this one that the read codepath is happy with is caught by
> the fsck", which does not sound quite right.
Fair, I'm also of the opinion that we should tighten the parser logic to
detect and reject any invalid files. For previous checks where we verify
that the table names are sane I think it's fair to live with it and warn
about those, as the actual names don't really matter. But as soon as we
hit actually-broken formats I also think that we're better off rejecting
those altogether.
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
` (3 preceding siblings ...)
2025-09-02 7:05 ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-09-02 7:05 ` Karthik Nayak
2025-09-02 22:42 ` Junio C Hamano
4 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-02 7:05 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, shejialuo
Introduce a fsck check for the reftable backend, which checks if the
tables in 'tables.list' contain sequential update index. The tables in
the reftable backend should contain sequential update index. This fsck
check ensures that.
We must note that the reftable backend itself doesn't check to ensure
this and it also doesn't check to ensure that the index in the table
name matches the index in the header or the table. The latter is not
implemented in this fsck check either and will be added in a future
patch where we add fsck checks for internals of a table.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ++
fsck.h | 1 +
refs/reftable-backend.c | 3 ++
reftable/fsck.c | 15 ++++++++--
reftable/reftable-fsck.h | 2 ++
t/t0614-reftable-fsck.sh | 62 ++++++++++++++++++++++++++++++++++++++++++
6 files changed, 84 insertions(+), 2 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1432b1de06..982d51876c 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -47,6 +47,9 @@
`badReftableTableName`::
(ERROR) A reftable table has an invalid name.
+`badReftableUpdateIndex`::
+ (ERROR) Incorrect update index found for table.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 33432bae79..60e9b84183 100644
--- a/fsck.h
+++ b/fsck.h
@@ -37,6 +37,7 @@ enum fsck_msg_type {
FUNC(BAD_REFTABLE_STACK_COUNT, ERROR) \
FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR) \
FUNC(BAD_REFTABLE_TABLE_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_UPDATE_INDEX, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 7331513b19..519ade24b8 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2701,6 +2701,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
break;
+ case REFTABLE_FSCK_ERROR_UPDATE_INDEX:
+ msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
+ break;
default:
BUG("unknown fsck error: %d", info->error);
}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 9a7f22c56b..5c6d842ac1 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -53,9 +53,8 @@ int reftable_fsck_check(struct reftable_stack *stack,
reftable_fsck_verbose_fn verbose_fn,
void *cb_data)
{
-
+ uint64_t min, max, prev_max = 0;
char **names = NULL;
- uint64_t min, max;
int err = 0;
if (stack == NULL)
@@ -87,10 +86,22 @@ int reftable_fsck_check(struct reftable_stack *stack,
continue;
}
+ if (min != (prev_max + 1) || max < min) {
+ struct reftable_fsck_info info = {
+ .error = REFTABLE_FSCK_ERROR_UPDATE_INDEX,
+ .path = names[i],
+ .msg = "incorrect update index in table name"
+ };
+
+ err = report_fn(&info, cb_data);
+ }
+
if (strcmp(tail, ".ref")) {
info.msg = "invalid reftable table extension";
err = report_fn(&info, cb_data);
}
+
+ prev_max = max;
}
verbose_fn("Checking trailing newline in stack list", cb_data);
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 17df661da8..0ab20a99b6 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -10,6 +10,8 @@ enum reftable_fsck_error {
REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
/* Newline missing at the end of the stack list */
REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
+ /* Incorrect update index for table */
+ REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
};
/* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 3b119eae62..1f37691b2e 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -96,4 +96,66 @@ test_expect_success 'stack list must contain trailing newline' '
)
'
+test_expect_success 'table update index should be sequential between tables' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ # Lock the existing table to disable auto-compaction
+ CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+ touch .git/reftable/${CUR_TABLE}.lock &&
+ git update-ref refs/heads/sample @ &&
+ rm .git/reftable/${CUR_TABLE}.lock &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+ NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/0003/0009/g") &&
+
+ sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_expect_success 'table update index should be sequential within a table' '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ # Lock the existing table to disable auto-compaction
+ CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+ touch .git/reftable/${CUR_TABLE}.lock &&
+ git update-ref refs/heads/sample @ &&
+ rm .git/reftable/${CUR_TABLE}.lock &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+ NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/\(.*\)0003/\10002/") &&
+
+ sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+ mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+ mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+ EOF
+ test_cmp expect err
+ )
+'
+
test_done
--
2.50.1
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index
2025-09-02 7:05 ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
@ 2025-09-02 22:42 ` Junio C Hamano
2025-09-18 8:11 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:42 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, jltobler, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> Introduce a fsck check for the reftable backend, which checks if the
> tables in 'tables.list' contain sequential update index. The tables in
> the reftable backend should contain sequential update index. This fsck
> check ensures that.
>
> We must note that the reftable backend itself doesn't check to ensure
> this and it also doesn't check to ensure that the index in the table
> name matches the index in the header or the table. The latter is not
> implemented in this fsck check either and will be added in a future
> patch where we add fsck checks for internals of a table.
Similar to the previous step, I am not sure why this should not be
checked at runtime and is flagged as an error.
In general, we do try to avoid retroactively tightening rules, but
the reftable is so new and not even the default. If we noticed that
the runtime has been overly loose, the time to tighten it is now,
not after even more installations use it.
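For illustration, the rule from the patch pulled out into a standalone
function that a runtime reader and fsck could in principle share. The struct
and function names are made up for this sketch. Unlike the patch, which
reports every offending table, it stops at the first one. Like the patch, it
assumes that the first table starts at update index 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: min/max update index as parsed from a table name. */
struct table_range {
	uint64_t min, max;
};

/*
 * Returns the index of the first table whose range does not directly
 * follow its predecessor or whose max is below its min, or -1 if the
 * whole stack is consistent.
 */
static long first_bad_update_index(const struct table_range *t, size_t n)
{
	uint64_t prev_max = 0;

	assert(t != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (t[i].min != prev_max + 1 || t[i].max < t[i].min)
			return (long)i;
		prev_max = t[i].max;
	}

	return -1;
}
```

The two failing cases in the test below mirror the two tests from the patch:
a gap between tables and a max that is smaller than min within one table.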
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index
2025-09-02 22:42 ` Junio C Hamano
@ 2025-09-18 8:11 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jltobler, shejialuo
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Introduce a fsck check for the reftable backend, which checks if the
>> tables in 'tables.list' contain sequential update index. The tables in
>> the reftable backend should contain sequential update index. This fsck
>> check ensures that.
>>
>> We must note that the reftable backend itself doesn't check to ensure
>> this and it also doesn't check to ensure that the index in the table
>> name matches the index in the header or the table. The latter is not
>> implemented in this fsck check either and will be added in a future
>> patch where we add fsck checks for internals of a table.
>
> Similar to the previous step, I am not sure why this should not be
> checked at runtime and is flagged as an error.
>
> In general, we do try to avoid retroactively tightening rules, but
> the reftable is so new and not even the default. If we noticed that
> the runtime has been overly loose, the time to tighten it is now,
> not after even more installations use it.
I think your point is fair and I agree. I did consider it, but didn't
want to 'retroactively tighten rules'. But I think it is justified,
more so since we're not introducing any changes to the format, but just
more validation around it.
I should've replied here sooner, but I didn't ever get to it. I found
some time to finally fix the comments from this series and will send in
a new version. I will be more diligent about it from here on.
- Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 0/8] refs/reftable: add consistency checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (6 preceding siblings ...)
2025-09-02 7:05 ` [PATCH v2 " Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
` (7 more replies)
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (2 subsequent siblings)
10 siblings, 8 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
The reference subsystem allows adding backend-specific consistency
checks. These checks are run as part of 'git refs verify'.
While the files backend has some consistency checks added, the reftable
backend currently has none. This series first tightens the reftable
backend to make it a little more strict and then also adds the required
infrastructure and some simple consistency checks.
Since the reftable backend is treated as a library within the Git
codebase, we don't want to spill over our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.
So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.
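The split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this series: every `sketch_*` name, the `.ref` suffix test, and the "sketchBadTableName" message id are assumptions made up for the example. The library side only knows its own error kinds and a report callback; the host side maps those kinds to its own message ids and severity.

```c
#include <stdio.h>
#include <string.h>

/* Library side: hypothetical error kinds, unaware of the host's fsck. */
enum sketch_fsck_error {
	SKETCH_FSCK_ERROR_TABLE_NAME = 0,
	SKETCH_FSCK_ERROR_MAX
};

struct sketch_fsck_info {
	enum sketch_fsck_error error;
	const char *path;
	const char *msg;
};

typedef int sketch_fsck_report_fn(const struct sketch_fsck_info *info,
				  void *cb_data);

/* Simplified stand-in for a real table name check (assumption). */
static int sketch_has_ref_suffix(const char *name)
{
	size_t len = strlen(name);
	return len > 4 && !strcmp(name + len - 4, ".ref");
}

/* Walk a NULL-terminated list of names and report each bad one. */
static int sketch_lib_fsck(const char **names,
			   sketch_fsck_report_fn *report, void *cb_data)
{
	int ret = 0;

	for (size_t i = 0; names[i]; i++) {
		struct sketch_fsck_info info = {
			.error = SKETCH_FSCK_ERROR_TABLE_NAME,
			.path = names[i],
			.msg = "invalid reftable table name",
		};
		if (!sketch_has_ref_suffix(names[i]))
			ret |= report(&info, cb_data);
	}
	return ret;
}

/* Host side: glue that translates library errors into host messages. */
struct sketch_host_state {
	unsigned warnings;
	char last[128];
};

static const char *sketch_msg_id(enum sketch_fsck_error e)
{
	switch (e) {
	case SKETCH_FSCK_ERROR_TABLE_NAME:
		return "sketchBadTableName"; /* hypothetical message id */
	default:
		return "sketchUnknown";
	}
}

static int sketch_host_report(const struct sketch_fsck_info *info,
			      void *cb_data)
{
	struct sketch_host_state *st = cb_data;

	st->warnings++;
	snprintf(st->last, sizeof(st->last), "warning: %s: %s: %s",
		 info->path, sketch_msg_id(info->error), info->msg);
	return 0; /* a warning does not fail the overall check */
}
```

Calling `sketch_lib_fsck(names, sketch_host_report, &state)` keeps the library free of host types: only the callback and the mapping function know about the host's message ids.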
We add the following consistency checks:
1. Check for validating the reftable table name. This is treated as a
warning since the reftable specification only suggests a table name
but doesn't enforce it. Also there is a difference in the table name
used in Git vs that in JGit.
2. Check for additional files present in the reftable
directory.
We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
when:
1. The 'tables.list' file doesn't have a trailing newline.
2. Tables added to a reftable stack are not sequential.
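The second condition can be illustrated with a small standalone sketch. The struct and function names below are hypothetical and only mirror the idea of the check: each table must begin exactly one update index after the previous table ended.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the update index range stored per table. */
struct sketch_table {
	uint64_t min_update_index;
	uint64_t max_update_index;
};

/*
 * Return 1 when every table starts exactly one past the end of the
 * previous table, 0 otherwise. An empty or single-table stack is
 * trivially sequential. Wrap-around at UINT64_MAX is not handled here.
 */
static int sketch_tables_sequential(const struct sketch_table *tables,
				    size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (tables[i - 1].max_update_index + 1 !=
		    tables[i].min_update_index)
			return 0;
	return 1;
}
```

A stack covering the ranges [1,1], [2,3], [4,4] passes, while [1,1] followed by [3,3] does not, which is the shape of failure the unit test in patch 4 provokes with update indices 1 and 3.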
---
Changes in v3:
- I took a long hiatus from this topic, mostly due to other priorities.
This has been rebased on top of '92c87bdc40 (The eighth batch,
2025-09-12)' since there were conflicts.
- Junio suggested that two of the consistency checks (trailing newlines,
sequential update indices for tables in stack) should actually be
checked during runtime. I have made that change in this version.
- I've cleaned up the code and modularized the 'reftable/fsck.c' code.
- Invalid table name emits a warning, since the reftable spec doesn't
enforce it but only makes a suggestion.
- Broken down the commits to make it easier to review.
- Link to v2: https://lore.kernel.org/r/20250902-228-reftable-introduce-consistency-checks-v2-0-4f96b3834779@gmail.com
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
pointer, this provides a smaller footprint (pointer size vs struct
size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com
---
Documentation/fsck-msgids.adoc | 9 ++--
Makefile | 3 +-
fsck.h | 40 +++++++-------
meson.build | 1 +
refs.c | 4 ++
refs/debug.c | 1 -
refs/files-backend.c | 3 --
refs/reftable-backend.c | 59 ++++++++++++++++++---
reftable/basics.c | 28 ++++++----
reftable/basics.h | 7 +--
reftable/fsck.c | 112 +++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 42 +++++++++++++++
reftable/stack.c | 15 ++++--
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 55 +++++++++++++++++++
t/unit-tests/u-reftable-basics.c | 23 ++++++--
t/unit-tests/u-reftable-stack.c | 28 ++++++++++
17 files changed, 378 insertions(+), 53 deletions(-)
Karthik Nayak (8):
refs: remove unused headers
refs: move consistency check msg to generic layer
reftable: check for trailing newline in 'tables.list'
reftable: ensure tables in a stack use sequential update indices
Documentation/fsck-msgids: remove duplicate msg id
fsck: order 'fsck_msg_type' alphabetically
reftable: add code to facilitate consistency checks
refs/reftable: add fsck check for checking the table name
Range-diff versus v2:
1: eea34c56f0 < -: ---------- fsck: order 'fsck_msg_type' alphabetically
2: dafcf618e9 < -: ---------- refs/reftable: add fsck check for checking the table name
3: 20294ade9b < -: ---------- refs/reftable: add fsck check for number of tables
4: 03c7979528 < -: ---------- refs/reftable: add fsck check for trailing newline
5: eb74502cd3 < -: ---------- refs/reftable: add fsck check for incorrect update index
-: ---------- > 1: c9f39a04ca refs: remove unused headers
-: ---------- > 2: e1baf61a8a refs: move consistency check msg to generic layer
-: ---------- > 3: 88a2ae1171 reftable: check for trailing newline in 'tables.list'
-: ---------- > 4: 2dd1750a9d reftable: ensure tables in a stack use sequential update indices
-: ---------- > 5: a7f6c52385 Documentation/fsck-msgids: remove duplicate msg id
-: ---------- > 6: 873c21c73f fsck: order 'fsck_msg_type' alphabetically
-: ---------- > 7: cbaac94328 reftable: add code to facilitate consistency checks
-: ---------- > 8: e7fcc15608 refs/reftable: add fsck check for checking the table name
base-commit: a483264b01b977f3e65a4419103c21e6af7412a2
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544
Thanks
- Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 1/8] refs: remove unused headers
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 2/8] refs: move consistency check msg to generic layer Karthik Nayak
` (6 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
In the 'refs/' namespace, some of the included header files are not
needed. Remove them.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/debug.c | 1 -
refs/files-backend.c | 1 -
refs/reftable-backend.c | 1 -
3 files changed, 3 deletions(-)
diff --git a/refs/debug.c b/refs/debug.c
index 1cb955961e..697adbd0dc 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -1,7 +1,6 @@
#include "git-compat-util.h"
#include "hex.h"
#include "refs-internal.h"
-#include "string-list.h"
#include "trace.h"
static struct trace_key trace_refs = TRACE_KEY_INIT(REFS);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1b3bf26add..d4fb033417 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -20,7 +20,6 @@
#include "../dir-iterator.h"
#include "../lockfile.h"
#include "../object.h"
-#include "../object-file.h"
#include "../path.h"
#include "../dir.h"
#include "../chdir-notify.h"
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 9e889da2ff..2152349cb9 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -11,7 +11,6 @@
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v3 2/8] refs: move consistency check msg to generic layer
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
` (5 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++++
refs/files-backend.c | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/refs.c b/refs.c
index 4ff55cf24f..4a7c394226 100644
--- a/refs.c
+++ b/refs.c
@@ -32,6 +32,7 @@
#include "commit.h"
#include "wildmatch.h"
#include "ident.h"
+#include "fsck.h"
/*
* List of all available backends
@@ -323,6 +324,9 @@ int check_refname_format(const char *refname, int flags)
int refs_fsck(struct ref_store *refs, struct fsck_options *o,
struct worktree *wt)
{
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
return refs->be->fsck(refs, o, wt);
}
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d4fb033417..603b1343d8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3906,8 +3906,6 @@ static int files_fsck_refs(struct ref_store *ref_store,
NULL,
};
- if (o->verbose)
- fprintf_ln(stderr, _("Checking references consistency"));
return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 2/8] refs: move consistency check msg to generic layer Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 15:36 ` Junio C Hamano
` (2 more replies)
2025-09-18 8:11 ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
` (4 subsequent siblings)
7 siblings, 3 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
In the reftable format, the 'tables.list' file contains a newline
separated list of tables. While we parse this file, we do not check or
care about trailing newlines. Tighten the parser in `parse_names()` to
return an appropriate error if there is no trailing newline.
This requires modification to `parse_names()` to accept a third argument
which will hold the error value.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
reftable/basics.c | 28 +++++++++++++++++++---------
reftable/basics.h | 7 ++++---
reftable/stack.c | 6 ++----
t/unit-tests/u-reftable-basics.c | 23 +++++++++++++++++++----
4 files changed, 44 insertions(+), 20 deletions(-)
diff --git a/reftable/basics.c b/reftable/basics.c
index 9988ebd635..75d4086769 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -195,7 +195,7 @@ size_t names_length(const char **names)
return p - names;
}
-char **parse_names(char *buf, int size)
+char **parse_names(char *buf, int size, int *err)
{
char **names = NULL;
size_t names_cap = 0;
@@ -205,30 +205,40 @@ char **parse_names(char *buf, int size)
while (p < end) {
char *next = strchr(p, '\n');
- if (next && next < end) {
+ if (!next) {
+ *err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ } else if (next < end) {
*next = 0;
} else {
next = end;
}
+
if (p < next) {
if (REFTABLE_ALLOC_GROW(names, names_len + 1,
- names_cap))
- goto err;
+ names_cap)) {
+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = reftable_strdup(p);
- if (!names[names_len++])
- goto err;
+ if (!names[names_len++]) {
+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
}
p = next + 1;
}
- if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap))
- goto err;
+ if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap)) {
+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = NULL;
return names;
-err:
+done:
for (size_t i = 0; i < names_len; i++)
reftable_free(names[i]);
reftable_free(names);
diff --git a/reftable/basics.h b/reftable/basics.h
index 7d22f96261..019dfe6d7e 100644
--- a/reftable/basics.h
+++ b/reftable/basics.h
@@ -167,10 +167,11 @@ void free_names(char **a);
/*
* Parse a newline separated list of names. `size` is the length of the buffer,
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
+ *
+ * Errors are assigned to the `err` variable.
*/
-char **parse_names(char *buf, int size);
+char **parse_names(char *buf, int size, int *err);
/* compares two NULL-terminated arrays of strings. */
int names_equal(const char **a, const char **b);
diff --git a/reftable/stack.c b/reftable/stack.c
index f91ce50bcd..955be1edb6 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -109,11 +109,9 @@ static int fd_read_lines(int fd, char ***namesp)
}
buf[size] = 0;
- *namesp = parse_names(buf, size);
- if (!*namesp) {
- err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ *namesp = parse_names(buf, size, &err);
+ if (!*namesp)
goto done;
- }
done:
reftable_free(buf);
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index a0471083e7..f77ec96429 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
#include "unit-test.h"
#include "lib-reftable.h"
#include "reftable/basics.h"
+#include "reftable/reftable-error.h"
struct integer_needle_lesseq_args {
int needle;
@@ -79,14 +80,17 @@ void test_reftable_basics__names_equal(void)
void test_reftable_basics__parse_names(void)
{
char in1[] = "line\n";
- char in2[] = "a\nb\nc";
- char **out = parse_names(in1, strlen(in1));
+ char in2[] = "a\nb\nc\n";
+ int err = 0;
+ char **out = parse_names(in1, strlen(in1), &err);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "line");
cl_assert(!out[1]);
free_names(out);
- out = parse_names(in2, strlen(in2));
+ out = parse_names(in2, strlen(in2), &err);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
cl_assert_equal_s(out[1], "b");
@@ -95,10 +99,21 @@ void test_reftable_basics__parse_names(void)
free_names(out);
}
+void test_reftable_basics__parse_names_missing_newline(void)
+{
+ char in1[] = "line\nline2";
+ int err = 0;
+ char **out = parse_names(in1, strlen(in1), &err);
+ cl_assert(err == REFTABLE_FORMAT_ERROR);
+ cl_assert(out == NULL);
+}
+
void test_reftable_basics__parse_names_drop_empty_string(void)
{
char in[] = "a\n\nb\n";
- char **out = parse_names(in, strlen(in));
+ int err = 0;
+ char **out = parse_names(in, strlen(in), &err);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
/* simply '\n' should be dropped as empty string */
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-09-18 15:36 ` Junio C Hamano
2025-09-23 15:42 ` Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 7:24 ` Kristoffer Haugsbakk
2 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-09-18 15:36 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> diff --git a/reftable/basics.h b/reftable/basics.h
> index 7d22f96261..019dfe6d7e 100644
> --- a/reftable/basics.h
> +++ b/reftable/basics.h
> @@ -167,10 +167,11 @@ void free_names(char **a);
>
> /*
> * Parse a newline separated list of names. `size` is the length of the buffer,
> - * without terminating '\0'. Empty names are discarded. Returns a `NULL`
> - * pointer when allocations fail.
> + * without terminating '\0'. Empty names are discarded.
> + *
> + * Errors are assigned to the `err` variable.
> */
> -char **parse_names(char *buf, int size);
> +char **parse_names(char *buf, int size, int *err);
>
> /* compares two NULL-terminated arrays of strings. */
> int names_equal(const char **a, const char **b);
Makes sense.
> diff --git a/reftable/stack.c b/reftable/stack.c
> index f91ce50bcd..955be1edb6 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -109,11 +109,9 @@ static int fd_read_lines(int fd, char ***namesp)
> }
> buf[size] = 0;
>
> - *namesp = parse_names(buf, size);
> - if (!*namesp) {
> - err = REFTABLE_OUT_OF_MEMORY_ERROR;
> + *namesp = parse_names(buf, size, &err);
> + if (!*namesp)
> goto done;
Nice.
> diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
> index a0471083e7..f77ec96429 100644
> --- a/t/unit-tests/u-reftable-basics.c
> +++ b/t/unit-tests/u-reftable-basics.c
> @@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
> #include "unit-test.h"
> #include "lib-reftable.h"
> #include "reftable/basics.h"
> +#include "reftable/reftable-error.h"
>
> struct integer_needle_lesseq_args {
> int needle;
> @@ -79,14 +80,17 @@ void test_reftable_basics__names_equal(void)
> void test_reftable_basics__parse_names(void)
> {
> char in1[] = "line\n";
> - char in2[] = "a\nb\nc";
> - char **out = parse_names(in1, strlen(in1));
> + char in2[] = "a\nb\nc\n";
> + int err = 0;
> + char **out = parse_names(in1, strlen(in1), &err);
> + cl_assert(err == 0);
> cl_assert(out != NULL);
> cl_assert_equal_s(out[0], "line");
> cl_assert(!out[1]);
> free_names(out);
>
> - out = parse_names(in2, strlen(in2));
> + out = parse_names(in2, strlen(in2), &err);
> + cl_assert(err == 0);
> cl_assert(out != NULL);
> cl_assert_equal_s(out[0], "a");
> cl_assert_equal_s(out[1], "b");
Sensible.
> @@ -95,10 +99,21 @@ void test_reftable_basics__parse_names(void)
> free_names(out);
> }
>
> +void test_reftable_basics__parse_names_missing_newline(void)
> +{
> + char in1[] = "line\nline2";
> + int err = 0;
> + char **out = parse_names(in1, strlen(in1), &err);
> + cl_assert(err == REFTABLE_FORMAT_ERROR);
> + cl_assert(out == NULL);
> +}
OK.
> void test_reftable_basics__parse_names_drop_empty_string(void)
> {
> char in[] = "a\n\nb\n";
> - char **out = parse_names(in, strlen(in));
> + int err = 0;
> + char **out = parse_names(in, strlen(in), &err);
> + cl_assert(err == 0);
I'll drop an extra SP after == here (no need to resend only to fix
this).
> cl_assert(out != NULL);
> cl_assert_equal_s(out[0], "a");
> /* simply '\n' should be dropped as empty string */
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-18 15:36 ` Junio C Hamano
@ 2025-09-23 15:42 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-23 15:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, shejialuo
[-- Attachment #1: Type: text/plain, Size: 3263 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> diff --git a/reftable/basics.h b/reftable/basics.h
>> index 7d22f96261..019dfe6d7e 100644
>> --- a/reftable/basics.h
>> +++ b/reftable/basics.h
>> @@ -167,10 +167,11 @@ void free_names(char **a);
>>
>> /*
>> * Parse a newline separated list of names. `size` is the length of the buffer,
>> - * without terminating '\0'. Empty names are discarded. Returns a `NULL`
>> - * pointer when allocations fail.
>> + * without terminating '\0'. Empty names are discarded.
>> + *
>> + * Errors are assigned to the `err` variable.
>> */
>> -char **parse_names(char *buf, int size);
>> +char **parse_names(char *buf, int size, int *err);
>>
>> /* compares two NULL-terminated arrays of strings. */
>> int names_equal(const char **a, const char **b);
>
> Makes sense.
>
>> diff --git a/reftable/stack.c b/reftable/stack.c
>> index f91ce50bcd..955be1edb6 100644
>> --- a/reftable/stack.c
>> +++ b/reftable/stack.c
>> @@ -109,11 +109,9 @@ static int fd_read_lines(int fd, char ***namesp)
>> }
>> buf[size] = 0;
>>
>> - *namesp = parse_names(buf, size);
>> - if (!*namesp) {
>> - err = REFTABLE_OUT_OF_MEMORY_ERROR;
>> + *namesp = parse_names(buf, size, &err);
>> + if (!*namesp)
>> goto done;
>
> Nice.
>
>> diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
>> index a0471083e7..f77ec96429 100644
>> --- a/t/unit-tests/u-reftable-basics.c
>> +++ b/t/unit-tests/u-reftable-basics.c
>> @@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
>> #include "unit-test.h"
>> #include "lib-reftable.h"
>> #include "reftable/basics.h"
>> +#include "reftable/reftable-error.h"
>>
>> struct integer_needle_lesseq_args {
>> int needle;
>> @@ -79,14 +80,17 @@ void test_reftable_basics__names_equal(void)
>> void test_reftable_basics__parse_names(void)
>> {
>> char in1[] = "line\n";
>> - char in2[] = "a\nb\nc";
>> - char **out = parse_names(in1, strlen(in1));
>> + char in2[] = "a\nb\nc\n";
>> + int err = 0;
>> + char **out = parse_names(in1, strlen(in1), &err);
>> + cl_assert(err == 0);
>> cl_assert(out != NULL);
>> cl_assert_equal_s(out[0], "line");
>> cl_assert(!out[1]);
>> free_names(out);
>>
>> - out = parse_names(in2, strlen(in2));
>> + out = parse_names(in2, strlen(in2), &err);
>> + cl_assert(err == 0);
>> cl_assert(out != NULL);
>> cl_assert_equal_s(out[0], "a");
>> cl_assert_equal_s(out[1], "b");
>
> Sensible.
>
>> @@ -95,10 +99,21 @@ void test_reftable_basics__parse_names(void)
>> free_names(out);
>> }
>>
>> +void test_reftable_basics__parse_names_missing_newline(void)
>> +{
>> + char in1[] = "line\nline2";
>> + int err = 0;
>> + char **out = parse_names(in1, strlen(in1), &err);
>> + cl_assert(err == REFTABLE_FORMAT_ERROR);
>> + cl_assert(out == NULL);
>> +}
>
> OK.
>
>> void test_reftable_basics__parse_names_drop_empty_string(void)
>> {
>> char in[] = "a\n\nb\n";
>> - char **out = parse_names(in, strlen(in));
>> + int err = 0;
>> + char **out = parse_names(in, strlen(in), &err);
>> + cl_assert(err == 0);
>
> I'll drop an extra SP after == here (no need to resend only to fix
> this).
>
Ah! Thanks for doing that. I'll patch it locally in case I need to
reroll!
Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-09-18 15:36 ` Junio C Hamano
@ 2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 10:02 ` Karthik Nayak
2025-09-24 7:24 ` Kristoffer Haugsbakk
2 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 5:54 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Thu, Sep 18, 2025 at 10:11:44AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/basics.c b/reftable/basics.c
> index 9988ebd635..75d4086769 100644
> --- a/reftable/basics.c
> +++ b/reftable/basics.c
> @@ -195,7 +195,7 @@ size_t names_length(const char **names)
> return p - names;
> }
>
> -char **parse_names(char *buf, int size)
> +char **parse_names(char *buf, int size, int *err)
> {
> char **names = NULL;
> size_t names_cap = 0;
Nit: Wouldn't it be more natural to return an `int` and assign the
result to an out-pointer?
> @@ -205,30 +205,40 @@ char **parse_names(char *buf, int size)
>
> while (p < end) {
> char *next = strchr(p, '\n');
Not a new issue, but it's kind of broken that we use strchr(3p) here. We
really should be using `memchr(p, '\n', size - (end - p))` as the user
provides the size to us. And the provided size should be `size_t`.
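For illustration, a bounded variant along the lines suggested above might look like the sketch below. It is not the code from this series: all `sketch_*` names and error values are hypothetical, it uses plain `malloc`/`realloc` rather than the reftable allocators, it copies names instead of writing NUL bytes into the input, and the remaining length passed to `memchr` is computed as `end - p`. It also returns an `int` and hands the result back through an out-pointer, as floated in the earlier nit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for this sketch, not the library's values. */
enum {
	SKETCH_OK = 0,
	SKETCH_FORMAT_ERROR = -1,
	SKETCH_OOM_ERROR = -2
};

/* Free the first `len` names and the array itself. */
static void sketch_free_names(char **names, size_t len)
{
	for (size_t i = 0; i < len; i++)
		free(names[i]);
	free(names);
}

/*
 * Split `buf` (`size` bytes, need not be NUL-terminated) into a
 * NULL-terminated array of names. Every name must be followed by
 * '\n'; empty names are dropped.
 */
static int sketch_parse_names(const char *buf, size_t size, char ***out)
{
	const char *p = buf, *end = buf + size;
	char **names = NULL;
	size_t len = 0, cap = 0;
	int err;

	assert(out != NULL);
	*out = NULL;

	while (p < end) {
		/* Never look past `end`, unlike strchr(). */
		const char *next = memchr(p, '\n', (size_t)(end - p));
		size_t n;

		if (!next) {
			/* The last entry lacks its trailing newline. */
			err = SKETCH_FORMAT_ERROR;
			goto fail;
		}

		n = (size_t)(next - p);
		if (n) {
			/* Keep room for this name plus the NULL terminator. */
			if (len + 2 > cap) {
				size_t new_cap = cap ? cap * 2 : 8;
				char **tmp = realloc(names,
						     new_cap * sizeof(*names));
				if (!tmp) {
					err = SKETCH_OOM_ERROR;
					goto fail;
				}
				names = tmp;
				cap = new_cap;
			}
			names[len] = malloc(n + 1);
			if (!names[len]) {
				err = SKETCH_OOM_ERROR;
				goto fail;
			}
			memcpy(names[len], p, n);
			names[len][n] = '\0';
			len++;
		}
		p = next + 1;
	}

	if (!names) {
		/* No names at all: return an array holding only NULL. */
		names = calloc(1, sizeof(*names));
		if (!names)
			return SKETCH_OOM_ERROR;
	}
	names[len] = NULL;
	*out = names;
	return SKETCH_OK;

fail:
	sketch_free_names(names, len);
	return err;
}
```

With this shape the caller inspects only the return value, and `*out` stays `NULL` on any failure, so the "did allocation fail or was the format bad" question no longer has to be inferred from a `NULL` return.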
> - if (next && next < end) {
> + if (!next) {
> + *err = REFTABLE_FORMAT_ERROR;
> + goto done;
> + } else if (next < end) {
> *next = 0;
Can we maybe convert this line to `*next = '\0'` while at it? It made my
reading hiccup a bit.
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-24 5:54 ` Patrick Steinhardt
@ 2025-09-24 10:02 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 10:02 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Thu, Sep 18, 2025 at 10:11:44AM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/basics.c b/reftable/basics.c
>> index 9988ebd635..75d4086769 100644
>> --- a/reftable/basics.c
>> +++ b/reftable/basics.c
>> @@ -195,7 +195,7 @@ size_t names_length(const char **names)
>> return p - names;
>> }
>>
>> -char **parse_names(char *buf, int size)
>> +char **parse_names(char *buf, int size, int *err)
>> {
>> char **names = NULL;
>> size_t names_cap = 0;
>
> Nit: Wouldn't it be more natural to return an `int` and assign the
> result to an out-pointer?
>
I thought about that too, but I couldn't find enough consistency or
reason to warrant one over the other, so I picked the one with the
least change. Let me change it.
>> @@ -205,30 +205,40 @@ char **parse_names(char *buf, int size)
>>
>> while (p < end) {
>> char *next = strchr(p, '\n');
>
> Not a new issue, but it's kind of broken that we use strchr(3p) here. We
> really should be using `memchr(p, '\n', size - (end - p))` as the user
> provides the size to us. And the provided size should be `size_t`.
>
I think that's fair. But I'll avoid making this change now, I've already
added a few commits which are mostly tangential.
>> - if (next && next < end) {
>> + if (!next) {
>> + *err = REFTABLE_FORMAT_ERROR;
>> + goto done;
>> + } else if (next < end) {
>> *next = 0;
>
> Can we maybe convert this line to `*next = '\0'` while at it? It made my
> reading hiccup a bit.
>
Yeah, I could definitely add this in.
> Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-09-18 15:36 ` Junio C Hamano
2025-09-24 5:54 ` Patrick Steinhardt
@ 2025-09-24 7:24 ` Kristoffer Haugsbakk
2025-09-24 11:06 ` Karthik Nayak
2 siblings, 1 reply; 96+ messages in thread
From: Kristoffer Haugsbakk @ 2025-09-24 7:24 UTC (permalink / raw)
To: Karthik Nayak, git; +Cc: Patrick Steinhardt, Junio C Hamano, shejialuo
On Thu, Sep 18, 2025, at 10:11, Karthik Nayak wrote:
> In the reftable format, the 'tables.list' file contains a newline
> separated list of tables. While we parse this file, we do not check or
> care about trailing newlines. Tighten the parser in `parse_names()` to
> return an appropriate error if there is no trailing newline.
Nit:[1] newline-separated + requiring a trailing newline sounds like it
really equals: newline-terminated list. Is this moving from
effectively using newline-separated to a newline-terminated format?
† 1: Since others have commented anyway
>
> This requires modification to `parse_names()` to accept a third argument
> which will hold the error value.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list'
2025-09-24 7:24 ` Kristoffer Haugsbakk
@ 2025-09-24 11:06 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 11:06 UTC (permalink / raw)
To: Kristoffer Haugsbakk, git; +Cc: Patrick Steinhardt, Junio C Hamano, shejialuo
[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:
> On Thu, Sep 18, 2025, at 10:11, Karthik Nayak wrote:
>> In the reftable format, the 'tables.list' file contains a newline
>> separated list of tables. While we parse this file, we do not check or
>> care about trailing newlines. Tighten the parser in `parse_names()` to
>> return an appropriate error if there is no trailing newline.
>
> Nit:[1] newline-separated + requiring a trailing newline sounds like it
> really equals: newline-terminated list. Is this moving from
> effectively using newline-separated to a newline-terminated format?
>
> † 1: Since others have commented anyway
>
I see the confusion, it is a newline-separated list, but we don't
check/care for the last newline. We don't require a separate terminating
newline. Let me amend the commit message to make this clearer.
>>
>> This requires modification to `parse_names()` to accept a third argument
>> which will hold the error value.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>> ---
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (2 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-18 8:11 ` [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
` (3 subsequent siblings)
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
When tables are loaded into a stack, we expect that the tables are
sequentially ordered by their update indices. But there is no validation
done for this. Add validation to ensure that tables loaded are
sequential.
Raise a 'REFTABLE_FORMAT_ERROR' when this validation fails.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
reftable/stack.c | 9 +++++++++
t/unit-tests/u-reftable-stack.c | 28 ++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/reftable/stack.c b/reftable/stack.c
index 955be1edb6..a458f5a4c5 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -238,6 +238,7 @@ static int reftable_stack_reload_once(struct reftable_stack *st,
int reuse_open)
{
size_t cur_len = !st->merged ? 0 : st->merged->tables_len;
+ const struct reftable_table *prev_table = NULL;
struct reftable_table **cur = NULL;
struct reftable_table **reused = NULL;
struct reftable_table **new_tables = NULL;
@@ -317,6 +318,14 @@ static int reftable_stack_reload_once(struct reftable_stack *st,
new_tables[new_tables_len] = table;
new_tables_len++;
+
+ /* table's update indices must be sequential */
+ if (prev_table && (prev_table->max_update_index != table->min_update_index - 1)) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ }
+
+ prev_table = table;
}
/* success! */
diff --git a/t/unit-tests/u-reftable-stack.c b/t/unit-tests/u-reftable-stack.c
index a8b91812e8..465f4a2689 100644
--- a/t/unit-tests/u-reftable-stack.c
+++ b/t/unit-tests/u-reftable-stack.c
@@ -1330,3 +1330,31 @@ void test_reftable_stack__invalid_limit_updates(void)
reftable_stack_destroy(st);
clear_dir(dir);
}
+
+void test_reftable_stack__non_seq_update_indices(void)
+{
+ struct reftable_write_options opts = { 0 };
+ struct reftable_stack *st1 = NULL;
+ char *dir = get_tmp_dir(__LINE__);
+
+ struct reftable_ref_record ref1 = {
+ .refname = (char *)"HEAD",
+ .update_index = 1,
+ .value_type = REFTABLE_REF_SYMREF,
+ .value.symref = (char *)"master",
+ };
+ struct reftable_ref_record ref2 = {
+ .refname = (char *)"branch2",
+ .update_index = 3,
+ .value_type = REFTABLE_REF_SYMREF,
+ .value.symref = (char *)"master",
+ };
+
+ cl_assert_equal_i(reftable_new_stack(&st1, dir, &opts), 0);
+ cl_assert_equal_i(reftable_stack_add(st1, write_test_ref, &ref1, 0), 0);
+ cl_assert_equal_i(reftable_stack_add(st1, write_test_ref, &ref2, 0),
+ REFTABLE_FORMAT_ERROR);
+
+ reftable_stack_destroy(st1);
+ clear_dir(dir);
+}
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-18 8:11 ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
@ 2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 11:20 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 5:54 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Thu, Sep 18, 2025 at 10:11:45AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 955be1edb6..a458f5a4c5 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -317,6 +318,14 @@ static int reftable_stack_reload_once(struct reftable_stack *st,
>
> new_tables[new_tables_len] = table;
> new_tables_len++;
> +
> + /* table's update indices must be sequential */
Let's make this a full sentence starting with an upper-case letter and a
period.
> + if (prev_table && (prev_table->max_update_index != table->min_update_index - 1)) {
I wonder whether this check is too strict. It _must_ be true that the
new table's minimum update index is greater than the previous table's
maximum update index. But in theory, there is no reason why there cannot
be a gap between those.
The reason why this makes me a bit uneasy is stack compaction. Say we
have three different tables:
- A base table with record r1 with update index 1.
- A second table with record r2 with update index 2.
- A third table with a deletion record d(r2) and a new record r3 with
update index 3.
Now if we compact the second and the third table, the compaction will
realize that r2 is deleted and thus no longer needs to be part of the
compacted table. So the new state is:
- A base table with record r1 and update index 1.
- The compacted table with record r3 with update index 3.
I'm not too certain how the minimum update index of that second table
would be encoded in the header. In theory, both minimum and maximum
update index of that table could truthfully be 3, and the result would
still be both valid and sensible. The new check you introduce would
trigger though, as there now is a gap between those two tables.
So I think we should loosen that condition to ensure that we have proper
ordering of update indices, but not a gapless order.
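For illustration only, a minimal sketch of what the loosened condition could look like. `struct table_range` and `update_indices_are_ordered()` are hypothetical stand-ins that only model the two header fields discussed above; they are not part of the reftable library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the check reads. */
struct table_range {
	uint64_t min_update_index;
	uint64_t max_update_index;
};

/*
 * Return 1 if every table starts strictly after the previous table
 * ends, 0 otherwise. Gaps between adjacent tables are accepted.
 */
static int update_indices_are_ordered(const struct table_range *tables,
				      size_t len)
{
	for (size_t i = 1; i < len; i++)
		if (tables[i].min_update_index <=
		    tables[i - 1].max_update_index)
			return 0;
	return 1;
}
```

With the compaction example above, the tables {1,1} and {3,3} pass this check even though update index 2 is missing, whereas overlapping tables such as {1,2} and {2,3} are rejected.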
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-24 5:54 ` Patrick Steinhardt
@ 2025-09-24 11:20 ` Karthik Nayak
2025-09-24 18:04 ` Junio C Hamano
0 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 11:20 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
[-- Attachment #1: Type: text/plain, Size: 2940 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Thu, Sep 18, 2025 at 10:11:45AM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/stack.c b/reftable/stack.c
>> index 955be1edb6..a458f5a4c5 100644
>> --- a/reftable/stack.c
>> +++ b/reftable/stack.c
>> @@ -317,6 +318,14 @@ static int reftable_stack_reload_once(struct reftable_stack *st,
>>
>> new_tables[new_tables_len] = table;
>> new_tables_len++;
>> +
>> + /* table's update indices must be sequential */
>
> Let's make this a full sentence starting with an upper-case letter and a
> period.
>
>> + if (prev_table && (prev_table->max_update_index != table->min_update_index - 1)) {
>
> I wonder whether this check is too strict. It _must_ be true that the
> new table's minimum update index is greater than the previous table's
> maximum update index. But in theory, there is no reason why there cannot
> be a gap between those.
>
> The reason why this makes me a bit uneasy is stack compaction. Say we
> have three different tables:
>
> - A base table with record r1 with update index 1.
> - A second table with record r2 with update index 2.
> - A third table with a deletion record d(r2) and a new record r3 with
> update index 3.
>
> Now if we compact the second and the third table, the compaction will
> realize that r2 is deleted and thus no longer needs to be part of the
> compacted table. So the new state is:
>
> - A base table with record r1 and update index 1.
> - The compacted table with record r3 with update index 3.
>
That's a good counterexample. I didn't know this was possible with the
reftable format. In 'reftable/stack.c:stack_compact_locked()', we use
the minimum update index of the first table and the maximum update index
of the last table being compacted for the table name:
	err = format_name(&next_name,
			  reftable_table_min_update_index(st->tables[first]),
			  reftable_table_max_update_index(st->tables[last]));
We also set the writer's limits the same way in
'reftable/stack.c:stack_write_compact()', which determines the minimum
and maximum update index of the writer:
	err = reftable_writer_set_limits(wr, st->tables[first]->min_update_index,
					 st->tables[last]->max_update_index);
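To make the effect of those two calls concrete, here is a small sketch of how the limits of a compacted table follow from its inputs. `struct index_limits` and `compacted_limits()` are hypothetical names used only for this illustration; the real code operates on `struct reftable_table` and the writer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a table's update index limits. */
struct index_limits {
	uint64_t min_update_index;
	uint64_t max_update_index;
};

/*
 * Limits of a table compacted from tables[first..last]: the minimum
 * comes from the first input table and the maximum from the last one,
 * regardless of which records survive the compaction.
 */
static struct index_limits compacted_limits(const struct index_limits *tables,
					    size_t first, size_t last)
{
	struct index_limits out = {
		.min_update_index = tables[first].min_update_index,
		.max_update_index = tables[last].max_update_index,
	};
	return out;
}
```

For tables {1,1}, {2,2} and {3,3}, compacting the last two yields limits {2,3}, so under this scheme no gap appears after the base table even when r2 itself is dropped.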
> I'm not too certain how the minimum update index of that second table
> would be encoded in the header. In theory, both minimum and maximum
> update index of that table could truthfully be 3, and the result would
> still be both valid and sensible. The new check you introduce would
> trigger though, as there now is a gap between those two tables.
>
> So I think we should loosen that condition to ensure that we have proper
> ordering of update indices, but not a gapless order.
>
> Patrick
So currently our implementation still uses the first and last table's
indices to set the minimum and maximum update index of the new table.
However, I think your point holds. Eventually we could optimize this to
do something like what you described.
I will make changes accordingly.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-24 11:20 ` Karthik Nayak
@ 2025-09-24 18:04 ` Junio C Hamano
2025-09-24 20:13 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-09-24 18:04 UTC (permalink / raw)
To: Karthik Nayak; +Cc: Patrick Steinhardt, git, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>> On Thu, Sep 18, 2025 at 10:11:45AM +0200, Karthik Nayak wrote:
>>> diff --git a/reftable/stack.c b/reftable/stack.c
>>> index 955be1edb6..a458f5a4c5 100644
>>> --- a/reftable/stack.c
>>> +++ b/reftable/stack.c
>>> @@ -317,6 +318,14 @@ static int reftable_stack_reload_once(struct reftable_stack *st,
>>>
>>> new_tables[new_tables_len] = table;
>>> new_tables_len++;
>>> +
>>> + /* table's update indices must be sequential */
>>
>> Let's make this a full sentence starting with an upper-case letter and a
>> period.
>>
>>> + if (prev_table && (prev_table->max_update_index != table->min_update_index - 1)) {
>>
>> I wonder whether this check is too strict. It _must_ be true that the
>> new table's minimum update index is greater than the previous table's
>> maximum update index. But in theory, there is no reason why there cannot
>> be a gap between those.
>>
>> The reason why this makes me a bit uneasy is stack compaction. Say we
>> have three different tables:
>>
>> - A base table with record r1 with update index 1.
>> - A second table with record r2 with update index 2.
>> - A third table with a deletion record d(r2) and a new record r3 with
>> update index 3.
>>
>> Now if we compact the second and the third table, the compaction will
>> realize that r2 is deleted and thus no longer needs to be part of the
>> compacted table. So the new state is:
>>
>> - A base table with record r1 and update index 1.
>> - The compacted table with record r3 with update index 3.
> ...
> However, I think your point holds. I do think eventually we could
> optimize this to ensure that we do something like you described.
>
> I will make changes accordingly.
If you allow gaps in the indices, it is a bit confusing to call them
"sequential"; "monotonically increasing" is less confusing and it
conveys the author's intention to allow gaps clearly (otherwise the
author wouldn't be using such an awkward two-word phrase instead of
"sequential").
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-24 18:04 ` Junio C Hamano
@ 2025-09-24 20:13 ` Karthik Nayak
2025-09-25 6:12 ` Patrick Steinhardt
0 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 20:13 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, git, shejialuo
[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
>>> Now if we compact the second and the third table, the compaction will
>>> realize that r2 is deleted and thus no longer needs to be part of the
>>> compacted table. So the new state is:
>>>
>>> - A base table with record r1 and update index 1.
>>> - The compacted table with record r3 with update index 3.
>> ...
>> However, I think your point holds. I do think eventually we could
>> optimize this to ensure that we do something like you described.
>>
>> I will make changes accordingly.
>
> If you allow gaps in the indices, it is a bit confusing to call them
> "sequential"; "monotonically increasing" is less confusing and it
> conveys the author's intention to allow gaps clearly (otherwise the
> author wouldn't be using such an awkward two-word phrase instead of
> "sequential").
Wouldn't 'monotonically increasing' suggest that
prev_table.max_update_index can be equal to cur_table.min_update_index?
I have locally changed it to 'ascending order' for similar reasons.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-24 20:13 ` Karthik Nayak
@ 2025-09-25 6:12 ` Patrick Steinhardt
2025-09-25 16:22 ` Junio C Hamano
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 6:12 UTC (permalink / raw)
To: Karthik Nayak; +Cc: Junio C Hamano, git, shejialuo
On Wed, Sep 24, 2025 at 01:13:51PM -0700, Karthik Nayak wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>
> >>> Now if we compact the second and the third table, the compaction will
> >>> realize that r2 is deleted and thus no longer needs to be part of the
> >>> compacted table. So the new state is:
> >>>
> >>> - A base table with record r1 and update index 1.
> >>> - The compacted table with record r3 with update index 3.
> >> ...
> >> However, I think your point holds. I do think eventually we could
> >> optimize this to ensure that we do something like you described.
> >>
> >> I will make changes accordingly.
> >
> > If you allow gaps in the indices, it is a bit confusing to call them
> > "sequential"; "monotonically increasing" is less confusing and it
> > conveys the author's intention to allow gaps clearly (otherwise the
> > author wouldn't be using such an awkward two-word phrase instead of
> > "sequential").
>
> Wouldn't 'monotonically increasing' suggest that
> prev_table.max_update_index can be equal to cur_table.min_update_index?
> I have locally changed it to 'ascending order' for similar reasons.
I guess the correct phrase here is "strictly monotonically increasing".
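To spell out the difference between the three terms, here is a small sketch with one predicate per term. The function names are made up for this illustration; each takes the previous table's maximum update index and the current table's minimum update index.

```c
#include <stdint.h>

/* "Sequential": the next table starts exactly one past the previous end. */
static int is_sequential(uint64_t prev_max, uint64_t cur_min)
{
	return cur_min == prev_max + 1;
}

/* "Monotonically increasing" (non-strict): equal values are accepted. */
static int is_monotonically_increasing(uint64_t prev_max, uint64_t cur_min)
{
	return cur_min >= prev_max;
}

/* "Strictly monotonically increasing": gaps are fine, equality is not. */
static int is_strictly_increasing(uint64_t prev_max, uint64_t cur_min)
{
	return cur_min > prev_max;
}
```

Only the last predicate accepts a gap (1 followed by 3) while still rejecting two tables that share an update index (3 followed by 3).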
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices
2025-09-25 6:12 ` Patrick Steinhardt
@ 2025-09-25 16:22 ` Junio C Hamano
0 siblings, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2025-09-25 16:22 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Karthik Nayak, git, shejialuo
Patrick Steinhardt <ps@pks.im> writes:
>> Wouldn't 'monotonically increasing' suggest that
>> prev_table.max_update_index can be equal to cur_table.min_update_index?
>> I have locally changed it to 'ascending order' for similar reasons.
>
> I guess the correct phrase here is "strictly monotonically increasing".
Both of you are right and I was wrong. Thanks for the clarification.
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (3 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
` (2 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
The `gitmodulesLarge` message ID is listed twice. Remove the duplicate
entry.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ---
1 file changed, 3 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (4 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
2025-09-18 8:11 ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
The list of 'fsck_msg_type' seems to be alphabetically ordered, but a
few entries are out of place. Fix this by sorting the sub-sections of
the list to maintain alphabetical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
fsck.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..6b0db235e0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -33,15 +33,27 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_TIMEZONE, ERROR) \
FUNC(BAD_TREE, ERROR) \
FUNC(BAD_TREE_SHA1, ERROR) \
FUNC(BAD_TYPE, ERROR) \
FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
FUNC(MISSING_AUTHOR, ERROR) \
FUNC(MISSING_COMMITTER, ERROR) \
FUNC(MISSING_EMAIL, ERROR) \
@@ -60,39 +72,27 @@ enum fsck_msg_type {
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
/* warnings */ \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
FUNC(TRAILING_REF_CONTENT, INFO) \
/* ignored (elevated when requested) */ \
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v3 7/8] reftable: add code to facilitate consistency checks
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (5 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
2025-09-18 8:11 ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the files present in the reftable directory.
Since the reftable library is treated as an independent library, we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h', which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The caller provides the function with callbacks to handle issue
and information reporting.
The added check goes over all files in the reftable directory and
validates that they have the expected file type and a valid name. It
raises specific errors for both.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Makefile | 3 +-
meson.build | 1 +
reftable/fsck.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 42 ++++++++++++++++++
4 files changed, 157 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4c95affadb..03fbaf2b21 100644
--- a/Makefile
+++ b/Makefile
@@ -2732,9 +2732,10 @@ XDIFF_OBJS += xdiff/xutils.o
xdiff-objs: $(XDIFF_OBJS)
REFTABLE_OBJS += reftable/basics.o
-REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
+REFTABLE_OBJS += reftable/error.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/iter.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
diff --git a/meson.build b/meson.build
index b3dfcc0497..8914252910 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..785e4b43e8
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,112 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "stack.h"
+
+static bool valid_table_name(const char *name, uint64_t *min_update_index,
+ uint64_t *max_update_index)
+{
+ const char *ptr = name;
+ char *endptr;
+
+ /* strtoull doesn't set errno on success */
+ errno = 0;
+
+ *min_update_index = strtoull(ptr, &endptr, 16);
+ if (errno == EINVAL)
+ return false;
+ ptr = endptr;
+
+ if (strncmp(ptr, "-", 1))
+ return false;
+ ptr++;
+
+ *max_update_index = strtoull(ptr, &endptr, 16);
+ if (errno == EINVAL)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoul(ptr, &endptr, 16);
+ if (errno == EINVAL)
+ return false;
+ ptr = endptr;
+
+ if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
+ return false;
+
+ return true;
+}
+
+static int stack_check_all_files_in_dir(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data)
+{
+ DIR *dir = opendir(stack->reftable_dir);
+ struct reftable_fsck_info info;
+ struct dirent *d = NULL;
+ uint64_t min, max;
+ int err = 0;
+
+ if (!dir)
+ return 0;
+
+ while ((d = readdir(dir))) {
+ if (!strcmp(d->d_name, "tables.list"))
+ continue;
+
+ if ((d->d_name[0] == '.' &&
+ (d->d_name[1] == '\0' ||
+ (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
+ continue;
+
+ if (d->d_type == DT_REG) {
+ if (!valid_table_name(d->d_name, &min, &max)) {
+ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
+ info.msg = "file with invalid table name";
+ info.path = d->d_name;
+
+ err |= report_fn(&info, cb_data);
+ }
+ } else {
+ info.error = REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE;
+ info.msg = "file with unexpected type";
+ info.path = d->d_name;
+
+ err |= report_fn(&info, cb_data);
+ }
+ }
+
+ closedir(dir);
+ return err;
+}
+
+static int stack_checks(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data)
+{
+ struct reftable_buf msg = REFTABLE_BUF_INIT;
+ char **names = NULL;
+ int err = 0;
+
+ if (stack == NULL)
+ goto out;
+
+ err |= stack_check_all_files_in_dir(stack, report_fn, cb_data);
+
+out:
+ free_names(names);
+ reftable_buf_release(&msg);
+ return err;
+}
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+ verbose_fn("Checking reftable: stack checks", cb_data);
+ return stack_checks(stack, report_fn, cb_data);
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..5e13ac9f02
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,42 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Non regular file in the reftable directory */
+ REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE = 0,
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME,
+ /* Used for bounds checking, must be last */
+ REFTABLE_FSCK_MAX_VALUE
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform consistency checks on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the caller via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the consistency checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v3 7/8] reftable: add code to facilitate consistency checks
2025-09-18 8:11 ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 18:40 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 5:54 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Thu, Sep 18, 2025 at 10:11:48AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..785e4b43e8
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,112 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +static bool valid_table_name(const char *name, uint64_t *min_update_index,
> + uint64_t *max_update_index)
> +{
> + const char *ptr = name;
> + char *endptr;
> +
> + /* strtoull doesn't set errno on success */
> + errno = 0;
> +
> + *min_update_index = strtoull(ptr, &endptr, 16);
> + if (errno == EINVAL)
> + return false;
strtoull may also set errno to ERANGE. In general, shouldn't we abort
whenever errno is non-zero here?
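A minimal sketch of what such a check could look like, assuming a helper named `parse_hex_field()` (hypothetical, not part of the patch). Besides treating any non-zero errno as a failure, it also rejects the case where no characters were consumed, since setting EINVAL for "no conversion performed" is optional for implementations.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one hexadecimal field starting at *ptr. On success, store the
 * value in *out, advance *ptr past the field and return 0. Return -1
 * if strtoull() reported any error or consumed no characters.
 */
static int parse_hex_field(const char **ptr, uint64_t *out)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(*ptr, &end, 16);
	if (errno || end == *ptr)
		return -1;

	*out = val;
	*ptr = end;
	return 0;
}
```

Note that strtoull() itself still accepts leading whitespace and an optional sign, so a stricter validator would have to check the first character separately.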
> + ptr = endptr;
> +
> + if (strncmp(ptr, "-", 1))
> + return false;
Better:
if (*ptr != '-')
return false;
> + ptr++;
> +
> + *max_update_index = strtoull(ptr, &endptr, 16);
> + if (errno == EINVAL)
> + return false;
> + ptr = endptr;
> +
> + if (*ptr != '-')
> + return false;
> + ptr++;
> +
> + strtoul(ptr, &endptr, 16);
> + if (errno == EINVAL)
> + return false;
> + ptr = endptr;
> +
> + if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
> + return false;
Yup, makes sense. We don't do so ourselves, but in theory it is possible
for tables to have a ".log" suffix. If so, they are expected to only
contain reflog records.
> + return true;
> +}
> +
> +static int stack_check_all_files_in_dir(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + void *cb_data)
> +{
> + DIR *dir = opendir(stack->reftable_dir);
I think it would make sense to move this function call close to the
conditional.
> + struct reftable_fsck_info info;
> + struct dirent *d = NULL;
> + uint64_t min, max;
> + int err = 0;
> +
> + if (!dir)
> + return 0;
> +
> + while ((d = readdir(dir))) {
> + if (!strcmp(d->d_name, "tables.list"))
> + continue;
> +
> + if ((d->d_name[0] == '.' &&
> + (d->d_name[1] == '\0' ||
> + (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
> + continue;
> +
> + if (d->d_type == DT_REG) {
> + if (!valid_table_name(d->d_name, &min, &max)) {
> + info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
> + info.msg = "file with invalid table name";
> + info.path = d->d_name;
> +
> + err |= report_fn(&info, cb_data);
> + }
One problem with this is that this is racy with concurrent writers. We
don't recognize the "tables.list.lock" file, and neither do we recognize
"0x*-0x*.{ref,log}.temp.XXXXXX"-style files.
Would a better approach be to instead go through table names as
loaded by the stack? The reftable code already knows to prune unknown
files anyway, so I don't think we should scan for any other files.
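For comparison, if the directory scan were kept, it would at least need to tolerate the transient files named above. The following is only a rough sketch of that alternative, not the approach suggested here; `is_transient_reftable_file()` is a hypothetical helper and the exact naming of temporary files is an assumption based on the patterns mentioned in this message.

```c
#include <string.h>

/*
 * Return 1 if the name looks like a file that a concurrent writer may
 * legitimately create, 0 otherwise. The patterns are assumptions taken
 * from the discussion, not from the reftable sources.
 */
static int is_transient_reftable_file(const char *name)
{
	/* Lock file held while "tables.list" is being rewritten. */
	if (!strcmp(name, "tables.list.lock"))
		return 1;

	/* Temporary tables such as "<min>-<max>-<suffix>.ref.temp.XXXXXX". */
	if (strstr(name, ".ref.temp.") || strstr(name, ".log.temp."))
		return 1;

	return 0;
}
```

Even with such a filter the scan stays inherently racy, which is why walking the table names already loaded by the stack looks more robust.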
> + } else {
> + info.error = REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE;
> + info.msg = "file with unexpected type";
> + info.path = d->d_name;
> +
> + err |= report_fn(&info, cb_data);
> + }
> + }
> +
> + closedir(dir);
> + return err;
> +}
> +
> +static int stack_checks(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + void *cb_data)
> +{
> + struct reftable_buf msg = REFTABLE_BUF_INIT;
> + char **names = NULL;
This variable is unused.
> + int err = 0;
> +
> + if (stack == NULL)
> + goto out;
Why should someone ever pass a `NULL` stack?
> + err |= stack_check_all_files_in_dir(stack, report_fn, cb_data);
> +
> +out:
> + free_names(names);
> + reftable_buf_release(&msg);
> + return err;
> +}
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn,
> + void *cb_data)
> +{
> + verbose_fn("Checking reftable: stack checks", cb_data);
> + return stack_checks(stack, report_fn, cb_data);
Nit: having this extra function call to `stack_checks()` feels a bit
weird as it could just as well be inlined. Is this preparing for a
future change?
> +}
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..5e13ac9f02
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,42 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> + /* Non regular file in the reftable directory */
> + REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE = 0,
> + /* Invalid table name */
> + REFTABLE_FSCK_ERROR_TABLE_NAME,
> + /* Used for bounds checking, must be last */
> + REFTABLE_FSCK_MAX_VALUE
Let's add a trailing comma here.
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v3 7/8] reftable: add code to facilitate consistency checks
2025-09-24 5:54 ` Patrick Steinhardt
@ 2025-09-24 18:40 ` Karthik Nayak
2025-09-25 6:14 ` Patrick Steinhardt
0 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 18:40 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
[-- Attachment #1: Type: text/plain, Size: 5981 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Thu, Sep 18, 2025 at 10:11:48AM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> new file mode 100644
>> index 0000000000..785e4b43e8
>> --- /dev/null
>> +++ b/reftable/fsck.c
>> @@ -0,0 +1,112 @@
>> +#include "basics.h"
>> +#include "reftable-fsck.h"
>> +#include "stack.h"
>> +
>> +static bool valid_table_name(const char *name, uint64_t *min_update_index,
>> + uint64_t *max_update_index)
>> +{
>> + const char *ptr = name;
>> + char *endptr;
>> +
>> + /* strtoull doesn't set errno on success */
>> + errno = 0;
>> +
>> + *min_update_index = strtoull(ptr, &endptr, 16);
>> + if (errno == EINVAL)
>> + return false;
>
> strtoull may also set errno to ERANGE. In general, shouldn't we abort
> whenever errno is non-zero here?
>
Yeah, that would be much better. Will change.
>> + ptr = endptr;
>> +
>> + if (strncmp(ptr, "-", 1))
>> + return false;
>
> Better:
>
> if (*ptr != '-')
> return false;
>
I did use that below. I think I missed changing this, will do.
>> + ptr++;
>> +
>> + *max_update_index = strtoull(ptr, &endptr, 16);
>> + if (errno == EINVAL)
>> + return false;
>> + ptr = endptr;
>> +
>> + if (*ptr != '-')
>> + return false;
>> + ptr++;
>> +
>> + strtoul(ptr, &endptr, 16);
>> + if (errno == EINVAL)
>> + return false;
>> + ptr = endptr;
>> +
>> + if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
>> + return false;
>
> Yup, makes sense. We don't do so ourselves, but in theory it is possible
> for tables to have a ".log" suffix. If so, they are expected to only
> contain reflog records.
>
Yeah, I missed this in the previous iteration, but realized while
reading the spec that this could be possible.
>> + return true;
>> +}
>> +
>> +static int stack_check_all_files_in_dir(struct reftable_stack *stack,
>> + reftable_fsck_report_fn report_fn,
>> + void *cb_data)
>> +{
>> + DIR *dir = opendir(stack->reftable_dir);
>
> I think it would make sense to move this function call close to the
> conditional.
>
Fair enough, will move.
>> + struct reftable_fsck_info info;
>> + struct dirent *d = NULL;
>> + uint64_t min, max;
>> + int err = 0;
>> +
>> + if (!dir)
>> + return 0;
>> +
>> + while ((d = readdir(dir))) {
>> + if (!strcmp(d->d_name, "tables.list"))
>> + continue;
>> +
>> + if ((d->d_name[0] == '.' &&
>> + (d->d_name[1] == '\0' ||
>> + (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
>> + continue;
>> +
>> + if (d->d_type == DT_REG) {
>> + if (!valid_table_name(d->d_name, &min, &max)) {
>> + info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
>> + info.msg = "file with invalid table name";
>> + info.path = d->d_name;
>> +
>> + err |= report_fn(&info, cb_data);
>> + }
>
> One problem with this is that this is racy with concurrent writers. We
> don't recognize the "tables.list.lock" file, and neither do we recognize
> "0x*-0x*.{ref,log}.temp.XXXXXX"-style files.
>
> Would it be a better approach to instead go through table names as
> loaded by the stack? The reftable code already knows to prune unknown
> files anyway, so I don't think we should scan for any other files.
>
I actually had more structured code here, where the idea was:
- For each stack
  - Run stack level checks
  - For each table in stack
    - Run table level checks
    - For each block in table
      - Run block level checks
      - For each ref / log
        - Run ref / log level checks
But we moved some of my tests to be runtime checks, leaving this as the
only check remaining. We could still do the first level of what I
mentioned above. The only reason I didn't was because we wanted to check
all files in the stack dir. But I think this is much better, having
unknown files in the reftable directory doesn't affect the repository in
any way. So I would argue perhaps that we shouldn't even care about it.
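The first level of that layering, iterating over the tables a stack already knows about instead of scanning the directory, could look roughly like the sketch below. All `sketch_*` names and both structs are hypothetical stand-ins rather than the real reftable types, and the suffix test is a deliberately simplified placeholder for a real table-level check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the reftable structures; not the real layout. */
struct sketch_table {
	const char *name;
};

struct sketch_stack {
	struct sketch_table *tables;
	size_t tables_len;
};

typedef int (*sketch_report_fn)(const char *path, const char *msg,
				void *cb_data);
typedef int (*sketch_table_check_fn)(const struct sketch_table *table,
				     sketch_report_fn report, void *cb_data);

static int sketch_has_suffix(const char *s, const char *suffix)
{
	size_t len = strlen(s), suffix_len = strlen(suffix);

	return len >= suffix_len && !strcmp(s + len - suffix_len, suffix);
}

/* Table-level check: a simplified suffix test standing in for real checks. */
static int sketch_check_suffix(const struct sketch_table *table,
			       sketch_report_fn report, void *cb_data)
{
	if (sketch_has_suffix(table->name, ".ref") ||
	    sketch_has_suffix(table->name, ".log"))
		return 0;
	return report(table->name, "unexpected table suffix", cb_data);
}

/* Stack level: run every table-level check on every table in the stack. */
static int sketch_check_stack(const struct sketch_stack *stack,
			      sketch_report_fn report, void *cb_data)
{
	static const sketch_table_check_fn checks[] = {
		sketch_check_suffix,
	};
	int err = 0;

	assert(stack && report);

	for (size_t i = 0; i < stack->tables_len; i++)
		for (size_t j = 0; j < sizeof(checks) / sizeof(checks[0]); j++)
			err |= checks[j](&stack->tables[i], report, cb_data);

	return err;
}

/* Example reporter: count the problems and signal an error to the caller. */
static int sketch_count_report(const char *path, const char *msg,
			       void *cb_data)
{
	(void)path;
	(void)msg;
	(*(int *)cb_data)++;
	return 1;
}
```

Since only names listed in the stack are visited, lock files and temporary files from concurrent writers never reach the checks.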
>> + } else {
>> + info.error = REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE;
>> + info.msg = "file with unexpected type";
>> + info.path = d->d_name;
>> +
>> + err |= report_fn(&info, cb_data);
>> + }
>> + }
>> +
>> + closedir(dir);
>> + return err;
>> +}
>> +
>> +static int stack_checks(struct reftable_stack *stack,
>> + reftable_fsck_report_fn report_fn,
>> + void *cb_data)
>> +{
>> + struct reftable_buf msg = REFTABLE_BUF_INIT;
>> + char **names = NULL;
>
> This variable is unused.
>
Leftover code, will cleanup.
>> + int err = 0;
>> +
>> + if (stack == NULL)
>> + goto out;
>
> Why should someone ever pass a `NULL` stack?
>
This should be safe to remove.
>> + err |= stack_check_all_files_in_dir(stack, report_fn, cb_data);
>> +
>> +out:
>> + free_names(names);
>> + reftable_buf_release(&msg);
>> + return err;
>> +}
>> +
>> +int reftable_fsck_check(struct reftable_stack *stack,
>> + reftable_fsck_report_fn report_fn,
>> + reftable_fsck_verbose_fn verbose_fn,
>> + void *cb_data)
>> +{
>> + verbose_fn("Checking reftable: stack checks", cb_data);
>> + return stack_checks(stack, report_fn, cb_data);
>
> Nit: having this extra function call to `stack_checks()` feels a bit
> weird as it could just as well be inlined. Is this preparing for a
> future change?
Yeah, mostly the idea was to break things up into layers as I mentioned
above. Let's make it simpler for now and we can make it nicer when we
get around adding more checks.
>
>> +}
>> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
>> new file mode 100644
>> index 0000000000..5e13ac9f02
>> --- /dev/null
>> +++ b/reftable/reftable-fsck.h
>> @@ -0,0 +1,42 @@
>> +#ifndef REFTABLE_FSCK_H
>> +#define REFTABLE_FSCK_H
>> +
>> +#include "reftable-stack.h"
>> +
>> +enum reftable_fsck_error {
>> + /* Non regular file in the reftable directory */
>> + REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE = 0,
>> + /* Invalid table name */
>> + REFTABLE_FSCK_ERROR_TABLE_NAME,
>> + /* Used for bounds checking, must be last */
>> + REFTABLE_FSCK_MAX_VALUE
>
> Let's add a trailing comma here.
>
> Patrick
Will do.
* Re: [PATCH v3 7/8] reftable: add code to facilitate consistency checks
2025-09-24 18:40 ` Karthik Nayak
@ 2025-09-25 6:14 ` Patrick Steinhardt
0 siblings, 0 replies; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-25 6:14 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Wed, Sep 24, 2025 at 11:40:31AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > On Thu, Sep 18, 2025 at 10:11:48AM +0200, Karthik Nayak wrote:
> >> diff --git a/reftable/fsck.c b/reftable/fsck.c
> >> new file mode 100644
> >> index 0000000000..785e4b43e8
> >> --- /dev/null
> >> +++ b/reftable/fsck.c
[snip]
> >> + struct reftable_fsck_info info;
> >> + struct dirent *d = NULL;
> >> + uint64_t min, max;
> >> + int err = 0;
> >> +
> >> + if (!dir)
> >> + return 0;
> >> +
> >> + while ((d = readdir(dir))) {
> >> + if (!strcmp(d->d_name, "tables.list"))
> >> + continue;
> >> +
> >> + if ((d->d_name[0] == '.' &&
> >> + (d->d_name[1] == '\0' ||
> >> + (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
> >> + continue;
> >> +
> >> + if (d->d_type == DT_REG) {
> >> + if (!valid_table_name(d->d_name, &min, &max)) {
> >> + info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
> >> + info.msg = "file with invalid table name";
> >> + info.path = d->d_name;
> >> +
> >> + err |= report_fn(&info, cb_data);
> >> + }
> >
> > One problem with this is that this is racy with concurrent writers. We
> > don't recognize the "tables.list.lock" file, and neither do we recognize
> > "0x*-0x*.{ref,log}.temp.XXXXXX"-style files.
> >
> > Would it be a better approach to instead go through table names as
> > loaded by the stack? The reftable code already knows to prune unknown
> > files anyway, so I don't think we should scan for any other files.
> >
>
> I actually had more structured code here, where the idea was:
>
> - For each stack
>   - Run stack level checks
>   - For each table in stack
>     - Run table level checks
>     - For each block in table
>       - Run block level checks
>       - For each ref / log
>         - Run ref / log level checks
>
> But we moved some of my tests to be runtime checks, leaving this as the
> only check remaining. We could still do the first level of what I
> mentioned above. The only reason I didn't was because we wanted to check
> all files in the stack dir. But I think this is much better, having
> unknown files in the reftable directory doesn't affect the repository in
> any way. So I would argue perhaps that we shouldn't even care about it.
Yeah, agreed. As long as we don't know about any edge cases where this
does or did create problems I agree.
Patrick
* [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
` (6 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-09-18 8:11 ` Karthik Nayak
2025-09-24 5:54 ` Patrick Steinhardt
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-18 8:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, shejialuo, Karthik Nayak
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:
It suggested to use
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
So treat non-conformant file names as warnings. Introduce another check
for file types; unexpected file types are treated as errors.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 6 +++++
fsck.h | 2 ++
refs/reftable-backend.c | 58 ++++++++++++++++++++++++++++++++++++++----
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 55 +++++++++++++++++++++++++++++++++++++++
5 files changed, 117 insertions(+), 5 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..d10fe9bb35 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,12 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableFiletype`::
+ (ERROR) File with unexpected type in reftable directory.
+
+`badReftableTableName`::
+ (WARN) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 6b0db235e0..c857fcdd7c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
FUNC(BAD_REFERENT_NAME, ERROR) \
+ FUNC(BAD_REFTABLE_FILETYPE, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
@@ -73,6 +74,7 @@ enum fsck_msg_type {
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
+ FUNC(BAD_REFTABLE_TABLE_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2152349cb9..1a18f4bf92 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,6 +6,7 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
@@ -15,10 +16,11 @@
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2707,11 +2709,57 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", msg);
+}
+
+static const enum fsck_msg_id fsck_msg_id_map[] = {
+ [REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE] = FSCK_MSG_BAD_REFTABLE_FILETYPE,
+ [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
+};
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
+ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
+ enum fsck_msg_id msg_id;
+
+ if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
+ BUG("unknown fsck error: %d", info->error);
+
+ msg_id = fsck_msg_id_map[info->error];
+
+ if (!msg_id)
+ BUG("fsck_msg_id value missing for reftable error: %d", info->error);
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..ec1fc0b2a1 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -146,6 +146,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..d4e6765b6b
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+for TABLE_NAME in "foo-bar-e4d12d59.ref" \
+ "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
+ "0x000000000001-0x000000000002-e4d12d59.abc" \
+ "0x000000000001-0x000000000002-e4d12d59.refabc"; do
+ test_expect_success "table name $TABLE_NAME should be checked" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ touch ".git/reftable/$TABLE_NAME" &&
+
+ git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ warning: ${TABLE_NAME}: badReftableTableName: file with invalid table name
+ EOF
+ test_cmp expect err
+ )
+ '
+done
+
+test_expect_success "invalid file type should be checked" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ mkdir ".git/reftable/foo" &&
+
+ test_must_fail git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ error: foo: badReftableFiletype: file with unexpected type
+ EOF
+ test_cmp expect err
+ )
+'
+
+test_done
--
2.51.0
* Re: [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name
2025-09-18 8:11 ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-24 5:54 ` Patrick Steinhardt
2025-09-24 18:44 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-09-24 5:54 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Thu, Sep 18, 2025 at 10:11:49AM +0200, Karthik Nayak wrote:
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 2152349cb9..1a18f4bf92 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2707,11 +2709,57 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
> return ret;
> }
>
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> - struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> + struct fsck_options *o = cb_data;
> +
> + if (o->verbose)
> + fprintf_ln(stderr, "%s", msg);
> +}
> +
> +static const enum fsck_msg_id fsck_msg_id_map[] = {
> + [REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE] = FSCK_MSG_BAD_REFTABLE_FILETYPE,
> + [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
> +};
> +
> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
> + void *cb_data)
> +{
> + struct fsck_ref_report report = { .path = info->path };
> + struct fsck_options *o = cb_data;
> + enum fsck_msg_id msg_id;
> +
> + if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
> + BUG("unknown fsck error: %d", info->error);
`info->error` is an enum, and whether or not it is signed is an
implementation detail of the platform. But I wonder whether this check
may cause some platforms to warn about an impossible condition.
> +
> + msg_id = fsck_msg_id_map[info->error];
> +
> + if (!msg_id)
> + BUG("fsck_msg_id value missing for reftable error: %d", info->error);
Yup, makes sense.
Patrick
* Re: [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name
2025-09-24 5:54 ` Patrick Steinhardt
@ 2025-09-24 18:44 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-24 18:44 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
Patrick Steinhardt <ps@pks.im> writes:
>> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
>> + void *cb_data)
>> +{
>> + struct fsck_ref_report report = { .path = info->path };
>> + struct fsck_options *o = cb_data;
>> + enum fsck_msg_id msg_id;
>> +
>> + if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
>> + BUG("unknown fsck error: %d", info->error);
>
> `info->error` is an enum, and whether or not it is signed is an
> implementation detail of the platform. But I wonder whether this check
> may cause some platforms to warn about an impossible condition.
>
I didn't really think of that. I guess typecasting it to an int would be
the best way forward here.
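As a rough illustration of the cast, the sketch below does the bounds check on an int copy of the value, so the comparison stays meaningful whether the compiler picks a signed or an unsigned type for the enum. The enums, the map and `sketch_map_error()` are hypothetical and only mirror the shape of the code under review; reserving 0 for "no mapping" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical library-side error codes, shaped like the enum in the patch. */
enum sketch_fsck_error {
	SKETCH_FSCK_ERROR_TABLE_NAME = 0,
	/* Used for bounds checking, must be last */
	SKETCH_FSCK_MAX_VALUE,
};

/* Hypothetical caller-side message ids; 0 is reserved for "no mapping". */
enum sketch_msg_id {
	SKETCH_MSG_NONE = 0,
	SKETCH_MSG_BAD_TABLE_NAME,
};

static const enum sketch_msg_id sketch_msg_id_map[] = {
	[SKETCH_FSCK_ERROR_TABLE_NAME] = SKETCH_MSG_BAD_TABLE_NAME,
};

static enum sketch_msg_id sketch_map_error(enum sketch_fsck_error error)
{
	/* Compare as int so the check does not depend on enum signedness. */
	int value = (int)error;

	/* The map must have exactly one slot per library error code. */
	assert(sizeof(sketch_msg_id_map) / sizeof(sketch_msg_id_map[0]) ==
	       (size_t)SKETCH_FSCK_MAX_VALUE);

	if (value < 0 || value >= (int)SKETCH_FSCK_MAX_VALUE)
		return SKETCH_MSG_NONE;

	return sketch_msg_id_map[value];
}
```

In the real handler an out-of-range value would presumably end in BUG() rather than a sentinel return; the sentinel only keeps the sketch self-contained.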
>> +
>> + msg_id = fsck_msg_id_map[info->error];
>> +
>> + if (!msg_id)
>> + BUG("fsck_msg_id value missing for reftable error: %d", info->error);
>
> Yup, makes sense.
>
> Patrick
Thanks for the review.
* [PATCH v4 0/7] refs/reftable: add consistency checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (7 preceding siblings ...)
2025-09-18 8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
` (7 more replies)
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
10 siblings, 8 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
The reference subsystem allows for adding backend-specific consistency
checks. These checks are run as part of 'git refs verify'.
While the files backend has some consistency checks added, the reftable
backend currently has none. This series first tightens the reftable
backend to make it a little more strict and then also adds the required
infrastructure and some simple consistency checks.
Since the reftable backend is treated as a library within the Git
codebase, we don't want to spill over our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.
So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.
We add the following consistency checks:
1. Check for validating the reftable table name. This is treated as a
warning since the reftable specification only suggests a table name
but doesn't enforce it. Also there is a difference in the table name
used in Git vs that in jGit.
We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
when:
1. The 'tables.list' file doesn't have a trailing newline.
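For illustration, the stricter parsing can be sketched as a standalone function that returns an error code and hands the names back through an output argument. This is a simplified sketch, not the code from the series: `sketch_parse_names()`, `sketch_free_names()` and the error constants are hypothetical, and it uses plain malloc()/realloc() and memchr() instead of the reftable allocation helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes standing in for the REFTABLE_*_ERROR values. */
enum {
	SKETCH_OK = 0,
	SKETCH_FORMAT_ERROR = -1,
	SKETCH_OUT_OF_MEMORY_ERROR = -2,
};

/* Free a NULL-terminated array of heap-allocated strings. */
static void sketch_free_names(char **names)
{
	if (!names)
		return;
	for (size_t i = 0; names[i]; i++)
		free(names[i]);
	free(names);
}

/*
 * Split `buf` into newline-terminated names. Empty names are discarded.
 * A final line without a trailing newline is a format error.
 */
static int sketch_parse_names(const char *buf, size_t size, char ***out)
{
	const char *p = buf, *end = buf + size;
	char **names = NULL;
	size_t len = 0, cap = 0;
	int err = SKETCH_OK;

	assert(buf && out);
	*out = NULL;

	while (p < end) {
		const char *next = memchr(p, '\n', (size_t)(end - p));

		if (!next) {
			/* The last entry is missing its newline. */
			err = SKETCH_FORMAT_ERROR;
			goto fail;
		}

		if (next > p) {
			size_t name_len = (size_t)(next - p);
			char *name;

			/* Keep room for the new name and the NULL terminator. */
			if (len + 2 > cap) {
				size_t new_cap = cap ? cap * 2 : 8;
				char **tmp = realloc(names, new_cap * sizeof(*names));
				if (!tmp) {
					err = SKETCH_OUT_OF_MEMORY_ERROR;
					goto fail;
				}
				names = tmp;
				cap = new_cap;
				names[len] = NULL;
			}

			name = malloc(name_len + 1);
			if (!name) {
				err = SKETCH_OUT_OF_MEMORY_ERROR;
				goto fail;
			}
			memcpy(name, p, name_len);
			name[name_len] = '\0';
			names[len++] = name;
			names[len] = NULL;
		}
		p = next + 1;
	}

	if (!names) {
		/* No entries: return an empty, NULL-terminated array. */
		names = calloc(1, sizeof(*names));
		if (!names)
			return SKETCH_OUT_OF_MEMORY_ERROR;
	}

	*out = names;
	return SKETCH_OK;

fail:
	sketch_free_names(names);
	return err;
}
```

Returning the error and passing the array through `out` lets the caller tell a malformed file apart from an allocation failure, which a bare NULL return could not express.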
---
Changes in v4:
- The biggest change is to iterate over the tables in a reftable stack
for consistency checks instead of all files inside the REFTABLE_DIR.
This avoids all race conditions. Also, since we only check the tables
in a stack, it no longer makes sense to check file type.
- The discussion about update indices concluded that table indices
in a stack must be strictly monotonically increasing. While modifying
the code to do the same, I realized that we already have this check in
'reftable_addition_add()', where we check while adding a new table to
the stack: `wr->min_update_index < add->next_update_index`. So I've
dropped this patch from the series.
- Change parse_names() to accept the output string array as an argument
and return an error instead. This makes the flow a little easier to
understand.
- Link to v3: https://lore.kernel.org/r/20250918-228-reftable-introduce-consistency-checks-v3-0-271af03eb34d@gmail.com
Changes in v3:
- I took a long hiatus from this topic, mostly due to other priorities.
This has been rebased on top of '92c87bdc40 (The eighth batch,
2025-09-12)' since there were conflicts.
- Junio suggested that two of the consistency checks (trailing newlines,
sequential update indices for tables in stack) should actually be
checked during runtime. I have made that change in this version.
- I've cleaned up the code and modularized the 'reftable/fsck.c' code.
- Invalid table name emits a warning, since the reftable spec doesn't
enforce it but only makes a suggestion.
- Broken down the commits to make it easier to review.
- Link to v2: https://lore.kernel.org/r/20250902-228-reftable-introduce-consistency-checks-v2-0-4f96b3834779@gmail.com
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
pointer; this provides a smaller footprint (pointer size vs struct
size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com
---
Documentation/fsck-msgids.adoc | 6 +--
Makefile | 3 +-
fsck.h | 39 +++++++--------
meson.build | 1 +
refs.c | 4 ++
refs/debug.c | 1 -
refs/files-backend.c | 3 --
refs/reftable-backend.c | 58 ++++++++++++++++++++---
reftable/basics.c | 37 ++++++++++-----
reftable/basics.h | 5 +-
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 ++++++++++++++++
reftable/stack.c | 7 +--
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 38 +++++++++++++++
t/unit-tests/u-reftable-basics.c | 24 ++++++++--
16 files changed, 308 insertions(+), 59 deletions(-)
Karthik Nayak (7):
refs: remove unused headers
refs: move consistency check msg to generic layer
reftable: check for trailing newline in 'tables.list'
Documentation/fsck-msgids: remove duplicate msg id
fsck: order 'fsck_msg_type' alphabetically
reftable: add code to facilitate consistency checks
refs/reftable: add fsck check for checking the table name
Range-diff versus v3:
1: 4522c10e6e = 1: b91194e060 refs: remove unused headers
2: 40a83fc6fa = 2: d48afbf588 refs: move consistency check msg to generic layer
3: df401e46f7 ! 3: cd7ca2a585 reftable: check for trailing newline in 'tables.list'
@@ Metadata
## Commit message ##
reftable: check for trailing newline in 'tables.list'
- In the reftable format, the 'tables.list' file contains a newline
- separated list of tables. While we parse this file, we do not check or
- care about trailing newlines. Tighten the parser in `parse_names()` to
- return an appropriate error if there is no trailing newline.
+ In the reftable format, the 'tables.list' file contains a
+ newline separated list of tables. While we parse this file, we do not
+ check or care about the last newline. Tighten the parser in
+ `parse_names()` to return an appropriate error if the last newline is
+ missing.
- This requires modification to `parse_names()` to accept a third argument
- which will hold the error value.
+ This requires modification to `parse_names()` to now return the error
+ while accepting the output as a third argument.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
@@ reftable/basics.c: size_t names_length(const char **names)
}
-char **parse_names(char *buf, int size)
-+char **parse_names(char *buf, int size, int *err)
++int parse_names(char *buf, int size, char ***out)
{
char **names = NULL;
size_t names_cap = 0;
-@@ reftable/basics.c: char **parse_names(char *buf, int size)
+ size_t names_len = 0;
+ char *p = buf;
+ char *end = buf + size;
++ int err = 0;
while (p < end) {
char *next = strchr(p, '\n');
- if (next && next < end) {
+- *next = 0;
+ if (!next) {
-+ *err = REFTABLE_FORMAT_ERROR;
++ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ } else if (next < end) {
- *next = 0;
++ *next = '\0';
} else {
next = end;
}
@@ reftable/basics.c: char **parse_names(char *buf, int size)
- names_cap))
- goto err;
+ names_cap)) {
-+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
++ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
@@ reftable/basics.c: char **parse_names(char *buf, int size)
- if (!names[names_len++])
- goto err;
+ if (!names[names_len++]) {
-+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
++ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
}
@@ reftable/basics.c: char **parse_names(char *buf, int size)
- if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap))
- goto err;
+ if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap)) {
-+ *err = REFTABLE_OUT_OF_MEMORY_ERROR;
++ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = NULL;
- return names;
-
+- return names;
+-
-err:
++ *out = names;
++ return 0;
+done:
for (size_t i = 0; i < names_len; i++)
reftable_free(names[i]);
reftable_free(names);
+- return NULL;
++ return err;
+ }
+
+ int names_equal(const char **a, const char **b)
## reftable/basics.h ##
@@ reftable/basics.h: void free_names(char **a);
@@ reftable/basics.h: void free_names(char **a);
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
-+ *
-+ * Errors are assigned to the `err` variable.
*/
-char **parse_names(char *buf, int size);
-+char **parse_names(char *buf, int size, int *err);
++int parse_names(char *buf, int size, char ***out);
/* compares two NULL-terminated arrays of strings. */
int names_equal(const char **a, const char **b);
@@ reftable/stack.c: static int fd_read_lines(int fd, char ***namesp)
- *namesp = parse_names(buf, size);
- if (!*namesp) {
- err = REFTABLE_OUT_OF_MEMORY_ERROR;
-+ *namesp = parse_names(buf, size, &err);
-+ if (!*namesp)
- goto done;
+- goto done;
- }
-
+-
++ err = parse_names(buf, size, namesp);
done:
reftable_free(buf);
+ return err;
## t/unit-tests/u-reftable-basics.c ##
@@ t/unit-tests/u-reftable-basics.c: license that can be found in the LICENSE file or at
@@ t/unit-tests/u-reftable-basics.c: void test_reftable_basics__names_equal(void)
- char in2[] = "a\nb\nc";
- char **out = parse_names(in1, strlen(in1));
+ char in2[] = "a\nb\nc\n";
-+ int err = 0;
-+ char **out = parse_names(in1, strlen(in1), &err);
++ char **out = NULL;
++ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "line");
@@ t/unit-tests/u-reftable-basics.c: void test_reftable_basics__names_equal(void)
free_names(out);
- out = parse_names(in2, strlen(in2));
-+ out = parse_names(in2, strlen(in2), &err);
++ out = NULL;
++ err = parse_names(in2, strlen(in2), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
@@ t/unit-tests/u-reftable-basics.c: void test_reftable_basics__parse_names(void)
+void test_reftable_basics__parse_names_missing_newline(void)
+{
+ char in1[] = "line\nline2";
-+ int err = 0;
-+ char **out = parse_names(in1, strlen(in1), &err);
++ char **out = NULL;
++ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == REFTABLE_FORMAT_ERROR);
+ cl_assert(out == NULL);
+}
@@ t/unit-tests/u-reftable-basics.c: void test_reftable_basics__parse_names(void)
{
char in[] = "a\n\nb\n";
- char **out = parse_names(in, strlen(in));
-+ int err = 0;
-+ char **out = parse_names(in, strlen(in), &err);
-+ cl_assert(err == 0);
++ char **out = NULL;
++ int err = parse_names(in, strlen(in), &out);
++ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
/* simply '\n' should be dropped as empty string */
4: 435707f26c < -: ---------- reftable: ensure tables in a stack use sequential update indices
5: ac6275ab87 = 4: e3e0c0b4ae Documentation/fsck-msgids: remove duplicate msg id
6: 6c02925af1 = 5: 24a8d93adc fsck: order 'fsck_msg_type' alphabetically
7: 1ada7bc89c ! 6: d83d763be1 reftable: add code to facilitate consistency checks
@@ Commit message
stack. The callee provides the function with callbacks to handle issue
and information reporting.
- The added check, goes over all files in the reftable directory and
- validates that they have the expected file type and a valid name. It
- raises specific errors for both.
+ The added check, goes over all tables in the reftable stack validates
+ that they have a valid name. It not, it raises an error.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
@@ reftable/fsck.c (new)
@@
+#include "basics.h"
+#include "reftable-fsck.h"
++#include "reftable-table.h"
+#include "stack.h"
+
-+static bool valid_table_name(const char *name, uint64_t *min_update_index,
-+ uint64_t *max_update_index)
++static bool table_has_valid_name(const char *name)
+{
+ const char *ptr = name;
+ char *endptr;
@@ reftable/fsck.c (new)
+ /* strtoull doesn't set errno on success */
+ errno = 0;
+
-+ *min_update_index = strtoull(ptr, &endptr, 16);
-+ if (errno == EINVAL)
++ strtoull(ptr, &endptr, 16);
++ if (errno)
+ return false;
+ ptr = endptr;
+
-+ if (strncmp(ptr, "-", 1))
++ if (*ptr != '-')
+ return false;
+ ptr++;
+
-+ *max_update_index = strtoull(ptr, &endptr, 16);
-+ if (errno == EINVAL)
++ strtoull(ptr, &endptr, 16);
++ if (errno)
+ return false;
+ ptr = endptr;
+
@@ reftable/fsck.c (new)
+ ptr++;
+
+ strtoul(ptr, &endptr, 16);
-+ if (errno == EINVAL)
++ if (errno)
+ return false;
+ ptr = endptr;
+
@@ reftable/fsck.c (new)
+ return true;
+}
+
-+static int stack_check_all_files_in_dir(struct reftable_stack *stack,
-+ reftable_fsck_report_fn report_fn,
-+ void *cb_data)
++typedef int (*table_check_fn)(struct reftable_table *table,
++ reftable_fsck_report_fn report_fn,
++ void *cb_data);
++
++static int table_check_name(struct reftable_table *table,
++ reftable_fsck_report_fn report_fn,
++ void *cb_data)
+{
-+ DIR *dir = opendir(stack->reftable_dir);
-+ struct reftable_fsck_info info;
-+ struct dirent *d = NULL;
-+ uint64_t min, max;
-+ int err = 0;
++ if (!table_has_valid_name(table->name)) {
++ struct reftable_fsck_info info;
++
++ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
++ info.msg = "invalid reftable table name";
++ info.path = table->name;
+
-+ if (!dir)
-+ return 0;
-+
-+ while ((d = readdir(dir))) {
-+ if (!strcmp(d->d_name, "tables.list"))
-+ continue;
-+
-+ if ((d->d_name[0] == '.' &&
-+ (d->d_name[1] == '\0' ||
-+ (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
-+ continue;
-+
-+ if (d->d_type == DT_REG) {
-+ if (!valid_table_name(d->d_name, &min, &max)) {
-+ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
-+ info.msg = "file with invalid table name";
-+ info.path = d->d_name;
-+
-+ err |= report_fn(&info, cb_data);
-+ }
-+ } else {
-+ info.error = REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE;
-+ info.msg = "file with unexpected type";
-+ info.path = d->d_name;
-+
-+ err |= report_fn(&info, cb_data);
-+ }
++ return report_fn(&info, cb_data);
+ }
+
-+ closedir(dir);
-+ return err;
++ return 0;
+}
+
-+static int stack_checks(struct reftable_stack *stack,
++static int table_checks(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
++ reftable_fsck_verbose_fn verbose_fn UNUSED,
+ void *cb_data)
+{
-+ struct reftable_buf msg = REFTABLE_BUF_INIT;
-+ char **names = NULL;
++ table_check_fn table_check_fns[] = {
++ table_check_name,
++ NULL,
++ };
+ int err = 0;
+
-+ if (stack == NULL)
-+ goto out;
++ for (size_t i = 0; table_check_fns[i]; i++)
++ err |= table_check_fns[i](table, report_fn, cb_data);
+
-+ err |= stack_check_all_files_in_dir(stack, report_fn, cb_data);
-+
-+out:
-+ free_names(names);
-+ reftable_buf_release(&msg);
+ return err;
+}
+
@@ reftable/fsck.c (new)
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
-+ verbose_fn("Checking reftable: stack checks", cb_data);
-+ return stack_checks(stack, report_fn, cb_data);
++ struct reftable_buf msg = REFTABLE_BUF_INIT;
++ int err = 0;
++
++ for (size_t i = 0; i < stack->tables_len; i++) {
++ reftable_buf_reset(&msg);
++ reftable_buf_addstr(&msg, "Checking table: ");
++ reftable_buf_addstr(&msg, stack->tables[i]->name);
++ verbose_fn(msg.buf, cb_data);
++
++ err |= table_checks(stack->tables[i], report_fn, verbose_fn, cb_data);
++ }
++
++ reftable_buf_release(&msg);
++ return err;
+}
## reftable/reftable-fsck.h (new) ##
@@ reftable/reftable-fsck.h (new)
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
-+ /* Non regular file in the reftable directory */
-+ REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE = 0,
+ /* Invalid table name */
-+ REFTABLE_FSCK_ERROR_TABLE_NAME,
++ REFTABLE_FSCK_ERROR_TABLE_NAME = 0,
+ /* Used for bounds checking, must be last */
-+ REFTABLE_FSCK_MAX_VALUE
++ REFTABLE_FSCK_MAX_VALUE,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
8: 77be84e23f ! 7: c49b7887d8 refs/reftable: add fsck check for checking the table name
@@ Commit message
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
- So treat non-conformant file names as warnings. Introduce another check
- to check for file types, non-expected filetypes will be treated as
- errors.
+ So treat non-conformant file names as warnings.
+
+ While adding the fsck header to 'refs/reftable-backend.c', modify the
+ list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
@@ Documentation/fsck-msgids.adoc
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
-+`badReftableFiletype`::
-+ (ERROR) File with unexpected type in reftable directory.
-+
+`badReftableTableName`::
+ (WARN) A reftable table has an invalid name.
+
@@ Documentation/fsck-msgids.adoc
## fsck.h ##
-@@ fsck.h: enum fsck_msg_type {
- FUNC(BAD_PACKED_REF_HEADER, ERROR) \
- FUNC(BAD_PARENT_SHA1, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
-+ FUNC(BAD_REFTABLE_FILETYPE, ERROR) \
- FUNC(BAD_REF_CONTENT, ERROR) \
- FUNC(BAD_REF_FILETYPE, ERROR) \
- FUNC(BAD_REF_NAME, ERROR) \
@@ fsck.h: enum fsck_msg_type {
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
@@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
+}
+
+static const enum fsck_msg_id fsck_msg_id_map[] = {
-+ [REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE] = FSCK_MSG_BAD_REFTABLE_FILETYPE,
+ [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
+};
+
@@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
+ enum fsck_msg_id msg_id;
+
+ if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
-+ BUG("unknown fsck error: %d", info->error);
++ BUG("unknown fsck error: %d", (int)info->error);
+
+ msg_id = fsck_msg_id_map[info->error];
+
+ if (!msg_id)
-+ BUG("fsck_msg_id value missing for reftable error: %d", info->error);
++ BUG("fsck_msg_id value missing for reftable error: %d", (int)info->error);
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
@@ t/t0614-reftable-fsck.sh (new)
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
-+ touch ".git/reftable/$TABLE_NAME" &&
++ EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
++ mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
++ sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
++ mv tables.list .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ cat >expect <<-EOF &&
-+ warning: ${TABLE_NAME}: badReftableTableName: file with invalid table name
++ warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
+ EOF
+ test_cmp expect err
+ )
+ '
+done
+
-+test_expect_success "invalid file type should be checked" '
-+ test_when_finished "rm -rf repo" &&
-+ git init repo &&
-+ (
-+ cd repo &&
-+ git commit --allow-empty -m initial &&
-+
-+ git refs verify 2>err &&
-+ test_must_be_empty err &&
-+
-+ mkdir ".git/reftable/foo" &&
-+
-+ test_must_fail git refs verify 2>err &&
-+ cat >expect <<-EOF &&
-+ error: foo: badReftableFiletype: file with unexpected type
-+ EOF
-+ test_cmp expect err
-+ )
-+'
-+
+test_done
base-commit: a483264b01b977f3e65a4419103c21e6af7412a2
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544
Thanks
- Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v4 1/7] refs: remove unused headers
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 2/7] refs: move consistency check msg to generic layer Karthik Nayak
` (6 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
In the 'refs/' namespace, some of the included header files are not
needed. Let's remove them.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/debug.c | 1 -
refs/files-backend.c | 1 -
refs/reftable-backend.c | 1 -
3 files changed, 3 deletions(-)
diff --git a/refs/debug.c b/refs/debug.c
index 1cb955961e..697adbd0dc 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -1,7 +1,6 @@
#include "git-compat-util.h"
#include "hex.h"
#include "refs-internal.h"
-#include "string-list.h"
#include "trace.h"
static struct trace_key trace_refs = TRACE_KEY_INIT(REFS);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1b3bf26add..d4fb033417 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -20,7 +20,6 @@
#include "../dir-iterator.h"
#include "../lockfile.h"
#include "../object.h"
-#include "../object-file.h"
#include "../path.h"
#include "../dir.h"
#include "../chdir-notify.h"
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 9e889da2ff..2152349cb9 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -11,7 +11,6 @@
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v4 2/7] refs: move consistency check msg to generic layer
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
` (5 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++++
refs/files-backend.c | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/refs.c b/refs.c
index 4ff55cf24f..4a7c394226 100644
--- a/refs.c
+++ b/refs.c
@@ -32,6 +32,7 @@
#include "commit.h"
#include "wildmatch.h"
#include "ident.h"
+#include "fsck.h"
/*
* List of all available backends
@@ -323,6 +324,9 @@ int check_refname_format(const char *refname, int flags)
int refs_fsck(struct ref_store *refs, struct fsck_options *o,
struct worktree *wt)
{
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
return refs->be->fsck(refs, o, wt);
}
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d4fb033417..603b1343d8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3906,8 +3906,6 @@ static int files_fsck_refs(struct ref_store *ref_store,
NULL,
};
- if (o->verbose)
- fprintf_ln(stderr, _("Checking references consistency"));
return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list'
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 2/7] refs: move consistency check msg to generic layer Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-09-26 7:25 ` [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
` (4 subsequent siblings)
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
In the reftable format, the 'tables.list' file contains a
newline separated list of tables. While we parse this file, we do not
check or care about the last newline. Tighten the parser in
`parse_names()` to return an appropriate error if the last newline is
missing.
This requires modification to `parse_names()` to now return the error
while accepting the output as a third argument.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
reftable/basics.c | 37 ++++++++++++++++++++++++-------------
reftable/basics.h | 5 ++---
reftable/stack.c | 7 +------
t/unit-tests/u-reftable-basics.c | 24 ++++++++++++++++++++----
4 files changed, 47 insertions(+), 26 deletions(-)
diff --git a/reftable/basics.c b/reftable/basics.c
index 9988ebd635..e969927b61 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -195,44 +195,55 @@ size_t names_length(const char **names)
return p - names;
}
-char **parse_names(char *buf, int size)
+int parse_names(char *buf, int size, char ***out)
{
char **names = NULL;
size_t names_cap = 0;
size_t names_len = 0;
char *p = buf;
char *end = buf + size;
+ int err = 0;
while (p < end) {
char *next = strchr(p, '\n');
- if (next && next < end) {
- *next = 0;
+ if (!next) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ } else if (next < end) {
+ *next = '\0';
} else {
next = end;
}
+
if (p < next) {
if (REFTABLE_ALLOC_GROW(names, names_len + 1,
- names_cap))
- goto err;
+ names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = reftable_strdup(p);
- if (!names[names_len++])
- goto err;
+ if (!names[names_len++]) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
}
p = next + 1;
}
- if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap))
- goto err;
+ if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = NULL;
- return names;
-
-err:
+ *out = names;
+ return 0;
+done:
for (size_t i = 0; i < names_len; i++)
reftable_free(names[i]);
reftable_free(names);
- return NULL;
+ return err;
}
int names_equal(const char **a, const char **b)
diff --git a/reftable/basics.h b/reftable/basics.h
index 7d22f96261..693db9524f 100644
--- a/reftable/basics.h
+++ b/reftable/basics.h
@@ -167,10 +167,9 @@ void free_names(char **a);
/*
* Parse a newline separated list of names. `size` is the length of the buffer,
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
*/
-char **parse_names(char *buf, int size);
+int parse_names(char *buf, int size, char ***out);
/* compares two NULL-terminated arrays of strings. */
int names_equal(const char **a, const char **b);
diff --git a/reftable/stack.c b/reftable/stack.c
index f91ce50bcd..65d89820bd 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -109,12 +109,7 @@ static int fd_read_lines(int fd, char ***namesp)
}
buf[size] = 0;
- *namesp = parse_names(buf, size);
- if (!*namesp) {
- err = REFTABLE_OUT_OF_MEMORY_ERROR;
- goto done;
- }
-
+ err = parse_names(buf, size, namesp);
done:
reftable_free(buf);
return err;
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index a0471083e7..73566ed0eb 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
#include "unit-test.h"
#include "lib-reftable.h"
#include "reftable/basics.h"
+#include "reftable/reftable-error.h"
struct integer_needle_lesseq_args {
int needle;
@@ -79,14 +80,18 @@ void test_reftable_basics__names_equal(void)
void test_reftable_basics__parse_names(void)
{
char in1[] = "line\n";
- char in2[] = "a\nb\nc";
- char **out = parse_names(in1, strlen(in1));
+ char in2[] = "a\nb\nc\n";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "line");
cl_assert(!out[1]);
free_names(out);
- out = parse_names(in2, strlen(in2));
+ out = NULL;
+ err = parse_names(in2, strlen(in2), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
cl_assert_equal_s(out[1], "b");
@@ -95,10 +100,21 @@ void test_reftable_basics__parse_names(void)
free_names(out);
}
+void test_reftable_basics__parse_names_missing_newline(void)
+{
+ char in1[] = "line\nline2";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == REFTABLE_FORMAT_ERROR);
+ cl_assert(out == NULL);
+}
+
void test_reftable_basics__parse_names_drop_empty_string(void)
{
char in[] = "a\n\nb\n";
- char **out = parse_names(in, strlen(in));
+ char **out = NULL;
+ int err = parse_names(in, strlen(in), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
/* simply '\n' should be dropped as empty string */
--
2.51.0
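
For anyone who wants to try the tightened parsing outside of the reftable
tree, the idea can be sketched in a standalone form. This is only an
illustration under a few assumptions: it uses plain malloc()/realloc()
instead of the reftable allocators, and `parse_lines`, `free_lines`,
`ERR_FORMAT` and `ERR_OOM` are made-up names that are not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes, standing in for the reftable ones. */
enum { ERR_FORMAT = -1, ERR_OOM = -2 };

/* Free a NULL-terminated array produced by parse_lines(). */
static void free_lines(char **lines)
{
	if (!lines)
		return;
	for (char **p = lines; *p; p++)
		free(*p);
	free(lines);
}

/*
 * Split `buf` into lines. `buf` must be NUL-terminated and `size` is its
 * length without the terminating NUL. Empty lines are dropped. A last
 * line without a trailing '\n' is a format error. On success, `*out` is
 * a NULL-terminated array; on failure, `*out` is left untouched.
 */
static int parse_lines(char *buf, size_t size, char ***out)
{
	char **lines = NULL;
	size_t len = 0, cap = 0;
	char *p = buf, *end = buf + size;
	int err = 0;

	while (p < end) {
		char *next = strchr(p, '\n');
		if (!next) {
			err = ERR_FORMAT;
			goto fail;
		}
		*next = '\0';

		if (p < next) {
			/* Always keep one spare slot for the NULL terminator. */
			if (len + 2 > cap) {
				size_t new_cap = cap ? cap * 2 : 4;
				char **tmp = realloc(lines, new_cap * sizeof(*lines));
				if (!tmp) {
					err = ERR_OOM;
					goto fail;
				}
				lines = tmp;
				cap = new_cap;
			}
			lines[len] = malloc((size_t)(next - p) + 1);
			if (!lines[len]) {
				err = ERR_OOM;
				goto fail;
			}
			/* Copies the line including the '\0' written above. */
			memcpy(lines[len], p, (size_t)(next - p) + 1);
			len++;
		}
		p = next + 1;
	}

	if (!lines) {
		lines = malloc(sizeof(*lines));
		if (!lines)
			return ERR_OOM;
	}
	lines[len] = NULL;
	*out = lines;
	return 0;

fail:
	for (size_t i = 0; i < len; i++)
		free(lines[i]);
	free(lines);
	return err;
}
```

With this shape, the caller can tell a malformed list apart from an
allocation failure, which a plain NULL return value cannot express.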
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list'
2025-09-26 7:25 ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-10-02 11:44 ` Patrick Steinhardt
2025-10-06 12:02 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-10-02 11:44 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Fri, Sep 26, 2025 at 09:25:46AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/basics.c b/reftable/basics.c
> index 9988ebd635..e969927b61 100644
> --- a/reftable/basics.c
> +++ b/reftable/basics.c
> @@ -195,44 +195,55 @@ size_t names_length(const char **names)
> return p - names;
> }
>
> -char **parse_names(char *buf, int size)
> +int parse_names(char *buf, int size, char ***out)
> {
Yup, this changed function signature reads a lot nicer to me now and is
more in line with our usual coding style.
> diff --git a/reftable/basics.h b/reftable/basics.h
> index 7d22f96261..693db9524f 100644
> --- a/reftable/basics.h
> +++ b/reftable/basics.h
> @@ -167,10 +167,9 @@ void free_names(char **a);
>
> /*
> * Parse a newline separated list of names. `size` is the length of the buffer,
> - * without terminating '\0'. Empty names are discarded. Returns a `NULL`
> - * pointer when allocations fail.
> + * without terminating '\0'. Empty names are discarded.
> */
> -char **parse_names(char *buf, int size);
> +int parse_names(char *buf, int size, char ***out);
Tiny nit, not worth a reroll: we may still want to document that a
return value of 0 means success, and that it otherwise returns a
reftable error code.
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list'
2025-10-02 11:44 ` Patrick Steinhardt
@ 2025-10-06 12:02 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 12:02 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, Sep 26, 2025 at 09:25:46AM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/basics.c b/reftable/basics.c
>> index 9988ebd635..e969927b61 100644
>> --- a/reftable/basics.c
>> +++ b/reftable/basics.c
>> @@ -195,44 +195,55 @@ size_t names_length(const char **names)
>> return p - names;
>> }
>>
>> -char **parse_names(char *buf, int size)
>> +int parse_names(char *buf, int size, char ***out)
>> {
>
> Yup, this changed function signature reads a lot nicer to me now and is
> more in line with our usual coding style.
>
I have to agree with that. It would be so much nicer if we could return
(value, err) in C. Anyway, this seems a lot more consistent.
>> diff --git a/reftable/basics.h b/reftable/basics.h
>> index 7d22f96261..693db9524f 100644
>> --- a/reftable/basics.h
>> +++ b/reftable/basics.h
>> @@ -167,10 +167,9 @@ void free_names(char **a);
>>
>> /*
>> * Parse a newline separated list of names. `size` is the length of the buffer,
>> - * without terminating '\0'. Empty names are discarded. Returns a `NULL`
>> - * pointer when allocations fail.
>> + * without terminating '\0'. Empty names are discarded.
>> */
>> -char **parse_names(char *buf, int size);
>> +int parse_names(char *buf, int size, char ***out);
>
> Tiny nit, not worth a reroll: we may still want to document that a
> return value of 0 means success, and that it otherwise returns a
> reftable error code.
>
Yes, I'll add that in my re-roll.
> Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (2 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
` (3 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
The `gitmodulesLarge` entry is listed twice. Remove the second occurrence.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ---
1 file changed, 3 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (3 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-09-26 7:25 ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
` (2 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
fsck.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..6b0db235e0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -33,15 +33,27 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_TIMEZONE, ERROR) \
FUNC(BAD_TREE, ERROR) \
FUNC(BAD_TREE_SHA1, ERROR) \
FUNC(BAD_TYPE, ERROR) \
FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
FUNC(MISSING_AUTHOR, ERROR) \
FUNC(MISSING_COMMITTER, ERROR) \
FUNC(MISSING_EMAIL, ERROR) \
@@ -60,39 +72,27 @@ enum fsck_msg_type {
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
/* warnings */ \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
FUNC(TRAILING_REF_CONTENT, INFO) \
/* ignored (elevated when requested) */ \
--
2.51.0
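
As background on what reordering such a list does and does not affect, the
X-macro pattern can be sketched in isolation. Everything below is a
shortened, hypothetical stand-in (`FOREACH_MSG_ID`, `msg_id`, `msg_info`
and `msg_name` are invented names, not Git's definitions). The point is
that every consumer expands the same list, so moving an entry changes its
numeric enum value but keeps the enum and the lookup table in sync.

```c
#include <stddef.h>

enum severity { SEV_ERROR, SEV_WARN, SEV_INFO };

/* Hypothetical, shortened list, sorted within each sub-section. */
#define FOREACH_MSG_ID(FUNC) \
	/* errors */ \
	FUNC(BAD_DATE, ERROR) \
	FUNC(BAD_NAME, ERROR) \
	/* warnings */ \
	FUNC(EMPTY_NAME, WARN) \
	/* infos */ \
	FUNC(BAD_TAG_NAME, INFO)

/* First expansion: one enum constant per entry. */
#define MSG_ID(id, sev) MSG_##id,
enum msg_id {
	FOREACH_MSG_ID(MSG_ID)
	MSG_MAX
};
#undef MSG_ID

/* Second expansion: a table indexed by that enum. */
#define MSG_ID(id, sev) { #id, SEV_##sev },
static const struct {
	const char *name;
	enum severity severity;
} msg_info[] = {
	FOREACH_MSG_ID(MSG_ID)
};
#undef MSG_ID

/* Look up the name of a message id, or NULL when it is out of range. */
static const char *msg_name(int id)
{
	if (id < 0 || id >= MSG_MAX)
		return NULL;
	return msg_info[id].name;
}
```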
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v4 6/7] reftable: add code to facilitate consistency checks
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (4 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-09-26 7:25 ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-26 21:08 ` [PATCH v4 0/7] refs/reftable: add consistency checks Junio C Hamano
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check that validates the names of the tables in the reftable stack.
Since the reftable library is treated as an independent library we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h', which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The caller provides the function with callbacks to handle issue
and information reporting.
The added check goes over all tables in the reftable stack and validates
that they have a valid name. If not, it raises an error.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Makefile | 3 +-
meson.build | 1 +
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 +++++++++++++++++++
4 files changed, 143 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4c95affadb..03fbaf2b21 100644
--- a/Makefile
+++ b/Makefile
@@ -2732,9 +2732,10 @@ XDIFF_OBJS += xdiff/xutils.o
xdiff-objs: $(XDIFF_OBJS)
REFTABLE_OBJS += reftable/basics.o
-REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
+REFTABLE_OBJS += reftable/error.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/iter.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
diff --git a/meson.build b/meson.build
index b3dfcc0497..8914252910 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..26b9115b14
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,100 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "reftable-table.h"
+#include "stack.h"
+
+static bool table_has_valid_name(const char *name)
+{
+ const char *ptr = name;
+ char *endptr;
+
+ /* strtoull doesn't set errno on success */
+ errno = 0;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoul(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
+ return false;
+
+ return true;
+}
+
+typedef int (*table_check_fn)(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data);
+
+static int table_check_name(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data)
+{
+ if (!table_has_valid_name(table->name)) {
+ struct reftable_fsck_info info;
+
+ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
+ info.msg = "invalid reftable table name";
+ info.path = table->name;
+
+ return report_fn(&info, cb_data);
+ }
+
+ return 0;
+}
+
+static int table_checks(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn UNUSED,
+ void *cb_data)
+{
+ table_check_fn table_check_fns[] = {
+ table_check_name,
+ NULL,
+ };
+ int err = 0;
+
+ for (size_t i = 0; table_check_fns[i]; i++)
+ err |= table_check_fns[i](table, report_fn, cb_data);
+
+ return err;
+}
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+ struct reftable_buf msg = REFTABLE_BUF_INIT;
+ int err = 0;
+
+ for (size_t i = 0; i < stack->tables_len; i++) {
+ reftable_buf_reset(&msg);
+ reftable_buf_addstr(&msg, "Checking table: ");
+ reftable_buf_addstr(&msg, stack->tables[i]->name);
+ verbose_fn(msg.buf, cb_data);
+
+ err |= table_checks(stack->tables[i], report_fn, verbose_fn, cb_data);
+ }
+
+ reftable_buf_release(&msg);
+ return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..007a392cf9
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,40 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME = 0,
+ /* Used for bounds checking, must be last */
+ REFTABLE_FSCK_MAX_VALUE,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform consistency checks on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the caller via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the consistency checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
--
2.51.0
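
One detail worth keeping in mind when reusing a strtoull()-based check:
the C standard lets strtoull() return without consuming any digits (it
then stores the start pointer in `endptr`), and it also accepts leading
whitespace, a sign and, for base 16, an optional "0x" prefix. Whether
errno is set when no digits are consumed depends on the implementation.
The standalone sketch below shows one stricter way to validate the
`${min}-${max}-${random}` shape by scanning hex digits by hand. It is an
illustration only, assuming that all three fields must be non-empty hex
strings with an optional "0x" prefix; `skip_hex` and `name_looks_valid`
are made-up names and this is not the code from the patch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Skip an optional "0x" prefix followed by at least one hex digit.
 * Returns a pointer past the digits, or NULL when no digit was found.
 */
static const char *skip_hex(const char *p)
{
	const char *start;

	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
		p += 2;
	start = p;
	while (isxdigit((unsigned char)*p))
		p++;
	return p == start ? NULL : p;
}

/* Accept "<hex>-<hex>-<hex>" followed by exactly ".ref" or ".log". */
static bool name_looks_valid(const char *name)
{
	const char *p = name;

	for (int field = 0; field < 3; field++) {
		p = skip_hex(p);
		if (!p)
			return false;
		/* The first two fields must be followed by a dash. */
		if (field < 2 && *p++ != '-')
			return false;
	}
	return !strcmp(p, ".ref") || !strcmp(p, ".log");
}
```

Scanning by hand avoids any dependency on errno and rejects inputs such
as "--.ref", at the cost of not computing the numeric values.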
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v4 6/7] reftable: add code to facilitate consistency checks
2025-09-26 7:25 ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-10-02 11:44 ` Patrick Steinhardt
0 siblings, 0 replies; 96+ messages in thread
From: Patrick Steinhardt @ 2025-10-02 11:44 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Fri, Sep 26, 2025 at 09:25:49AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..26b9115b14
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,100 @@
[snip]
> +static int table_checks(struct reftable_table *table,
> + reftable_fsck_report_fn report_fn,
> + reftable_fsck_verbose_fn verbose_fn UNUSED,
> + void *cb_data)
> +{
> + table_check_fn table_check_fns[] = {
> + table_check_name,
> + NULL,
> + };
> + int err = 0;
> +
> + for (size_t i = 0; table_check_fns[i]; i++)
> + err |= table_check_fns[i](table, report_fn, cb_data);
> +
> + return err;
> +}
Okay, good. We now only verify the names of the tables that are part of
the stack, and don't scan the directory anymore. Furthermore, it is easy
to add more tests by adding to the function array.
Patrick
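
To make that extension point concrete, the dispatch pattern can be
sketched in a standalone form: a NULL-terminated array of check functions,
each reporting through a callback supplied by the caller. The `table`,
`issue`, `check_*`, `run_checks` and `count_issues` names below are
hypothetical stand-ins, not the reftable types, and the second check
exists only to show how a new entry would be added.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the table and the reported issue. */
struct table {
	const char *name;
	size_t size;
};

struct issue {
	const char *msg;
	const char *path;
};

typedef int report_fn(const struct issue *issue, void *cb_data);
typedef int (*check_fn)(const struct table *t, report_fn *report,
			void *cb_data);

/* Report tables whose name does not end in ".ref". */
static int check_name(const struct table *t, report_fn *report,
		      void *cb_data)
{
	size_t len = strlen(t->name);

	if (len < 4 || strcmp(t->name + len - 4, ".ref")) {
		struct issue issue = { "unexpected table suffix", t->name };
		return report(&issue, cb_data);
	}
	return 0;
}

/* A second, made-up check: report tables without any content. */
static int check_not_empty(const struct table *t, report_fn *report,
			   void *cb_data)
{
	if (!t->size) {
		struct issue issue = { "table is empty", t->name };
		return report(&issue, cb_data);
	}
	return 0;
}

/* Run every check; a failing check does not stop the later ones. */
static int run_checks(const struct table *t, report_fn *report,
		      void *cb_data)
{
	static const check_fn checks[] = {
		check_name,
		check_not_empty,
		NULL,
	};
	int err = 0;

	for (size_t i = 0; checks[i]; i++)
		err |= checks[i](t, report, cb_data);

	return err;
}

/* Example callback: count the issues and signal failure to the caller. */
static int count_issues(const struct issue *issue, void *cb_data)
{
	(void)issue;
	(*(int *)cb_data)++;
	return 1;
}
```

Accumulating with `err |=` means all checks run for every table, so a
single pass can report several independent issues.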
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (5 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-09-26 7:25 ` Karthik Nayak
2025-10-02 11:44 ` Patrick Steinhardt
2025-09-26 21:08 ` [PATCH v4 0/7] refs/reftable: add consistency checks Junio C Hamano
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-09-26 7:25 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster, shejialuo
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:
It is suggested to use
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
So treat non-conformant file names as warnings.
While adding the fsck header to 'refs/reftable-backend.c', modify the
list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 57 ++++++++++++++++++++++++++++++++++++++----
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 38 ++++++++++++++++++++++++++++
5 files changed, 95 insertions(+), 5 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..81f11ba125 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableTableName`::
+ (WARN) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 6b0db235e0..759df97655 100644
--- a/fsck.h
+++ b/fsck.h
@@ -73,6 +73,7 @@ enum fsck_msg_type {
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
+ FUNC(BAD_REFTABLE_TABLE_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2152349cb9..b106fd8b53 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,6 +6,7 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
@@ -15,10 +16,11 @@
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2707,11 +2709,56 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", msg);
+}
+
+static const enum fsck_msg_id fsck_msg_id_map[] = {
+ [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
+};
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
+ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
+ enum fsck_msg_id msg_id;
+
+ if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
+ BUG("unknown fsck error: %d", (int)info->error);
+
+ msg_id = fsck_msg_id_map[info->error];
+
+ if (!msg_id)
+ BUG("fsck_msg_id value missing for reftable error: %d", (int)info->error);
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..ec1fc0b2a1 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -146,6 +146,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..250d244e66
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+for TABLE_NAME in "foo-bar-e4d12d59.ref" \
+ "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
+ "0x000000000001-0x000000000002-e4d12d59.abc" \
+ "0x000000000001-0x000000000002-e4d12d59.refabc"; do
+ test_expect_success "table name $TABLE_NAME should be checked" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
+ mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
+ sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
+ mv tables.list .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
+ EOF
+ test_cmp expect err
+ )
+ '
+done
+
+test_done
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name
2025-09-26 7:25 ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-10-02 11:44 ` Patrick Steinhardt
2025-10-06 12:05 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-10-02 11:44 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, shejialuo
On Fri, Sep 26, 2025 at 09:25:50AM +0200, Karthik Nayak wrote:
> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..250d244e66
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,38 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +for TABLE_NAME in "foo-bar-e4d12d59.ref" \
> + "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
> + "0x000000000001-0x000000000002-e4d12d59.abc" \
> + "0x000000000001-0x000000000002-e4d12d59.refabc"; do
> + test_expect_success "table name $TABLE_NAME should be checked" '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err &&
> +
> + EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
> + mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
> + sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
> + mv tables.list .git/reftable/tables.list &&
> +
> + git refs verify 2>err &&
> + cat >expect <<-EOF &&
> + warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
> + EOF
> + test_cmp expect err
> + )
> + '
> +done
> +
> +test_done
Nit: we don't have any test that verifies that `git refs verify` doesn't
complain with a well-formed stack.
Other than that this series looks good to me, thanks! I think we might
want to have one final reroll, but once that's out I think this should
be ready to be merged down.
Patrick
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name
2025-10-02 11:44 ` Patrick Steinhardt
@ 2025-10-06 12:05 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 12:05 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, gitster, shejialuo
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, Sep 26, 2025 at 09:25:50AM +0200, Karthik Nayak wrote:
>> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
>> new file mode 100755
>> index 0000000000..250d244e66
>> --- /dev/null
>> +++ b/t/t0614-reftable-fsck.sh
>> @@ -0,0 +1,38 @@
>> +#!/bin/sh
>> +
>> +test_description='Test reftable backend consistency check'
>> +
>> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
>> +export GIT_TEST_DEFAULT_REF_FORMAT
>> +
>> +. ./test-lib.sh
>> +
>> +for TABLE_NAME in "foo-bar-e4d12d59.ref" \
>> + "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
>> + "0x000000000001-0x000000000002-e4d12d59.abc" \
>> + "0x000000000001-0x000000000002-e4d12d59.refabc"; do
>> + test_expect_success "table name $TABLE_NAME should be checked" '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err &&
>> +
>> + EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
>> + mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
>> + sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
>> + mv tables.list .git/reftable/tables.list &&
>> +
>> + git refs verify 2>err &&
>> + cat >expect <<-EOF &&
>> + warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
>> + EOF
>> + test_cmp expect err
>> + )
>> + '
>> +done
>> +
>> +test_done
>
> Nit: we don't have any test that verifies that `git refs verify` doesn't
> complain with a well-formed stack.
The above test does run `git refs verify` on the repository before
modifying the 'tables.list' file. Do you mean a stack with > 1 tables? I
think that would be worthwhile. Let me do that.
>
> Other than that this series looks good to me, thanks! I think we might
> want to have one final reroll, but once that's out I think this should
> be ready to be merged down.
>
> Patrick
Really appreciate the quick and thorough reviews.
Thanks,
Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v4 0/7] refs/reftable: add consistency checks
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
` (6 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-26 21:08 ` Junio C Hamano
7 siblings, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2025-09-26 21:08 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps, shejialuo
Karthik Nayak <karthik.188@gmail.com> writes:
> Changes in v4:
> - The biggest change is to iterate over the tables in a reftable stack
> for consistency checks instead of all files inside the REFTABLE_DIR.
> This avoids all race conditions. Also, since we only check the tables
> in a stack, it no longer makes sense to check file type.
Nice.
> - The discussion about update indices concluded that table indices
> in a stack must be strictly monotonically increasing. While modifying
> the code to do the same, I realized that we already have this check in
> 'reftable_addition_add()' where we check while adding a new table to
> the stack: `wr->min_update_index < add->next_update_index`. So I've
> dropped this patch from the series.
Great. Reading over patches and noticing that it is not needed is
the best kind of proofreading ;-)
> - Change parse_names() to accept the output string array as an argument
> and return an error instead. This makes the flow a little easier to
> understand.
Wonderful.
Will queue. Thanks.
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v5 0/7] refs/reftable: add consistency checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (8 preceding siblings ...)
2025-09-26 7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
@ 2025-10-06 14:22 ` Karthik Nayak
2025-10-06 14:22 ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
` (7 more replies)
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
10 siblings, 8 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:22 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
The reference subsystem allows for adding backend-specific consistency
checks. These checks are run as part of 'git refs verify'.
While the files backend has some consistency checks added, the reftable
backend currently has none. This series first tightens the reftable
backend to make it a little more strict and then also adds the required
infrastructure and some simple consistency checks.
Since the reftable backend is treated as a library within the Git
codebase, we don't want to spill our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.
So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.
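As a rough illustration of that separation, the glue layer can be
thought of as a small lookup table from library error codes to the
caller's message ids. The sketch below is standalone; every name in it
('demo_lib_error', 'demo_msg_id', 'demo_map_error') is hypothetical,
and reserving 0 to mean "unmapped" is an assumption of the sketch, not
a statement about Git's own enums.

```c
#include <stddef.h>

/* Hypothetical library-side error codes. */
enum demo_lib_error {
	DEMO_LIB_ERROR_TABLE_NAME = 0,
	DEMO_LIB_ERROR_MAX_VALUE,
};

/* Hypothetical caller-side message ids; 0 is reserved as "unmapped". */
enum demo_msg_id {
	DEMO_MSG_NONE = 0,
	DEMO_MSG_BAD_TABLE_NAME,
};

/* Designated initializers keep the mapping readable as codes are added. */
static const enum demo_msg_id demo_msg_id_map[DEMO_LIB_ERROR_MAX_VALUE] = {
	[DEMO_LIB_ERROR_TABLE_NAME] = DEMO_MSG_BAD_TABLE_NAME,
};

/* Returns the mapped id, or DEMO_MSG_NONE for out-of-range or unmapped codes. */
static enum demo_msg_id demo_map_error(int lib_error)
{
	if (lib_error < 0 || lib_error >= DEMO_LIB_ERROR_MAX_VALUE)
		return DEMO_MSG_NONE;
	return demo_msg_id_map[lib_error];
}
```

With this shape the library never needs to know about the caller's
message ids, and the caller only has to extend one array when the
library grows a new error code.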
We add the following consistency checks:
1. Check for validating the reftable table name. This is treated as a
warning since the reftable specification only suggests a table name
but doesn't enforce it. Also there is a difference in the table name
used in Git vs. that in JGit.
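For readers who want a feel for what such a name check involves, here
is a minimal standalone sketch. It is not the code from
'reftable/fsck.c'. It assumes the layout seen in the test names of this
series, i.e. "0x<hex>-0x<hex>-<hex>.ref", and the helper names
('demo_skip_hex', 'demo_skip_index', 'demo_table_name_ok') are
hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Skips one or more hex digits; returns NULL if there are none. */
static const char *demo_skip_hex(const char *p)
{
	const char *start = p;

	while (isxdigit((unsigned char)*p))
		p++;
	return p == start ? NULL : p;
}

/* Skips "0x" followed by hex digits; returns NULL on mismatch. */
static const char *demo_skip_index(const char *p)
{
	if (strncmp(p, "0x", 2))
		return NULL;
	return demo_skip_hex(p + 2);
}

/* Returns 1 when `name` looks like 0x<hex>-0x<hex>-<hex>.ref, else 0. */
static int demo_table_name_ok(const char *name)
{
	const char *p = demo_skip_index(name);

	if (!p || *p++ != '-')
		return 0;
	p = demo_skip_index(p);
	if (!p || *p++ != '-')
		return 0;
	p = demo_skip_hex(p);
	if (!p)
		return 0;
	/* Nothing may follow the ".ref" suffix. */
	return !strcmp(p, ".ref");
}
```

A name that fails such a check still refers to a readable table, which
is in line with reporting the problem as a warning rather than an
error.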
We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
when:
1. The 'tables.list' file doesn't have a trailing newline.
---
Changes in v5:
- Added documentation around the return value of 'parse_names()'.
- Added a test to validate that 'git refs verify' doesn't barf against
a clean working repository with multiple reftable tables.
- Link to v4: https://lore.kernel.org/all/20250926-228-reftable-introduce-consistency-checks-v4-0-c96fd8551c0d@gmail.com
Changes in v4:
- The biggest change is to iterate over the tables in a reftable stack
for consistency checks instead of all files inside the REFTABLE_DIR.
This avoids all race conditions. Also, since we only check the tables
in a stack, it no longer makes sense to check file type.
- The discussion about update indices concluded that table indices
in a stack must be strictly monotonically increasing. While modifying
the code to do the same, I realized that we already have this check in
'reftable_addition_add()' where we check while adding a new table to
the stack: `wr->min_update_index < add->next_update_index`. So I've
dropped this patch from the series.
- Change parse_names() to accept the output string array as an argument
and return an error instead. This makes the flow a little easier to
understand.
- Link to v3: https://lore.kernel.org/r/20250918-228-reftable-introduce-consistency-checks-v3-0-271af03eb34d@gmail.com
Changes in v3:
- I took a long hiatus from this topic, mostly due to other priorities.
This has been rebased on top of '92c87bdc40 (The eighth batch,
2025-09-12)' since there were conflicts.
- Junio suggested that two of the consistency checks (trailing newlines,
sequential update indices for tables in stack) should actually be
checked during runtime. I have made that change in this version.
- I've cleaned up the code and modularized the 'reftable/fsck.c' code.
- Invalid table name emits a warning, since the reftable spec doesn't
enforce it but only makes a suggestion.
- Broken down the commits to make it easier to review.
- Link to v2: https://lore.kernel.org/r/20250902-228-reftable-introduce-consistency-checks-v2-0-4f96b3834779@gmail.com
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
pointer, this provides a smaller footprint (pointer size vs struct
size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com
---
Documentation/fsck-msgids.adoc | 6 +--
Makefile | 3 +-
fsck.h | 39 +++++++--------
meson.build | 1 +
refs.c | 4 ++
refs/debug.c | 1 -
refs/files-backend.c | 3 --
refs/reftable-backend.c | 58 ++++++++++++++++++++---
reftable/basics.c | 37 ++++++++++-----
reftable/basics.h | 7 +--
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 ++++++++++++++++
reftable/stack.c | 7 +--
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 58 +++++++++++++++++++++++
t/unit-tests/u-reftable-basics.c | 24 ++++++++--
16 files changed, 330 insertions(+), 59 deletions(-)
Karthik Nayak (7):
refs: remove unused headers
refs: move consistency check msg to generic layer
reftable: check for trailing newline in 'tables.list'
Documentation/fsck-msgids: remove duplicate msg id
fsck: order 'fsck_msg_type' alphabetically
reftable: add code to facilitate consistency checks
refs/reftable: add fsck check for checking the table name
Range-diff versus v4:
1: 4e40ab1ff7 < -: ---------- refs/reftable: add consistency checks
2: b91194e060 = 1: 6e3766330b refs: remove unused headers
3: d48afbf588 = 2: e93c0deaf7 refs: move consistency check msg to generic layer
4: cd7ca2a585 ! 3: 7a282473a1 reftable: check for trailing newline in 'tables.list'
@@ reftable/basics.h: void free_names(char **a);
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
++ *
++ * Returns 0 on success, a reftable error code on error.
*/
-char **parse_names(char *buf, int size);
+int parse_names(char *buf, int size, char ***out);
5: e3e0c0b4ae = 4: 4b47088232 Documentation/fsck-msgids: remove duplicate msg id
6: 24a8d93adc = 5: 112ae21321 fsck: order 'fsck_msg_type' alphabetically
7: d83d763be1 = 6: 3d1fc18260 reftable: add code to facilitate consistency checks
8: d86ecd5bed ! 7: 2b628e3623 refs/reftable: add fsck check for checking the table name
@@ Commit message
So treat non-conformant file names as warnings.
- While adding the fsck header to 'refs/reftable-backend.c', order the
- list of headers.
+ While adding the fsck header to 'refs/reftable-backend.c', modify the
+ list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
@@ t/t0614-reftable-fsck.sh (new)
+
+. ./test-lib.sh
+
++test_expect_success "no errors reported on a well formed repository" '
++ test_when_finished "rm -rf repo" &&
++ git init repo &&
++ (
++ cd repo &&
++ git commit --allow-empty -m initial &&
++
++ for i in $(test_seq 20)
++ do
++ git update-ref branch-$i HEAD || return 1
++ done &&
++
++ # The repository should end up with multiple tables.
++ test_line_count ">" 1 .git/reftable/tables.list &&
++
++ git refs verify 2>err &&
++ test_must_be_empty err
++ )
++'
++
+for TABLE_NAME in "foo-bar-e4d12d59.ref" \
+ "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
+ "0x000000000001-0x000000000002-e4d12d59.abc" \
base-commit: a483264b01b977f3e65a4419103c21e6af7412a2
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544
Thanks
- Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v5 1/7] refs: remove unused headers
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
@ 2025-10-06 14:22 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 2/7] refs: move consistency check msg to generic layer Karthik Nayak
` (6 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:22 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
In the 'refs/' namespace, some of the included header files are not
needed. Let's remove them.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/debug.c | 1 -
refs/files-backend.c | 1 -
refs/reftable-backend.c | 1 -
3 files changed, 3 deletions(-)
diff --git a/refs/debug.c b/refs/debug.c
index 1cb955961e..697adbd0dc 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -1,7 +1,6 @@
#include "git-compat-util.h"
#include "hex.h"
#include "refs-internal.h"
-#include "string-list.h"
#include "trace.h"
static struct trace_key trace_refs = TRACE_KEY_INIT(REFS);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1b3bf26add..d4fb033417 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -20,7 +20,6 @@
#include "../dir-iterator.h"
#include "../lockfile.h"
#include "../object.h"
-#include "../object-file.h"
#include "../path.h"
#include "../dir.h"
#include "../chdir-notify.h"
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 9e889da2ff..2152349cb9 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -11,7 +11,6 @@
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 2/7] refs: move consistency check msg to generic layer
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-06 14:22 ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
` (5 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs.c | 4 ++++
refs/files-backend.c | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/refs.c b/refs.c
index 4ff55cf24f..4a7c394226 100644
--- a/refs.c
+++ b/refs.c
@@ -32,6 +32,7 @@
#include "commit.h"
#include "wildmatch.h"
#include "ident.h"
+#include "fsck.h"
/*
* List of all available backends
@@ -323,6 +324,9 @@ int check_refname_format(const char *refname, int flags)
int refs_fsck(struct ref_store *refs, struct fsck_options *o,
struct worktree *wt)
{
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
return refs->be->fsck(refs, o, wt);
}
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d4fb033417..603b1343d8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3906,8 +3906,6 @@ static int files_fsck_refs(struct ref_store *ref_store,
NULL,
};
- if (o->verbose)
- fprintf_ln(stderr, _("Checking references consistency"));
return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list'
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-06 14:22 ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 2/7] refs: move consistency check msg to generic layer Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
` (4 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
In the reftable format, the 'tables.list' file contains a
newline-separated list of tables. When we parse this file, we do not
check or care about the last newline. Tighten the parser in
`parse_names()` to return an appropriate error if the last newline is
missing.
This requires modification to `parse_names()` to now return the error
while accepting the output as a third argument.
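The calling convention described above (error code as the return value,
result through an output argument) can be sketched in isolation as
follows. This is a simplified illustration rather than the patched
`parse_names()`: it only counts names instead of allocating them, and
'demo_count_names' and 'DEMO_FORMAT_ERROR' are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_FORMAT_ERROR (-1) /* hypothetical error code */

/*
 * Counts the non-empty names in a newline-separated buffer of `size`
 * bytes. Returns 0 and stores the count in `*out` on success, or
 * DEMO_FORMAT_ERROR when the last entry lacks its trailing newline.
 */
static int demo_count_names(const char *buf, size_t size, size_t *out)
{
	const char *p = buf;
	const char *end = buf + size;
	size_t count = 0;

	while (p < end) {
		const char *next = memchr(p, '\n', (size_t)(end - p));

		if (!next)
			return DEMO_FORMAT_ERROR; /* unterminated last entry */
		if (p < next)
			count++; /* empty names are skipped */
		p = next + 1;
	}

	*out = count;
	return 0;
}
```

Returning the error separately from the result lets the caller tell a
malformed list apart from an allocation failure, which a bare `NULL`
return cannot express.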
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
reftable/basics.c | 37 ++++++++++++++++++++++++-------------
reftable/basics.h | 7 ++++---
reftable/stack.c | 7 +------
t/unit-tests/u-reftable-basics.c | 24 ++++++++++++++++++++----
4 files changed, 49 insertions(+), 26 deletions(-)
diff --git a/reftable/basics.c b/reftable/basics.c
index 9988ebd635..e969927b61 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -195,44 +195,55 @@ size_t names_length(const char **names)
return p - names;
}
-char **parse_names(char *buf, int size)
+int parse_names(char *buf, int size, char ***out)
{
char **names = NULL;
size_t names_cap = 0;
size_t names_len = 0;
char *p = buf;
char *end = buf + size;
+ int err = 0;
while (p < end) {
char *next = strchr(p, '\n');
- if (next && next < end) {
- *next = 0;
+ if (!next) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ } else if (next < end) {
+ *next = '\0';
} else {
next = end;
}
+
if (p < next) {
if (REFTABLE_ALLOC_GROW(names, names_len + 1,
- names_cap))
- goto err;
+ names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = reftable_strdup(p);
- if (!names[names_len++])
- goto err;
+ if (!names[names_len++]) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
}
p = next + 1;
}
- if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap))
- goto err;
+ if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = NULL;
- return names;
-
-err:
+ *out = names;
+ return 0;
+done:
for (size_t i = 0; i < names_len; i++)
reftable_free(names[i]);
reftable_free(names);
- return NULL;
+ return err;
}
int names_equal(const char **a, const char **b)
diff --git a/reftable/basics.h b/reftable/basics.h
index 7d22f96261..e4b83b2b03 100644
--- a/reftable/basics.h
+++ b/reftable/basics.h
@@ -167,10 +167,11 @@ void free_names(char **a);
/*
* Parse a newline separated list of names. `size` is the length of the buffer,
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
+ *
+ * Returns 0 on success, a reftable error code on error.
*/
-char **parse_names(char *buf, int size);
+int parse_names(char *buf, int size, char ***out);
/* compares two NULL-terminated arrays of strings. */
int names_equal(const char **a, const char **b);
diff --git a/reftable/stack.c b/reftable/stack.c
index f91ce50bcd..65d89820bd 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -109,12 +109,7 @@ static int fd_read_lines(int fd, char ***namesp)
}
buf[size] = 0;
- *namesp = parse_names(buf, size);
- if (!*namesp) {
- err = REFTABLE_OUT_OF_MEMORY_ERROR;
- goto done;
- }
-
+ err = parse_names(buf, size, namesp);
done:
reftable_free(buf);
return err;
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index a0471083e7..73566ed0eb 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
#include "unit-test.h"
#include "lib-reftable.h"
#include "reftable/basics.h"
+#include "reftable/reftable-error.h"
struct integer_needle_lesseq_args {
int needle;
@@ -79,14 +80,18 @@ void test_reftable_basics__names_equal(void)
void test_reftable_basics__parse_names(void)
{
char in1[] = "line\n";
- char in2[] = "a\nb\nc";
- char **out = parse_names(in1, strlen(in1));
+ char in2[] = "a\nb\nc\n";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "line");
cl_assert(!out[1]);
free_names(out);
- out = parse_names(in2, strlen(in2));
+ out = NULL;
+ err = parse_names(in2, strlen(in2), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
cl_assert_equal_s(out[1], "b");
@@ -95,10 +100,21 @@ void test_reftable_basics__parse_names(void)
free_names(out);
}
+void test_reftable_basics__parse_names_missing_newline(void)
+{
+ char in1[] = "line\nline2";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == REFTABLE_FORMAT_ERROR);
+ cl_assert(out == NULL);
+}
+
void test_reftable_basics__parse_names_drop_empty_string(void)
{
char in[] = "a\n\nb\n";
- char **out = parse_names(in, strlen(in));
+ char **out = NULL;
+ int err = parse_names(in, strlen(in), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
/* simply '\n' should be dropped as empty string */
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
` (2 preceding siblings ...)
2025-10-06 14:23 ` [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
` (3 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
The `gitmodulesLarge` entry appears twice. Remove the second instance.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ---
1 file changed, 3 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
` (3 preceding siblings ...)
2025-10-06 14:23 ` [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
` (2 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
fsck.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..6b0db235e0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -33,15 +33,27 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_TIMEZONE, ERROR) \
FUNC(BAD_TREE, ERROR) \
FUNC(BAD_TREE_SHA1, ERROR) \
FUNC(BAD_TYPE, ERROR) \
FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
FUNC(MISSING_AUTHOR, ERROR) \
FUNC(MISSING_COMMITTER, ERROR) \
FUNC(MISSING_EMAIL, ERROR) \
@@ -60,39 +72,27 @@ enum fsck_msg_type {
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
/* warnings */ \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
FUNC(TRAILING_REF_CONTENT, INFO) \
/* ignored (elevated when requested) */ \
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 6/7] reftable: add code to facilitate consistency checks
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
` (4 preceding siblings ...)
2025-10-06 14:23 ` [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-06 14:23 ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-06 22:08 ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the files present in the reftable directory.
Since the reftable library is treated as an independent library we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h', which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The caller provides the function with callbacks to handle issue
and information reporting.
The added check goes over all tables in the reftable stack and validates
that they have a valid name. If not, it raises an error.
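To illustrate the shape of name being validated, here is a standalone
sketch. It is not the implementation in this patch (which is built on
strtoull()); it is a hypothetical, stricter variant that requires at
least one hex digit in each of the three fields and accepts an optional
"0x" prefix. Both function names are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Skip an optional "0x" prefix plus one or more hex digits; NULL if none. */
static const char *skip_hex_field(const char *p)
{
	const char *start;

	if (p[0] == '0' && p[1] == 'x')
		p += 2;
	start = p;
	while (isxdigit((unsigned char)*p))
		p++;
	return p == start ? NULL : p;
}

/*
 * Hypothetical check for "<hex>-<hex>-<hex>.ref" or "<hex>-<hex>-<hex>.log".
 * Unlike strtoull(), this rejects empty fields, signs and leading spaces.
 */
static bool sketch_table_name_ok(const char *name)
{
	const char *p = name;

	assert(name != NULL);
	for (int field = 0; field < 3; field++) {
		p = skip_hex_field(p);
		if (!p)
			return false;
		/* The first two fields must be followed by a dash. */
		if (field < 2 && *p++ != '-')
			return false;
	}
	return !strcmp(p, ".ref") || !strcmp(p, ".log");
}
```

A name such as "0x000000000001-0x000000000002-e4d12d59.ref" is accepted,
while "foo-bar-e4d12d59.ref" and a name ending in ".refabc" are not.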
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Makefile | 3 +-
meson.build | 1 +
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 +++++++++++++++++++
4 files changed, 143 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4c95affadb..03fbaf2b21 100644
--- a/Makefile
+++ b/Makefile
@@ -2732,9 +2732,10 @@ XDIFF_OBJS += xdiff/xutils.o
xdiff-objs: $(XDIFF_OBJS)
REFTABLE_OBJS += reftable/basics.o
-REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
+REFTABLE_OBJS += reftable/error.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/iter.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
diff --git a/meson.build b/meson.build
index b3dfcc0497..8914252910 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..26b9115b14
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,100 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "reftable-table.h"
+#include "stack.h"
+
+static bool table_has_valid_name(const char *name)
+{
+ const char *ptr = name;
+ char *endptr;
+
+ /* strtoull doesn't set errno on success */
+ errno = 0;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoul(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
+ return false;
+
+ return true;
+}
+
+typedef int (*table_check_fn)(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data);
+
+static int table_check_name(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data)
+{
+ if (!table_has_valid_name(table->name)) {
+ struct reftable_fsck_info info;
+
+ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
+ info.msg = "invalid reftable table name";
+ info.path = table->name;
+
+ return report_fn(&info, cb_data);
+ }
+
+ return 0;
+}
+
+static int table_checks(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn UNUSED,
+ void *cb_data)
+{
+ table_check_fn table_check_fns[] = {
+ table_check_name,
+ NULL,
+ };
+ int err = 0;
+
+ for (size_t i = 0; table_check_fns[i]; i++)
+ err |= table_check_fns[i](table, report_fn, cb_data);
+
+ return err;
+}
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+ struct reftable_buf msg = REFTABLE_BUF_INIT;
+ int err = 0;
+
+ for (size_t i = 0; i < stack->tables_len; i++) {
+ reftable_buf_reset(&msg);
+ reftable_buf_addstr(&msg, "Checking table: ");
+ reftable_buf_addstr(&msg, stack->tables[i]->name);
+ verbose_fn(msg.buf, cb_data);
+
+ err |= table_checks(stack->tables[i], report_fn, verbose_fn, cb_data);
+ }
+
+ reftable_buf_release(&msg);
+ return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..007a392cf9
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,40 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME = 0,
+ /* Used for bounds checking, must be last */
+ REFTABLE_FSCK_MAX_VALUE,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform consistency checks on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the consistency checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
` (5 preceding siblings ...)
2025-10-06 14:23 ` [PATCH v5 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-10-06 14:23 ` Karthik Nayak
2025-10-07 2:32 ` Jeff King
2025-10-06 22:08 ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
7 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-10-06 14:23 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, ps, gitster
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
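The mapping idea can be sketched in isolation as below. The enums are
stand-ins rather than the real reftable or fsck types, and reserving
the value 0 as "unmapped" is an assumption of this sketch. The sketch
returns a sentinel where the glue code in this patch calls BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library-side error enum; values are illustrative. */
enum lib_error {
	LIB_ERROR_TABLE_NAME = 0,
	LIB_ERROR_MAX_VALUE, /* bounds marker, must be last */
};

/* Stand-in for the host-side message ids. */
enum host_msg_id {
	HOST_MSG_UNMAPPED = 0, /* reserved so a gap in the map is detectable */
	HOST_MSG_BAD_TABLE_NAME,
};

/* Entries without an initializer are zero, i.e. HOST_MSG_UNMAPPED. */
static const enum host_msg_id msg_id_map[LIB_ERROR_MAX_VALUE] = {
	[LIB_ERROR_TABLE_NAME] = HOST_MSG_BAD_TABLE_NAME,
};

/* Returns the mapped id, or HOST_MSG_UNMAPPED for out-of-range input. */
static enum host_msg_id map_error(int err)
{
	if (err < 0 || err >= LIB_ERROR_MAX_VALUE)
		return HOST_MSG_UNMAPPED;
	return msg_id_map[err];
}
```

Sizing the array with the bounds marker means a library error added
later without a map entry yields the sentinel instead of reading past
the end of the array.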
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:
It suggested to use
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
So treat non-conformant file names as warnings.
While adding the fsck header to 'refs/reftable-backend.c', modify the
list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 57 +++++++++++++++++++++++++++++++++++++----
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 58 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 115 insertions(+), 5 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..81f11ba125 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableTableName`::
+ (WARN) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 6b0db235e0..759df97655 100644
--- a/fsck.h
+++ b/fsck.h
@@ -73,6 +73,7 @@ enum fsck_msg_type {
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
+ FUNC(BAD_REFTABLE_TABLE_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2152349cb9..b106fd8b53 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,6 +6,7 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
@@ -15,10 +16,11 @@
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2707,11 +2709,56 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", msg);
+}
+
+static const enum fsck_msg_id fsck_msg_id_map[] = {
+ [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
+};
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
+ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
+ enum fsck_msg_id msg_id;
+
+ if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
+ BUG("unknown fsck error: %d", (int)info->error);
+
+ msg_id = fsck_msg_id_map[info->error];
+
+ if (!msg_id)
+ BUG("fsck_msg_id value missing for reftable error: %d", (int)info->error);
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..ec1fc0b2a1 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -146,6 +146,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..a5be279ab3
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success "no errors reported on a well formed repository" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ for i in $(test_seq 20)
+ do
+ git update-ref branch-$i HEAD || return 1
+ done &&
+
+ # The repository should end up with multiple tables.
+ test_line_count ">" 1 .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err
+ )
+'
+
+for TABLE_NAME in "foo-bar-e4d12d59.ref" \
+ "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
+ "0x000000000001-0x000000000002-e4d12d59.abc" \
+ "0x000000000001-0x000000000002-e4d12d59.refabc"; do
+ test_expect_success "table name $TABLE_NAME should be checked" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
+ mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
+ sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
+ mv tables.list .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
+ EOF
+ test_cmp expect err
+ )
+ '
+done
+
+test_done
--
2.51.0
^ permalink raw reply related [flat|nested] 96+ messages in thread
* Re: [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name
2025-10-06 14:23 ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-10-07 2:32 ` Jeff King
2025-10-07 8:45 ` Karthik Nayak
0 siblings, 1 reply; 96+ messages in thread
From: Jeff King @ 2025-10-07 2:32 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps, gitster
On Mon, Oct 06, 2025 at 04:23:05PM +0200, Karthik Nayak wrote:
> +test_expect_success "no errors reported on a well formed repository" '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + git commit --allow-empty -m initial &&
> +
> + for i in $(test_seq 20)
> + do
> + git update-ref branch-$i HEAD || return 1
> + done &&
Did you mean refs/heads/branch-$i here? As it is written, it creates a
root ref, and the name does not conform to the usual rules (all-caps,
and ending in _HEAD). There are some holes in our checks, which is why
it doesn't barf yet, but I have a series to fix that which I hope to
send out later this week.
> + # The repository should end up with multiple tables.
> + test_line_count ">" 1 .git/reftable/tables.list &&
> +
> + git refs verify 2>err &&
> + test_must_be_empty err
> + )
Arguably this verify command should be complaining about the broken
names, too.
-Peff
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name
2025-10-07 2:32 ` Jeff King
@ 2025-10-07 8:45 ` Karthik Nayak
0 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 8:45 UTC (permalink / raw)
To: Jeff King; +Cc: git, ps, gitster
Jeff King <peff@peff.net> writes:
> On Mon, Oct 06, 2025 at 04:23:05PM +0200, Karthik Nayak wrote:
>
>> +test_expect_success "no errors reported on a well formed repository" '
>> + test_when_finished "rm -rf repo" &&
>> + git init repo &&
>> + (
>> + cd repo &&
>> + git commit --allow-empty -m initial &&
>> +
>> + for i in $(test_seq 20)
>> + do
>> + git update-ref branch-$i HEAD || return 1
>> + done &&
>
> Did you mean refs/heads/branch-$i here? As it is written, it creates a
> root ref, and the name does not conform to the usual rules (all-caps,
> and ending in _HEAD). There are some holes in our checks, which is why
> it doesn't barf yet, but I have a series to fix that which I hope to
> send out later this week.
>
Yeah, this was definitely a miss on my side. It works because currently
we haven't yet added reference level checks to reftables.
This series only adds stack/table level checks.
>> + # The repository should end up with multiple tables.
>> + test_line_count ">" 1 .git/reftable/tables.list &&
>> +
>> + git refs verify 2>err &&
>> + test_must_be_empty err
>> + )
>
> Arguably this verify command should be complaining about the broken
> names, too.
>
Yes, eventually it will when we implement reference level checks. Since
that's missing, it currently doesn't barf.
It does work as-is and we could leave it at that, until we actually
implement the reference level checks. But I think a quick re-roll will
avoid future confusion.
> -Peff
Thanks for the review. Looking forward to your series.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5 0/7] refs/reftable: add consistency checks
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
` (6 preceding siblings ...)
2025-10-06 14:23 ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-10-06 22:08 ` Junio C Hamano
2025-10-07 8:47 ` Karthik Nayak
7 siblings, 1 reply; 96+ messages in thread
From: Junio C Hamano @ 2025-10-06 22:08 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps
Karthik Nayak <karthik.188@gmail.com> writes:
> The reference subsystems allows for adding backend specific consistency
> checks. These checks are run as part of 'git refs verify'.
>
> While the files backend has some consistency checks added, the reftable
> backend currently has none. This series first tightens the reftable
> backend to make it a little more strict and then also adds the required
> infrastructure and some simple consistency checks.
>
> Since the reftable backend is treated as a library within the Git
> codebase, we don't want to spillover our internal fsck implementation
> into the library. At the same time, the fsck checks need to access
> internal structures of the reftable library which aren't exposed outside
> the library.
>
> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
> exposes a checker for the reftable library and returns specific errors
> as defined by the library. We then add glue code within
> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
> implementation would understand. This allows us to separate concerns.
>
> We add the following consistency checks:
>
> 1. Check for validating the reftable table name. This is treated as a
> warning since the reftable specification only suggests a table name
> but doesn't enforce it. Also there is a difference in the table name
> used in Git vs that in jGit.
>
> We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
> when:
>
> 1. The 'tables.list' file doesn't have a trailing newline.
>
> ---
> Changes in v5:
> - Added documentation around the return value of 'parse_names()'.
> - Added a test to validate that 'git refs verify' doesn't barf against
> a clean working repository with multiple reftable tables.
> - Link to v4: https://lore.kernel.org/all/20250926-228-reftable-introduce-consistency-checks-v4-0-c96fd8551c0d@gmail.com
Looking good. Shall we declare victory and mark the topic for
'next' now?
Thanks.
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5 0/7] refs/reftable: add consistency checks
2025-10-06 22:08 ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
@ 2025-10-07 8:47 ` Karthik Nayak
2025-10-07 15:11 ` Junio C Hamano
0 siblings, 1 reply; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 8:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps
Junio C Hamano <gitster@pobox.com> writes:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> The reference subsystems allows for adding backend specific consistency
>> checks. These checks are run as part of 'git refs verify'.
>>
>> While the files backend has some consistency checks added, the reftable
>> backend currently has none. This series first tightens the reftable
>> backend to make it a little more strict and then also adds the required
>> infrastructure and some simple consistency checks.
>>
>> Since the reftable backend is treated as a library within the Git
>> codebase, we don't want to spillover our internal fsck implementation
>> into the library. At the same time, the fsck checks need to access
>> internal structures of the reftable library which aren't exposed outside
>> the library.
>>
>> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
>> exposes a checker for the reftable library and returns specific errors
>> as defined by the library. We then add glue code within
>> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
>> implementation would understand. This allows us to separate concerns.
>>
>> We add the following consistency checks:
>>
>> 1. Check for validating the reftable table name. This is treated as a
>> warning since the reftable specification only suggests a table name
>> but doesn't enforce it. Also there is a difference in the table name
>> used in Git vs that in jGit.
>>
>> We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
>> when:
>>
>> 1. The 'tables.list' file doesn't have a trailing newline.
>>
>> ---
>> Changes in v5:
>> - Added documentation around the return value of 'parse_names()'.
>> - Added a test to validate that 'git refs verify' doesn't barf against
>> a clean working repository with multiple reftable tables.
>> - Link to v4: https://lore.kernel.org/all/20250926-228-reftable-introduce-consistency-checks-v4-0-c96fd8551c0d@gmail.com
>
> Looking good. Shall we declare victory and mark the topic for
> 'next' now?
>
> Thanks.
Peff pointed out a mistake in my test, where I create root refs instead
of branches. This works without issues as we don't yet have reference
level checks on reftables. While it is good as is, I do think it is
confusing, so will send in a new version with a fix. Let's hold out for
that and we can merge that to 'next'.
Thanks,
Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: [PATCH v5 0/7] refs/reftable: add consistency checks
2025-10-07 8:47 ` Karthik Nayak
@ 2025-10-07 15:11 ` Junio C Hamano
0 siblings, 0 replies; 96+ messages in thread
From: Junio C Hamano @ 2025-10-07 15:11 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps
Karthik Nayak <karthik.188@gmail.com> writes:
> Peff pointed out a mistake in my test, where I create root refs instead
> of branches. This works without issues as we don't yet have reference
> level checks on reftables. While it is good as is, I do think it is
> confusing, so will send in a new version with a fix. Let's hold out for
> that and we can merge that to 'next'.
Yup, I saw that exchange on the "oops you missed refs/heads/!",
which I also missed. Thanks, let me mark it to expect an update.
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v6 0/7] refs/reftable: add consistency checks
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
` (9 preceding siblings ...)
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
` (7 more replies)
10 siblings, 8 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
The reference subsystem allows for adding backend-specific consistency
checks. These checks are run as part of 'git refs verify'.
While the files backend has some consistency checks added, the reftable
backend currently has none. This series first tightens the reftable
backend to make it a little more strict and then also adds the required
infrastructure and some simple consistency checks.
Since the reftable backend is treated as a library within the Git
codebase, we don't want our internal fsck implementation to spill over
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.
So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.
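The separation described above can be sketched with a miniature
example. All names here are hypothetical and the suffix check is only a
placeholder; the point is that the library side reports through a
callback and an opaque pointer, and the host side decides what a report
means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Library side: knows nothing about the host's reporting machinery. */
struct issue {
	int code;
	const char *msg;
	const char *path;
};

typedef int issue_report_fn(const struct issue *issue, void *cb_data);

/* Reports every name lacking a ".ref" suffix and ORs the callback results. */
static int check_names(const char *const *names, size_t n,
		       issue_report_fn *report, void *cb_data)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len < 4 || strcmp(names[i] + len - 4, ".ref")) {
			struct issue issue = {
				.code = 0,
				.msg = "unexpected suffix",
				.path = names[i],
			};
			err |= report(&issue, cb_data);
		}
	}
	return err;
}

/* Host side: here the "translation" is just counting the reports. */
static int count_issue(const struct issue *issue, void *cb_data)
{
	size_t *count = cb_data;

	(void)issue;
	(*count)++;
	return 1; /* non-zero marks the overall run as failed */
}
```

Because the results are ORed rather than returned early, every problem
is reported even after the first failure.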
We add the following consistency checks:
1. Check for validating the reftable table name. This is treated as a
warning since the reftable specification only suggests a table name
but doesn't enforce it. Also there is a difference in the table name
used in Git vs that in JGit.
We tighten the reftable backend by raising a REFTABLE_FORMAT_ERROR error
when:
1. The 'tables.list' file doesn't have a trailing newline.
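As a rough illustration of the trailing-newline rule, consider the
sketch below. It is not the library's 'parse_names()'; the function
name and the error constant are placeholders, and treating an empty
buffer as valid is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FORMAT_ERROR (-1) /* placeholder for the library's error code */

/*
 * Hypothetical check: a non-empty list buffer must end in '\n'.
 * An empty buffer is accepted here (assumption: it denotes an empty stack).
 */
static int check_list_termination(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len && buf[len - 1] != '\n')
		return SKETCH_FORMAT_ERROR;
	return 0;
}
```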
---
Changes in v6:
- In t/t0614-reftable-fsck.sh, create branches instead of root refs.
This worked because reference-level checks are not yet implemented
for reftables. Let's avoid the confusion of a test breaking once we
add reference-level checks.
- Link to v5: https://lore.kernel.org/r/20251006-228-reftable-introduce-consistency-checks-v5-0-f196d386214f@gmail.com
Changes in v5:
- Added documentation around the return value of 'parse_names()'.
- Added a test to validate that 'git refs verify' doesn't barf against
a clean working repository with multiple reftable tables.
- Link to v4: https://lore.kernel.org/all/20250926-228-reftable-introduce-consistency-checks-v4-0-c96fd8551c0d@gmail.com
Changes in v4:
- The biggest change is to iterate over the tables in a reftable stack
for consistency checks instead of all files inside the REFTABLE_DIR.
This avoids all race conditions. Also, since we only check the tables
in a stack, it no longer makes sense to check file type.
- The discussion about update indices concluded that table indices
in a stack must be strictly monotonically increasing. While modifying
the code to enforce this, I realized that we already have this check in
'reftable_addition_add()', where we check while adding a new table to
the stack: `wr->min_update_index < add->next_update_index`. So I've
dropped this patch from the series.
- Change parse_names() to accept the output string array as an argument
and return an error instead. This makes the flow a little easier to
understand.
- Link to v3: https://lore.kernel.org/r/20250918-228-reftable-introduce-consistency-checks-v3-0-271af03eb34d@gmail.com
Changes in v3:
- I took a long hiatus from this topic, mostly due to other priorities.
This has been rebased on top of '92c87bdc40 (The eighth batch,
2025-09-12)' since there were conflicts.
- Junio suggested that two of the consistency checks (trailing newlines,
sequential update indices for tables in stack) should actually be
checked during runtime. I have made that change in this version.
- I've cleaned up the code and modularized the 'reftable/fsck.c' code.
- Invalid table name emits a warning, since the reftable spec doesn't
enforce it but only makes a suggestion.
- Broken down the commits to make it easier to review.
- Link to v2: https://lore.kernel.org/r/20250902-228-reftable-introduce-consistency-checks-v2-0-4f96b3834779@gmail.com
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
pointer, this provides a smaller footprint (pointer size vs struct
size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com
---
Documentation/fsck-msgids.adoc | 6 +--
Makefile | 3 +-
fsck.h | 39 +++++++--------
meson.build | 1 +
refs.c | 4 ++
refs/debug.c | 1 -
refs/files-backend.c | 3 --
refs/reftable-backend.c | 58 ++++++++++++++++++++---
reftable/basics.c | 37 ++++++++++-----
reftable/basics.h | 7 +--
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 ++++++++++++++++
reftable/stack.c | 7 +--
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 58 +++++++++++++++++++++++
t/unit-tests/u-reftable-basics.c | 24 ++++++++--
16 files changed, 330 insertions(+), 59 deletions(-)
Karthik Nayak (7):
refs: remove unused headers
refs: move consistency check msg to generic layer
reftable: check for trailing newline in 'tables.list'
Documentation/fsck-msgids: remove duplicate msg id
fsck: order 'fsck_msg_type' alphabetically
reftable: add code to facilitate consistency checks
refs/reftable: add fsck check for checking the table name
Range-diff versus v5:
1: 85480cbb60 = 1: 671a79a3af refs: remove unused headers
2: b8fdad314a = 2: dbf9df8d3c refs: move consistency check msg to generic layer
3: 4ce029ed8e = 3: dbc478dbe6 reftable: check for trailing newline in 'tables.list'
4: 50655b2272 = 4: 062d66f7ed Documentation/fsck-msgids: remove duplicate msg id
5: 0b4c2295d9 = 5: a70974a39c fsck: order 'fsck_msg_type' alphabetically
6: 2abcaa9b23 = 6: a1dea4335e reftable: add code to facilitate consistency checks
7: 1f59191f22 ! 7: dcd172827b refs/reftable: add fsck check for checking the table name
@@ t/t0614-reftable-fsck.sh (new)
+
+ for i in $(test_seq 20)
+ do
-+ git update-ref branch-$i HEAD || return 1
++ git update-ref refs/heads/branch-$i HEAD || return 1
+ done &&
+
+ # The repository should end up with multiple tables.
base-commit: a483264b01b977f3e65a4419103c21e6af7412a2
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544
Thanks
- Karthik
^ permalink raw reply [flat|nested] 96+ messages in thread
* [PATCH v6 1/7] refs: remove unused headers
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 2/7] refs: move consistency check msg to generic layer Karthik Nayak
` (6 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
In the 'refs/' namespace, some of the included header files are not
needed. Let's remove them.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
refs/debug.c | 1 -
refs/files-backend.c | 1 -
refs/reftable-backend.c | 1 -
3 files changed, 3 deletions(-)
diff --git a/refs/debug.c b/refs/debug.c
index 1cb955961e..697adbd0dc 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -1,7 +1,6 @@
#include "git-compat-util.h"
#include "hex.h"
#include "refs-internal.h"
-#include "string-list.h"
#include "trace.h"
static struct trace_key trace_refs = TRACE_KEY_INIT(REFS);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1b3bf26add..d4fb033417 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -20,7 +20,6 @@
#include "../dir-iterator.h"
#include "../lockfile.h"
#include "../object.h"
-#include "../object-file.h"
#include "../path.h"
#include "../dir.h"
#include "../chdir-notify.h"
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 9e889da2ff..2152349cb9 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -11,7 +11,6 @@
#include "../hex.h"
#include "../iterator.h"
#include "../ident.h"
-#include "../lockfile.h"
#include "../object.h"
#include "../path.h"
#include "../refs.h"
--
2.51.0
* [PATCH v6 2/7] refs: move consistency check msg to generic layer
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
` (5 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
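Reader's note (not part of the commit): the move described above can be
pictured with a small stand-alone sketch. Everything named `sketch_*` is
hypothetical and only imitates the shape of the change; Git's actual
'struct ref_store' and 'struct fsck_options' are much richer. The point is
that the banner is printed once in the generic entry point, so every backend
gets it without repeating the code.

```c
#include <assert.h>
#include <stdio.h>

struct sketch_opts {
	int verbose;
};

struct sketch_store;

/* Per-backend vtable; only the fsck hook matters for this sketch. */
struct sketch_backend {
	const char *name;
	int (*fsck)(struct sketch_store *store, const struct sketch_opts *o);
};

struct sketch_store {
	const struct sketch_backend *be;
	int fsck_calls; /* bumped by the stub backends below */
};

/* Stub backends: neither prints the banner itself. */
int sketch_files_fsck(struct sketch_store *store, const struct sketch_opts *o)
{
	(void)o;
	store->fsck_calls++;
	return 0;
}

int sketch_reftable_fsck(struct sketch_store *store, const struct sketch_opts *o)
{
	(void)o;
	store->fsck_calls++;
	return 0;
}

const struct sketch_backend sketch_files_be = { "files", sketch_files_fsck };
const struct sketch_backend sketch_reftable_be = { "reftable", sketch_reftable_fsck };

/* Generic entry point: print the banner once here, then dispatch. */
int sketch_refs_fsck(struct sketch_store *store, const struct sketch_opts *o,
		     FILE *out)
{
	assert(store && store->be && store->be->fsck && o && out);

	if (o->verbose)
		fprintf(out, "Checking references consistency\n");

	return store->be->fsck(store, o);
}
```

Swapping `store->be` between the two stubs changes which backend runs, while
the verbose message stays in one place.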
refs.c | 4 ++++
refs/files-backend.c | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/refs.c b/refs.c
index 4ff55cf24f..4a7c394226 100644
--- a/refs.c
+++ b/refs.c
@@ -32,6 +32,7 @@
#include "commit.h"
#include "wildmatch.h"
#include "ident.h"
+#include "fsck.h"
/*
* List of all available backends
@@ -323,6 +324,9 @@ int check_refname_format(const char *refname, int flags)
int refs_fsck(struct ref_store *refs, struct fsck_options *o,
struct worktree *wt)
{
+ if (o->verbose)
+ fprintf_ln(stderr, _("Checking references consistency"));
+
return refs->be->fsck(refs, o, wt);
}
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d4fb033417..603b1343d8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3906,8 +3906,6 @@ static int files_fsck_refs(struct ref_store *ref_store,
NULL,
};
- if (o->verbose)
- fprintf_ln(stderr, _("Checking references consistency"));
return files_fsck_refs_dir(ref_store, o, "refs", wt, fsck_refs_fn);
}
--
2.51.0
* [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list'
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 2/7] refs: move consistency check msg to generic layer Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
` (4 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
In the reftable format, the 'tables.list' file contains a
newline separated list of tables. While we parse this file, we do not
check or care about the last newline. Tighten the parser in
`parse_names()` to return an appropriate error if the last newline is
missing.
This requires modification to `parse_names()` to now return the error
while accepting the output as a third argument.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
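Reader's note (not part of the commit): a minimal stand-alone sketch of the
stricter parsing rule, assuming plain malloc()/free() instead of the reftable
allocator and made-up error codes (`SKETCH_*`). It is not the library's
`parse_names()`; unlike the patch, it leaves the input buffer unmodified and
copies each name. It only illustrates "every name must end in a newline,
empty names are dropped, the result is returned through an out parameter".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for this sketch; not the reftable values. */
enum { SKETCH_OK = 0, SKETCH_FORMAT_ERROR = -1, SKETCH_OOM_ERROR = -2 };

/* Make sure `*names` has room for at least `need` pointers. */
static int sketch_reserve(char ***names, size_t *cap, size_t need)
{
	size_t new_cap = *cap ? *cap : 4;
	char **tmp;

	if (need <= *cap)
		return SKETCH_OK;
	while (new_cap < need)
		new_cap *= 2;
	tmp = realloc(*names, new_cap * sizeof(*tmp));
	if (!tmp)
		return SKETCH_OOM_ERROR;
	*names = tmp;
	*cap = new_cap;
	return SKETCH_OK;
}

/* Free a NULL-terminated array produced by sketch_parse_names(). */
void sketch_free_names(char **names)
{
	if (!names)
		return;
	for (char **p = names; *p; p++)
		free(*p);
	free(names);
}

/*
 * Split `buf` (`size` bytes) into a NULL-terminated array of names.
 * Each name must be terminated by '\n'; empty names are discarded.
 * On success the array is stored in `*out`; on error `*out` is NULL.
 */
int sketch_parse_names(const char *buf, size_t size, char ***out)
{
	char **names = NULL;
	size_t len = 0, cap = 0;
	const char *p = buf, *end = buf + size;
	int err;

	assert(buf && out);
	*out = NULL;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t n;

		if (!nl) {
			/* Last line lacks its terminating newline. */
			err = SKETCH_FORMAT_ERROR;
			goto fail;
		}

		n = (size_t)(nl - p);
		if (n) {
			char *copy;

			/* One slot for the name, one for the final NULL. */
			err = sketch_reserve(&names, &cap, len + 2);
			if (err)
				goto fail;
			copy = malloc(n + 1);
			if (!copy) {
				err = SKETCH_OOM_ERROR;
				goto fail;
			}
			memcpy(copy, p, n);
			copy[n] = '\0';
			names[len++] = copy;
		}
		p = nl + 1;
	}

	err = sketch_reserve(&names, &cap, len + 1);
	if (err)
		goto fail;
	names[len] = NULL;
	*out = names;
	return SKETCH_OK;

fail:
	for (size_t i = 0; i < len; i++)
		free(names[i]);
	free(names);
	return err;
}
```

Returning an error code rather than NULL lets the caller tell a malformed
list apart from an allocation failure, which is the motivation given in the
commit message for changing the signature.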
reftable/basics.c | 37 ++++++++++++++++++++++++-------------
reftable/basics.h | 7 ++++---
reftable/stack.c | 7 +------
t/unit-tests/u-reftable-basics.c | 24 ++++++++++++++++++++----
4 files changed, 49 insertions(+), 26 deletions(-)
diff --git a/reftable/basics.c b/reftable/basics.c
index 9988ebd635..e969927b61 100644
--- a/reftable/basics.c
+++ b/reftable/basics.c
@@ -195,44 +195,55 @@ size_t names_length(const char **names)
return p - names;
}
-char **parse_names(char *buf, int size)
+int parse_names(char *buf, int size, char ***out)
{
char **names = NULL;
size_t names_cap = 0;
size_t names_len = 0;
char *p = buf;
char *end = buf + size;
+ int err = 0;
while (p < end) {
char *next = strchr(p, '\n');
- if (next && next < end) {
- *next = 0;
+ if (!next) {
+ err = REFTABLE_FORMAT_ERROR;
+ goto done;
+ } else if (next < end) {
+ *next = '\0';
} else {
next = end;
}
+
if (p < next) {
if (REFTABLE_ALLOC_GROW(names, names_len + 1,
- names_cap))
- goto err;
+ names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = reftable_strdup(p);
- if (!names[names_len++])
- goto err;
+ if (!names[names_len++]) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
}
p = next + 1;
}
- if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap))
- goto err;
+ if (REFTABLE_ALLOC_GROW(names, names_len + 1, names_cap)) {
+ err = REFTABLE_OUT_OF_MEMORY_ERROR;
+ goto done;
+ }
names[names_len] = NULL;
- return names;
-
-err:
+ *out = names;
+ return 0;
+done:
for (size_t i = 0; i < names_len; i++)
reftable_free(names[i]);
reftable_free(names);
- return NULL;
+ return err;
}
int names_equal(const char **a, const char **b)
diff --git a/reftable/basics.h b/reftable/basics.h
index 7d22f96261..e4b83b2b03 100644
--- a/reftable/basics.h
+++ b/reftable/basics.h
@@ -167,10 +167,11 @@ void free_names(char **a);
/*
* Parse a newline separated list of names. `size` is the length of the buffer,
- * without terminating '\0'. Empty names are discarded. Returns a `NULL`
- * pointer when allocations fail.
+ * without terminating '\0'. Empty names are discarded.
+ *
+ * Returns 0 on success, a reftable error code on error.
*/
-char **parse_names(char *buf, int size);
+int parse_names(char *buf, int size, char ***out);
/* compares two NULL-terminated arrays of strings. */
int names_equal(const char **a, const char **b);
diff --git a/reftable/stack.c b/reftable/stack.c
index f91ce50bcd..65d89820bd 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -109,12 +109,7 @@ static int fd_read_lines(int fd, char ***namesp)
}
buf[size] = 0;
- *namesp = parse_names(buf, size);
- if (!*namesp) {
- err = REFTABLE_OUT_OF_MEMORY_ERROR;
- goto done;
- }
-
+ err = parse_names(buf, size, namesp);
done:
reftable_free(buf);
return err;
diff --git a/t/unit-tests/u-reftable-basics.c b/t/unit-tests/u-reftable-basics.c
index a0471083e7..73566ed0eb 100644
--- a/t/unit-tests/u-reftable-basics.c
+++ b/t/unit-tests/u-reftable-basics.c
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
#include "unit-test.h"
#include "lib-reftable.h"
#include "reftable/basics.h"
+#include "reftable/reftable-error.h"
struct integer_needle_lesseq_args {
int needle;
@@ -79,14 +80,18 @@ void test_reftable_basics__names_equal(void)
void test_reftable_basics__parse_names(void)
{
char in1[] = "line\n";
- char in2[] = "a\nb\nc";
- char **out = parse_names(in1, strlen(in1));
+ char in2[] = "a\nb\nc\n";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "line");
cl_assert(!out[1]);
free_names(out);
- out = parse_names(in2, strlen(in2));
+ out = NULL;
+ err = parse_names(in2, strlen(in2), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
cl_assert_equal_s(out[1], "b");
@@ -95,10 +100,21 @@ void test_reftable_basics__parse_names(void)
free_names(out);
}
+void test_reftable_basics__parse_names_missing_newline(void)
+{
+ char in1[] = "line\nline2";
+ char **out = NULL;
+ int err = parse_names(in1, strlen(in1), &out);
+ cl_assert(err == REFTABLE_FORMAT_ERROR);
+ cl_assert(out == NULL);
+}
+
void test_reftable_basics__parse_names_drop_empty_string(void)
{
char in[] = "a\n\nb\n";
- char **out = parse_names(in, strlen(in));
+ char **out = NULL;
+ int err = parse_names(in, strlen(in), &out);
+ cl_assert(err == 0);
cl_assert(out != NULL);
cl_assert_equal_s(out[0], "a");
/* simply '\n' should be dropped as empty string */
--
2.51.0
* [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
` (2 preceding siblings ...)
2025-10-07 12:11 ` [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
` (3 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
The `gitmodulesLarge` message ID is documented twice. Remove the second,
duplicate entry.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Documentation/fsck-msgids.adoc | 3 ---
1 file changed, 3 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
`gitmodulesParse`::
(INFO) Could not parse `.gitmodules` blob.
-`gitmodulesLarge`;
- (ERROR) `.gitmodules` blob is too large to parse.
-
`gitmodulesPath`::
(ERROR) `.gitmodules` path is invalid.
--
2.51.0
* [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
` (3 preceding siblings ...)
2025-10-07 12:11 ` [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
` (2 subsequent siblings)
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
The list of 'fsck_msg_type' seems to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
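Reader's note (not part of the commit): the list being sorted is an X-macro,
so its order is purely a readability matter. A minimal sketch of the pattern
follows, with a shortened, hypothetical list and a helper that reports the
first entry sorting before its predecessor within the same severity group.
The helper is an illustration only; nothing like it is added by this patch.
It uses strcmp() order, in which 'E' sorts before '_', matching the patch's
placement of BAD_REFERENT_NAME ahead of BAD_REF_CONTENT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened list; the real one lives in fsck.h. */
#define SKETCH_FOREACH_MSG_ID(FUNC) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(BAD_REFERENT_NAME, ERROR) \
	FUNC(BAD_REF_CONTENT, ERROR) \
	FUNC(GITMODULES_BLOB, ERROR) \
	FUNC(EMPTY_NAME, WARN) \
	FUNC(NULL_SHA1, WARN) \
	FUNC(NUL_IN_COMMIT, WARN)

enum sketch_severity { SKETCH_ERROR, SKETCH_WARN };

/* First expansion: one enum constant per entry. */
#define MSG_ID(id, severity) SKETCH_MSG_##id,
enum sketch_msg_id { SKETCH_FOREACH_MSG_ID(MSG_ID) SKETCH_MSG_MAX };
#undef MSG_ID

struct sketch_msg_info {
	const char *name;
	enum sketch_severity severity;
};

/* Second expansion: a parallel table of names and severities. */
#define MSG_ID(id, severity) { #id, SKETCH_##severity },
const struct sketch_msg_info sketch_msg_table[] = {
	SKETCH_FOREACH_MSG_ID(MSG_ID)
};
#undef MSG_ID

/*
 * Return the index of the first entry that sorts before its predecessor
 * within the same severity group, or -1 if every group is sorted.
 */
int sketch_first_unsorted(const struct sketch_msg_info *info, size_t n)
{
	assert(info || !n);

	for (size_t i = 1; i < n; i++)
		if (info[i].severity == info[i - 1].severity &&
		    strcmp(info[i - 1].name, info[i].name) > 0)
			return (int)i;
	return -1;
}
```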
fsck.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..6b0db235e0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -33,15 +33,27 @@ enum fsck_msg_type {
FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
FUNC(BAD_PACKED_REF_HEADER, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
+ FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_REF_CONTENT, ERROR) \
FUNC(BAD_REF_FILETYPE, ERROR) \
FUNC(BAD_REF_NAME, ERROR) \
- FUNC(BAD_REFERENT_NAME, ERROR) \
FUNC(BAD_TIMEZONE, ERROR) \
FUNC(BAD_TREE, ERROR) \
FUNC(BAD_TREE_SHA1, ERROR) \
FUNC(BAD_TYPE, ERROR) \
FUNC(DUPLICATE_ENTRIES, ERROR) \
+ FUNC(GITATTRIBUTES_BLOB, ERROR) \
+ FUNC(GITATTRIBUTES_LARGE, ERROR) \
+ FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
+ FUNC(GITATTRIBUTES_MISSING, ERROR) \
+ FUNC(GITMODULES_BLOB, ERROR) \
+ FUNC(GITMODULES_LARGE, ERROR) \
+ FUNC(GITMODULES_MISSING, ERROR) \
+ FUNC(GITMODULES_NAME, ERROR) \
+ FUNC(GITMODULES_PATH, ERROR) \
+ FUNC(GITMODULES_SYMLINK, ERROR) \
+ FUNC(GITMODULES_UPDATE, ERROR) \
+ FUNC(GITMODULES_URL, ERROR) \
FUNC(MISSING_AUTHOR, ERROR) \
FUNC(MISSING_COMMITTER, ERROR) \
FUNC(MISSING_EMAIL, ERROR) \
@@ -60,39 +72,27 @@ enum fsck_msg_type {
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
- FUNC(GITMODULES_MISSING, ERROR) \
- FUNC(GITMODULES_BLOB, ERROR) \
- FUNC(GITMODULES_LARGE, ERROR) \
- FUNC(GITMODULES_NAME, ERROR) \
- FUNC(GITMODULES_SYMLINK, ERROR) \
- FUNC(GITMODULES_URL, ERROR) \
- FUNC(GITMODULES_PATH, ERROR) \
- FUNC(GITMODULES_UPDATE, ERROR) \
- FUNC(GITATTRIBUTES_MISSING, ERROR) \
- FUNC(GITATTRIBUTES_LARGE, ERROR) \
- FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
- FUNC(GITATTRIBUTES_BLOB, ERROR) \
/* warnings */ \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
+ FUNC(LARGE_PATHNAME, WARN) \
FUNC(NULL_SHA1, WARN) \
- FUNC(ZERO_PADDED_FILEMODE, WARN) \
FUNC(NUL_IN_COMMIT, WARN) \
- FUNC(LARGE_PATHNAME, WARN) \
+ FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_FILEMODE, INFO) \
+ FUNC(BAD_TAG_NAME, INFO) \
FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
- FUNC(GITMODULES_PARSE, INFO) \
- FUNC(GITIGNORE_SYMLINK, INFO) \
FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+ FUNC(GITIGNORE_SYMLINK, INFO) \
+ FUNC(GITMODULES_PARSE, INFO) \
FUNC(MAILMAP_SYMLINK, INFO) \
- FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO) \
- FUNC(SYMLINK_REF, INFO) \
FUNC(REF_MISSING_NEWLINE, INFO) \
+ FUNC(SYMLINK_REF, INFO) \
FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
FUNC(TRAILING_REF_CONTENT, INFO) \
/* ignored (elevated when requested) */ \
--
2.51.0
* [PATCH v6 6/7] reftable: add code to facilitate consistency checks
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
` (4 preceding siblings ...)
2025-10-07 12:11 ` [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 12:11 ` [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07 13:26 ` [PATCH v6 0/7] refs/reftable: add consistency checks Patrick Steinhardt
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the files present in the reftable directory.
Since the reftable library is treated as an independent library, we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h', which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The caller provides the function with callbacks to handle issue
and information reporting.
The added check goes over all tables in the reftable stack and validates
that they have a valid name. If not, it raises an error.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
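Reader's note (not part of the commit): a stand-alone sketch of a name check
in the spirit of `table_has_valid_name()`. It is deliberately a slightly
stricter variant and not the patch's code: it assumes each of the three
fields must contain at least one hexadecimal digit, whereas the patch relies
on errno alone. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Consume one non-empty hexadecimal field (an optional "0x" prefix is
 * accepted by strtoull() in base 16). Returns a pointer just past the
 * field, or NULL if the field is empty or out of range.
 */
static const char *sketch_skip_hex_field(const char *s)
{
	char *end;

	/* strtoull() would silently accept leading whitespace or a sign. */
	if (!isxdigit((unsigned char)*s))
		return NULL;

	errno = 0;
	strtoull(s, &end, 16);
	if (errno || end == s)
		return NULL;
	return end;
}

/* Accepts "<hex>-<hex>-<hex>.ref" and "<hex>-<hex>-<hex>.log". */
bool sketch_table_name_is_valid(const char *name)
{
	const char *p = name;

	assert(name);

	for (int field = 0; field < 3; field++) {
		p = sketch_skip_hex_field(p);
		if (!p)
			return false;
		/* The first two fields are followed by a '-' separator. */
		if (field < 2 && *p++ != '-')
			return false;
	}

	return !strcmp(p, ".ref") || !strcmp(p, ".log");
}
```

With this variant a name such as "--.ref" is rejected, because every field
has to start with a hexadecimal digit.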
Makefile | 3 +-
meson.build | 1 +
reftable/fsck.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
reftable/reftable-fsck.h | 40 +++++++++++++++++++
4 files changed, 143 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 4c95affadb..03fbaf2b21 100644
--- a/Makefile
+++ b/Makefile
@@ -2732,9 +2732,10 @@ XDIFF_OBJS += xdiff/xutils.o
xdiff-objs: $(XDIFF_OBJS)
REFTABLE_OBJS += reftable/basics.o
-REFTABLE_OBJS += reftable/error.o
REFTABLE_OBJS += reftable/block.o
REFTABLE_OBJS += reftable/blocksource.o
+REFTABLE_OBJS += reftable/error.o
+REFTABLE_OBJS += reftable/fsck.o
REFTABLE_OBJS += reftable/iter.o
REFTABLE_OBJS += reftable/merged.o
REFTABLE_OBJS += reftable/pq.o
diff --git a/meson.build b/meson.build
index b3dfcc0497..8914252910 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
'reftable/error.c',
'reftable/block.c',
'reftable/blocksource.c',
+ 'reftable/fsck.c',
'reftable/iter.c',
'reftable/merged.c',
'reftable/pq.c',
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..26b9115b14
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,100 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "reftable-table.h"
+#include "stack.h"
+
+static bool table_has_valid_name(const char *name)
+{
+ const char *ptr = name;
+ char *endptr;
+
+ /* strtoull doesn't set errno on success */
+ errno = 0;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoull(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (*ptr != '-')
+ return false;
+ ptr++;
+
+ strtoul(ptr, &endptr, 16);
+ if (errno)
+ return false;
+ ptr = endptr;
+
+ if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
+ return false;
+
+ return true;
+}
+
+typedef int (*table_check_fn)(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data);
+
+static int table_check_name(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ void *cb_data)
+{
+ if (!table_has_valid_name(table->name)) {
+ struct reftable_fsck_info info;
+
+ info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
+ info.msg = "invalid reftable table name";
+ info.path = table->name;
+
+ return report_fn(&info, cb_data);
+ }
+
+ return 0;
+}
+
+static int table_checks(struct reftable_table *table,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn UNUSED,
+ void *cb_data)
+{
+ table_check_fn table_check_fns[] = {
+ table_check_name,
+ NULL,
+ };
+ int err = 0;
+
+ for (size_t i = 0; table_check_fns[i]; i++)
+ err |= table_check_fns[i](table, report_fn, cb_data);
+
+ return err;
+}
+
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data)
+{
+ struct reftable_buf msg = REFTABLE_BUF_INIT;
+ int err = 0;
+
+ for (size_t i = 0; i < stack->tables_len; i++) {
+ reftable_buf_reset(&msg);
+ reftable_buf_addstr(&msg, "Checking table: ");
+ reftable_buf_addstr(&msg, stack->tables[i]->name);
+ verbose_fn(msg.buf, cb_data);
+
+ err |= table_checks(stack->tables[i], report_fn, verbose_fn, cb_data);
+ }
+
+ reftable_buf_release(&msg);
+ return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..007a392cf9
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,40 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+ /* Invalid table name */
+ REFTABLE_FSCK_ERROR_TABLE_NAME = 0,
+ /* Used for bounds checking, must be last */
+ REFTABLE_FSCK_MAX_VALUE,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+ enum reftable_fsck_error error;
+ const char *msg;
+ const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+ void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform consistency checks on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the consistency checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+ reftable_fsck_report_fn report_fn,
+ reftable_fsck_verbose_fn verbose_fn,
+ void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
--
2.51.0
* [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
` (5 preceding siblings ...)
2025-10-07 12:11 ` [PATCH v6 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
@ 2025-10-07 12:11 ` Karthik Nayak
2025-10-07 13:26 ` [PATCH v6 0/7] refs/reftable: add consistency checks Patrick Steinhardt
7 siblings, 0 replies; 96+ messages in thread
From: Karthik Nayak @ 2025-10-07 12:11 UTC (permalink / raw)
To: git; +Cc: ps, gitster, peff, Karthik Nayak
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git's fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:
It suggested to use
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
So treat non-conformant file names as warnings.
While adding the fsck header to 'refs/reftable-backend.c', modify the
list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
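Reader's note (not part of the commit): the glue described above combines a
library that reports problems through a callback with a host-side table that
translates library error codes into host message IDs. The sketch below shows
that shape with hypothetical names throughout (`lib_*` for the library side,
`host_*` for the caller). As an assumption of this sketch, message ID 0 is
reserved as an explicit "unmapped" sentinel, and the stand-in check merely
flags empty names.

```c
#include <assert.h>
#include <stddef.h>

/* Library-side error codes (hypothetical stand-ins). */
enum lib_error {
	LIB_ERROR_TABLE_NAME = 0,
	LIB_ERROR_MAX, /* bounds check, must be last */
};

struct lib_info {
	enum lib_error error;
	const char *msg;
	const char *path;
};

typedef int lib_report_fn(const struct lib_info *info, void *cb_data);

/* Library side: knows nothing about the host, only calls `report`. */
int lib_check(const char *const *names, size_t n,
	      lib_report_fn *report, void *cb_data)
{
	int err = 0;

	assert(names && report);

	for (size_t i = 0; i < n; i++) {
		if (names[i][0] == '\0') { /* stand-in for a real check */
			struct lib_info info = {
				LIB_ERROR_TABLE_NAME,
				"invalid table name",
				names[i],
			};
			err |= report(&info, cb_data);
		}
	}
	return err;
}

/* Host-side message IDs; 0 is reserved to mean "no mapping". */
enum host_msg_id {
	HOST_MSG_UNMAPPED = 0,
	HOST_MSG_BAD_TABLE_NAME,
};

static const enum host_msg_id host_msg_map[LIB_ERROR_MAX] = {
	[LIB_ERROR_TABLE_NAME] = HOST_MSG_BAD_TABLE_NAME,
};

struct host_state {
	int reported;
	enum host_msg_id last;
};

/* Host side: translate the library error, then record it. */
int host_report(const struct lib_info *info, void *cb_data)
{
	struct host_state *st = cb_data;
	enum host_msg_id id;

	assert(info && st);

	if ((int)info->error < 0 || info->error >= LIB_ERROR_MAX)
		return -1; /* unknown library error */

	id = host_msg_map[info->error];
	if (id == HOST_MSG_UNMAPPED)
		return -1; /* table is missing an entry */

	st->reported++;
	st->last = id;
	return 0;
}
```

Reserving an explicit sentinel avoids ambiguity between "no mapping" and a
legitimate message ID whose enum value happens to be 0.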
Documentation/fsck-msgids.adoc | 3 +++
fsck.h | 1 +
refs/reftable-backend.c | 57 +++++++++++++++++++++++++++++++++++++----
t/meson.build | 1 +
t/t0614-reftable-fsck.sh | 58 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 115 insertions(+), 5 deletions(-)
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..81f11ba125 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
`badReferentName`::
(ERROR) The referent name of a symref is invalid.
+`badReftableTableName`::
+ (WARN) A reftable table has an invalid name.
+
`badTagName`::
(INFO) A tag has an invalid format.
diff --git a/fsck.h b/fsck.h
index 6b0db235e0..759df97655 100644
--- a/fsck.h
+++ b/fsck.h
@@ -73,6 +73,7 @@ enum fsck_msg_type {
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
+ FUNC(BAD_REFTABLE_TABLE_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 2152349cb9..b106fd8b53 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,6 +6,7 @@
#include "../config.h"
#include "../dir.h"
#include "../environment.h"
+#include "../fsck.h"
#include "../gettext.h"
#include "../hash.h"
#include "../hex.h"
@@ -15,10 +16,11 @@
#include "../path.h"
#include "../refs.h"
#include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
#include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
#include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
#include "../repo-settings.h"
#include "../setup.h"
#include "../strmap.h"
@@ -2707,11 +2709,56 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
return ret;
}
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
- struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+ struct fsck_options *o = cb_data;
+
+ if (o->verbose)
+ fprintf_ln(stderr, "%s", msg);
+}
+
+static const enum fsck_msg_id fsck_msg_id_map[] = {
+ [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
+};
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+ void *cb_data)
+{
+ struct fsck_ref_report report = { .path = info->path };
+ struct fsck_options *o = cb_data;
+ enum fsck_msg_id msg_id;
+
+ if (info->error < 0 || info->error >= REFTABLE_FSCK_MAX_VALUE)
+ BUG("unknown fsck error: %d", (int)info->error);
+
+ msg_id = fsck_msg_id_map[info->error];
+
+ if (!msg_id)
+ BUG("fsck_msg_id value missing for reftable error: %d", (int)info->error);
+
+ return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
struct worktree *wt UNUSED)
{
- return 0;
+ struct reftable_ref_store *refs;
+ struct strmap_entry *entry;
+ struct hashmap_iter iter;
+ int ret = 0;
+
+ refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+ ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+
+ strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+ struct reftable_backend *b = (struct reftable_backend *)entry->value;
+ ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+ reftable_fsck_verbose_handler, o);
+ }
+
+ return ret;
}
struct ref_storage_be refs_be_reftable = {
diff --git a/t/meson.build b/t/meson.build
index 7974795fe4..ec1fc0b2a1 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -146,6 +146,7 @@ integration_tests = [
't0611-reftable-httpd.sh',
't0612-reftable-jgit-compatibility.sh',
't0613-reftable-write-options.sh',
+ 't0614-reftable-fsck.sh',
't1000-read-tree-m-3way.sh',
't1001-read-tree-m-2way.sh',
't1002-read-tree-m-u-2way.sh',
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..85cc47d67e
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success "no errors reported on a well formed repository" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ for i in $(test_seq 20)
+ do
+ git update-ref refs/heads/branch-$i HEAD || return 1
+ done &&
+
+ # The repository should end up with multiple tables.
+ test_line_count ">" 1 .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err
+ )
+'
+
+for TABLE_NAME in "foo-bar-e4d12d59.ref" \
+ "0x00000000zzzz-0x00000000zzzz-e4d12d59.ref" \
+ "0x000000000001-0x000000000002-e4d12d59.abc" \
+ "0x000000000001-0x000000000002-e4d12d59.refabc"; do
+ test_expect_success "table name $TABLE_NAME should be checked" '
+ test_when_finished "rm -rf repo" &&
+ git init repo &&
+ (
+ cd repo &&
+ git commit --allow-empty -m initial &&
+
+ git refs verify 2>err &&
+ test_must_be_empty err &&
+
+ EXISTING_TABLE=$(head -n1 .git/reftable/tables.list) &&
+ mv ".git/reftable/$EXISTING_TABLE" ".git/reftable/$TABLE_NAME" &&
+ sed "s/${EXISTING_TABLE}/${TABLE_NAME}/g" .git/reftable/tables.list > tables.list &&
+ mv tables.list .git/reftable/tables.list &&
+
+ git refs verify 2>err &&
+ cat >expect <<-EOF &&
+ warning: ${TABLE_NAME}: badReftableTableName: invalid reftable table name
+ EOF
+ test_cmp expect err
+ )
+ '
+done
+
+test_done
--
2.51.0
* Re: [PATCH v6 0/7] refs/reftable: add consistency checks
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
` (6 preceding siblings ...)
2025-10-07 12:11 ` [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-10-07 13:26 ` Patrick Steinhardt
2025-10-07 16:25 ` Junio C Hamano
7 siblings, 1 reply; 96+ messages in thread
From: Patrick Steinhardt @ 2025-10-07 13:26 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, gitster, peff
On Tue, Oct 07, 2025 at 02:11:24PM +0200, Karthik Nayak wrote:
> Changes in v6:
> - In t/t0614-reftable-fsck.sh, create branches instead of root refs.
> This worked because we don't yet have reference-level checks
> implemented for reftables. Let's avoid the confusion of a breaking test
> when we add reference-level checks.
> - Link to v5: https://lore.kernel.org/r/20251006-228-reftable-introduce-consistency-checks-v5-0-f196d386214f@gmail.com
Thanks, this version looks good to me!
Patrick