git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] refs/reftable: add fsck checks
@ 2025-08-19 12:20 Karthik Nayak
  2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:20 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

This series adds the required infrastructure and also some fsck checks
for the reftable backend.

Since the reftable backend is treated as a library within the Git
codebase, we don't want to spillover our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.

So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.

This series then adds some checks on the stack ('reftable/tables.list')
level of reftable, namely:
1. The table name is as per the spec
2. The number of tables are consistent
3. The tables.list has a newline at the end of file
4. The table names follow correct index sequences

I also plan to send in follow up series's which will implement further
checks and go into deeper layers (tables, block, references).

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  15 +++-
 Makefile                       |   1 +
 fsck.h                         | 154 +++++++++++++++++++++--------------------
 meson.build                    |   1 +
 refs/reftable-backend.c        |  70 +++++++++++++++++--
 reftable/fsck.c                | 132 +++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  44 ++++++++++++
 t/meson.build                  |   3 +-
 t/t0614-reftable-fsck.sh       | 138 ++++++++++++++++++++++++++++++++++++
 9 files changed, 473 insertions(+), 85 deletions(-)

Karthik Nayak (5):
      fsck: order 'fsck_msg_type' alphabetically
      refs/reftable: add fsck check for checking the table name
      refs/reftable: add fsck check for number of tables
      refs/reftable: add fsck check for trailing newline
      refs/reftable: add fsck check for incorrect update index



base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
  2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

The list of 'fsck_msg_type' seem to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering. Also fix a clang-format issue
where the escaped newlines are not aligned.

While here, remove a duplicate instance of 'gitmodulesLarge' in the
'fsck-msgids' documentation.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |   3 -
 fsck.h                         | 150 ++++++++++++++++++++---------------------
 2 files changed, 75 insertions(+), 78 deletions(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
 `gitmodulesParse`::
 	(INFO) Could not parse `.gitmodules` blob.
 
-`gitmodulesLarge`;
-	(ERROR) `.gitmodules` blob is too large to parse.
-
 `gitmodulesPath`::
 	(ERROR) `.gitmodules` path is invalid.
 
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..559ad57807 100644
--- a/fsck.h
+++ b/fsck.h
@@ -20,82 +20,82 @@ enum fsck_msg_type {
  * two in sync.
  */
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
-	FUNC(BAD_PACKED_REF_HEADER, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_REF_CONTENT, ERROR) \
-	FUNC(BAD_REF_FILETYPE, ERROR) \
-	FUNC(BAD_REF_NAME, ERROR) \
-	FUNC(BAD_REFERENT_NAME, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
-	FUNC(PACKED_REF_UNSORTED, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	FUNC(GITATTRIBUTES_MISSING, ERROR) \
-	FUNC(GITATTRIBUTES_LARGE, ERROR) \
-	FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
-	FUNC(GITATTRIBUTES_BLOB, ERROR) \
-	/* warnings */ \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	FUNC(LARGE_PATHNAME, WARN) \
+#define FOREACH_FSCK_MSG_ID(FUNC)                                  \
+	/* fatal errors */                                         \
+	FUNC(NUL_IN_HEADER, FATAL)                                 \
+	FUNC(UNTERMINATED_HEADER, FATAL)                           \
+	/* errors */                                               \
+	FUNC(BAD_DATE, ERROR)                                      \
+	FUNC(BAD_DATE_OVERFLOW, ERROR)                             \
+	FUNC(BAD_EMAIL, ERROR)                                     \
+	FUNC(BAD_NAME, ERROR)                                      \
+	FUNC(BAD_OBJECT_SHA1, ERROR)                               \
+	FUNC(BAD_PACKED_REF_ENTRY, ERROR)                          \
+	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
+	FUNC(BAD_PARENT_SHA1, ERROR)                               \
+	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REF_CONTENT, ERROR)                               \
+	FUNC(BAD_REF_FILETYPE, ERROR)                              \
+	FUNC(BAD_REF_NAME, ERROR)                                  \
+	FUNC(BAD_TIMEZONE, ERROR)                                  \
+	FUNC(BAD_TREE, ERROR)                                      \
+	FUNC(BAD_TREE_SHA1, ERROR)                                 \
+	FUNC(BAD_TYPE, ERROR)                                      \
+	FUNC(DUPLICATE_ENTRIES, ERROR)                             \
+	FUNC(GITATTRIBUTES_BLOB, ERROR)                            \
+	FUNC(GITATTRIBUTES_LARGE, ERROR)                           \
+	FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR)                     \
+	FUNC(GITATTRIBUTES_MISSING, ERROR)                         \
+	FUNC(GITMODULES_BLOB, ERROR)                               \
+	FUNC(GITMODULES_LARGE, ERROR)                              \
+	FUNC(GITMODULES_MISSING, ERROR)                            \
+	FUNC(GITMODULES_NAME, ERROR)                               \
+	FUNC(GITMODULES_PATH, ERROR)                               \
+	FUNC(GITMODULES_SYMLINK, ERROR)                            \
+	FUNC(GITMODULES_UPDATE, ERROR)                             \
+	FUNC(GITMODULES_URL, ERROR)                                \
+	FUNC(MISSING_AUTHOR, ERROR)                                \
+	FUNC(MISSING_COMMITTER, ERROR)                             \
+	FUNC(MISSING_EMAIL, ERROR)                                 \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR)                     \
+	FUNC(MISSING_OBJECT, ERROR)                                \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR)                     \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR)                    \
+	FUNC(MISSING_TAG, ERROR)                                   \
+	FUNC(MISSING_TAG_ENTRY, ERROR)                             \
+	FUNC(MISSING_TREE, ERROR)                                  \
+	FUNC(MISSING_TYPE, ERROR)                                  \
+	FUNC(MISSING_TYPE_ENTRY, ERROR)                            \
+	FUNC(MULTIPLE_AUTHORS, ERROR)                              \
+	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR)               \
+	FUNC(PACKED_REF_UNSORTED, ERROR)                           \
+	FUNC(TREE_NOT_SORTED, ERROR)                               \
+	FUNC(UNKNOWN_TYPE, ERROR)                                  \
+	FUNC(ZERO_PADDED_DATE, ERROR)                              \
+	/* warnings */                                             \
+	FUNC(EMPTY_NAME, WARN)                                     \
+	FUNC(FULL_PATHNAME, WARN)                                  \
+	FUNC(HAS_DOT, WARN)                                        \
+	FUNC(HAS_DOTDOT, WARN)                                     \
+	FUNC(HAS_DOTGIT, WARN)                                     \
+	FUNC(LARGE_PATHNAME, WARN)                                 \
+	FUNC(NULL_SHA1, WARN)                                      \
+	FUNC(NUL_IN_COMMIT, WARN)                                  \
+	FUNC(ZERO_PADDED_FILEMODE, WARN)                           \
 	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(BAD_FILEMODE, INFO) \
-	FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(GITIGNORE_SYMLINK, INFO) \
-	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
-	FUNC(MAILMAP_SYMLINK, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	FUNC(SYMLINK_REF, INFO) \
-	FUNC(REF_MISSING_NEWLINE, INFO) \
-	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
-	FUNC(TRAILING_REF_CONTENT, INFO) \
-	/* ignored (elevated when requested) */ \
+	FUNC(BAD_FILEMODE, INFO)                                   \
+	FUNC(BAD_TAG_NAME, INFO)                                   \
+	FUNC(EMPTY_PACKED_REFS_FILE, INFO)                         \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO)                          \
+	FUNC(GITIGNORE_SYMLINK, INFO)                              \
+	FUNC(GITMODULES_PARSE, INFO)                               \
+	FUNC(MAILMAP_SYMLINK, INFO)                                \
+	FUNC(MISSING_TAGGER_ENTRY, INFO)                           \
+	FUNC(REF_MISSING_NEWLINE, INFO)                            \
+	FUNC(SYMLINK_REF, INFO)                                    \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO)                     \
+	FUNC(TRAILING_REF_CONTENT, INFO)                           \
+	/* ignored (elevated when requested) */                    \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5] refs/reftable: add fsck check for checking the table name
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
  2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
  2025-08-26 16:21   ` shejialuo
  2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

The `git refs verify` command is used to run fsck checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the table names in the 'tables.list' of reftables.

For the infrastructure, since the reftable library is treated as an
independent library we should ensure that the library code works
independently without knowledge about Git's internals. To do this,
add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
provide an entry point 'reftable_fsck_check' for running fsck checks
over a provided reftable stack. The callee provides the function with
callbacks to handle issue and information reporting.

Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git' fsck errors.

Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. Add
a test to check for this behavior.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 Makefile                       |  1 +
 fsck.h                         |  1 +
 meson.build                    |  1 +
 refs/reftable-backend.c        | 61 +++++++++++++++++++++++++++++++++++++-----
 reftable/fsck.c                | 50 ++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       | 38 ++++++++++++++++++++++++++
 t/meson.build                  |  3 ++-
 t/t0614-reftable-fsck.sh       | 35 ++++++++++++++++++++++++
 9 files changed, 186 insertions(+), 7 deletions(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..784ddc0df5 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
 `badReferentName`::
 	(ERROR) The referent name of a symref is invalid.
 
+`badReftableTableName`::
+	(ERROR) A reftable table has an invalid name.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/Makefile b/Makefile
index e11340c1ae..f2ddcc8d7c 100644
--- a/Makefile
+++ b/Makefile
@@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
 REFTABLE_OBJS += reftable/block.o
 REFTABLE_OBJS += reftable/blocksource.o
 REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/fsck.o
 REFTABLE_OBJS += reftable/merged.o
 REFTABLE_OBJS += reftable/pq.o
 REFTABLE_OBJS += reftable/record.o
diff --git a/fsck.h b/fsck.h
index 559ad57807..5901f944a1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
 	FUNC(BAD_REF_NAME, ERROR)                                  \
diff --git a/meson.build b/meson.build
index 5dd299b496..82879fbfaa 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
   'reftable/error.c',
   'reftable/block.c',
   'reftable/blocksource.c',
+  'reftable/fsck.c',
   'reftable/iter.c',
   'reftable/merged.c',
   'reftable/pq.c',
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8dae1e1112..ccd12052f2 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,20 +6,21 @@
 #include "../config.h"
 #include "../dir.h"
 #include "../environment.h"
+#include "../fsck.h"
 #include "../gettext.h"
 #include "../hash.h"
 #include "../hex.h"
 #include "../iterator.h"
 #include "../ident.h"
-#include "../lockfile.h"
 #include "../object.h"
 #include "../path.h"
 #include "../refs.h"
 #include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
 #include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
 #include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
 #include "../repo-settings.h"
 #include "../setup.h"
 #include "../strmap.h"
@@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	return ret;
 }
 
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+	struct fsck_options *o = cb_data;
+
+	if (o->verbose)
+		fprintf_ln(stderr, "%s", _(msg));
+}
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info info,
+				       void *cb_data)
+{
+	struct fsck_options *o = cb_data;
+	struct fsck_ref_report report = { .path = info.path };
+	enum fsck_msg_id msg_id;
+
+	switch (info.error) {
+	case REFTABLE_FSCK_ERROR_TABLE_NAME:
+		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
+		break;
+	default:
+		BUG("unknown fsck error: %d", info.error);
+	}
+
+	return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
 			    struct worktree *wt UNUSED)
 {
-	return 0;
+	struct reftable_ref_store *refs;
+	struct strmap_entry *entry;
+	struct hashmap_iter iter;
+	int ret = 0;
+
+	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+	if (o->verbose)
+		fprintf_ln(stderr, _("Checking references consistency"));
+
+	ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+				  reftable_fsck_verbose_handler, o);
+	if (!ret)
+		return ret;
+
+	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+		struct reftable_backend *b = (struct reftable_backend *)entry->value;
+		ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+					  reftable_fsck_verbose_handler, o);
+		if (!ret)
+			return ret;
+	}
+
+	return ret;
 }
 
 struct ref_storage_be refs_be_reftable = {
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..22ec3c26e9
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,50 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "stack.h"
+
+int reftable_fsck_check(struct reftable_stack *stack,
+			reftable_fsck_report_fn report_fn,
+			reftable_fsck_verbose_fn verbose_fn,
+			void *cb_data)
+{
+	char **names = NULL;
+	uint64_t min, max;
+	int err = 0;
+
+	if (stack == NULL)
+		goto out;
+
+	err = read_lines(stack->list_file, &names);
+	if (err < 0)
+		goto out;
+
+	verbose_fn("Checking reftable table names", cb_data);
+
+	for (size_t i = 0; names[i]; i++) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
+			.path = names[i],
+			.msg = "invalid reftable name"
+		};
+		uint32_t rnd;
+		/*
+		 * We want to match the tail '.ref'. One extra byte to ensure
+		 * that there is no unexpected extra character and one byte for
+		 * the null terminator added by sscanf.
+		 */
+		char tail[6];
+
+		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
+			   &min, &max, &rnd, tail) != 4) {
+			err = report_fn(info, cb_data);
+		}
+
+		if (strcmp(tail, ".ref")) {
+			err = report_fn(info, cb_data);
+		}
+	}
+
+out:
+	free_names(names);
+	return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..087430d979
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,38 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+	/* Invalid table name */
+	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+};
+
+/* Represents an individual error encounctered during the FSCK checks. */
+struct reftable_fsck_info {
+	enum reftable_fsck_error error;
+	const char *msg;
+	const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
+				    void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform FSCK check on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * conitnue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the FSCK checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+			reftable_fsck_report_fn report_fn,
+			reftable_fsck_verbose_fn verbose_fn,
+			void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
diff --git a/t/meson.build b/t/meson.build
index bbeba1a8d5..a8eb44eb30 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -145,6 +145,7 @@ integration_tests = [
   't0611-reftable-httpd.sh',
   't0612-reftable-jgit-compatibility.sh',
   't0613-reftable-write-options.sh',
+  't0614-reftable-fsck.sh',
   't1000-read-tree-m-3way.sh',
   't1001-read-tree-m-2way.sh',
   't1002-read-tree-m-u-2way.sh',
@@ -1214,4 +1215,4 @@ if perl.found() and time.found()
       timeout: 0,
     )
   endforeach
-endif
\ No newline at end of file
+endif
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..0d11871b1c
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'table name should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+		sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
+		EOF
+		test_cmp expect err
+	)
+'
+
+test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5] refs/reftable: add fsck check for number of tables
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
  2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
  2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
  2025-08-26 16:33   ` shejialuo
  2025-08-26 16:44   ` shejialuo
  2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

Introduce a reftable fsck check to check that the number of files in the
reftable directory matches the number of files listed in 'tables.list'.
We do this by iterating over the files in the reftable directory and
counting all the files present excluding the 'tables.list'. This is also
exposed over Git's fsck checks as a 'badReftableStackCount' error.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 +++
 reftable/fsck.c                | 34 ++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 20 ++++++++++++++++++++
 6 files changed, 63 insertions(+)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 784ddc0df5..707e2fc50a 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
 `badReferentName`::
 	(ERROR) The referent name of a symref is invalid.
 
+`badReftableStackCount`::
+	(ERROR) Mismatch in number of tables.
+
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
diff --git a/fsck.h b/fsck.h
index 5901f944a1..256effc4f8 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index ccd12052f2..616f4ee0f3 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2695,6 +2695,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
 	case REFTABLE_FSCK_ERROR_TABLE_NAME:
 		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
 		break;
+	case REFTABLE_FSCK_ERROR_STACK_COUNT:
+		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info.error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 22ec3c26e9..e92a630276 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -2,6 +2,28 @@
 #include "reftable-fsck.h"
 #include "stack.h"
 
+static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
+{
+	DIR *dir = opendir(st->reftable_dir);
+	struct dirent *d = NULL;
+	unsigned int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((d = readdir(dir))) {
+		if (!strcmp(d->d_name, "tables.list"))
+			continue;
+
+		if (d->d_type == DT_REG)
+			count++;
+	}
+
+	closedir(dir);
+
+	return count == st->tables_len;
+}
+
 int reftable_fsck_check(struct reftable_stack *stack,
 			reftable_fsck_report_fn report_fn,
 			reftable_fsck_verbose_fn verbose_fn,
@@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
 		}
 	}
 
+	verbose_fn("Checking reftable tables count", cb_data);
+
+	if (!reftable_fsck_valid_stack_count(stack)) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_STACK_COUNT,
+			.path = stack->list_file,
+			.msg = "mismatch in number of tables"
+		};
+
+		err = report_fn(info, cb_data);
+	}
+
 out:
 	free_names(names);
 	return err;
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 087430d979..888c3968b7 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -6,6 +6,8 @@
 enum reftable_fsck_error {
 	/* Invalid table name */
 	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+	/* Incorrect number of tables present */
+	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
 };
 
 /* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 0d11871b1c..a351fed562 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -32,4 +32,24 @@ test_expect_success 'table name should be checked' '
 	)
 '
 
+test_expect_success 'table count should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5] refs/reftable: add fsck check for trailing newline
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
                   ` (2 preceding siblings ...)
  2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
  2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

Introduce a fsck check for the reftable backend, which checks if the
'tables.list' contains a newline. The reftable backend writes a trailing
newline when writing the 'tables.list', but it doesn't check for it when
reading the file. A missing newline however indicates that the file was
manually tampered with, so let's raise this as an error to the user.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 +++
 reftable/fsck.c                | 36 ++++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 21 +++++++++++++++++++++
 6 files changed, 66 insertions(+)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 707e2fc50a..1432b1de06 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -41,6 +41,9 @@
 `badReftableStackCount`::
 	(ERROR) Mismatch in number of tables.
 
+`badReftableStackListNewline`::
+	(ERROR) Reftable stack list missing trailing newline.
+
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
diff --git a/fsck.h b/fsck.h
index 256effc4f8..33432bae79 100644
--- a/fsck.h
+++ b/fsck.h
@@ -35,6 +35,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
 	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
+	FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR)               \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 616f4ee0f3..0087afa3ac 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2698,6 +2698,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
 	case REFTABLE_FSCK_ERROR_STACK_COUNT:
 		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
 		break;
+	case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
+		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info.error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index e92a630276..b4898fd2cd 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -1,7 +1,31 @@
 #include "basics.h"
+#include "reftable-error.h"
 #include "reftable-fsck.h"
 #include "stack.h"
 
+static int reftable_fsck_stack_contains_newline(const char *list_file)
+{
+	FILE *f = fopen(list_file, "r");
+	int c = 0;
+
+	if (f == NULL) {
+		if (errno == ENOENT)
+			return 0;
+		return REFTABLE_IO_ERROR;
+	}
+
+	if (fseek(f, 0, SEEK_END) == 0) {
+		long size = ftell(f);
+		if (size <= 0)
+			return REFTABLE_IO_ERROR;
+		fseek(f, -1, SEEK_END);
+		c = fgetc(f);
+	}
+	fclose(f);
+
+	return c == '\n';
+}
+
 static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
 {
 	DIR *dir = opendir(st->reftable_dir);
@@ -66,6 +90,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
 		}
 	}
 
+	verbose_fn("Checking trailing newline in stack list", cb_data);
+
+	if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
+			.path = stack->list_file,
+			.msg = "trailing newline missing in stack list"
+		};
+
+		err = report_fn(info, cb_data);
+	}
+
 	verbose_fn("Checking reftable tables count", cb_data);
 
 	if (!reftable_fsck_valid_stack_count(stack)) {
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 888c3968b7..8e6cb6c7d2 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -8,6 +8,8 @@ enum reftable_fsck_error {
 	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
 	/* Incorrect number of tables present */
 	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
+	/* Newline missing at the end of the stack list */
+	REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
 };
 
 /* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index a351fed562..937c5dd37a 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -52,4 +52,25 @@ test_expect_success 'table count should be checked' '
 	)
 '
 
+test_expect_success 'stack list must contain trailing newline' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		printf "%s" "$(cat .git/reftable/tables.list)" >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: $(pwd)/.git/reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5] refs/reftable: add fsck check for incorrect update index
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
                   ` (3 preceding siblings ...)
  2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-08-19 12:21 ` Karthik Nayak
  2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
  6 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-08-19 12:21 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak

Introduce a fsck check for the reftable backend, which checks if the
tables in 'tables.list' contain sequential update index. The tables in
the reftable backend should contain sequential update index. This fsck
check ensures that.

We must note that the reftable backend itself doesn't check to ensure
this and it also doesn't check to ensure that the index in the table
name matches the index in the header or the table. The latter is not
implemented in this fsck check either and will be added in a future
patch where we add fsck checks for internals of a table.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 ++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 ++
 reftable/fsck.c                | 14 +++++++++-
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 62 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1432b1de06..982d51876c 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -47,6 +47,9 @@
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
+`badReftableUpdateIndex`::
+	(ERROR) Incorrect update index found for table.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 33432bae79..60e9b84183 100644
--- a/fsck.h
+++ b/fsck.h
@@ -37,6 +37,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
 	FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR)               \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
+	FUNC(BAD_REFTABLE_UPDATE_INDEX, ERROR)                     \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
 	FUNC(BAD_REF_NAME, ERROR)                                  \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 0087afa3ac..d5993238db 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2701,6 +2701,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info info,
 	case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
 		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
 		break;
+	case REFTABLE_FSCK_ERROR_UPDATE_INDEX:
+		msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info.error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index b4898fd2cd..a6551b9a3c 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -53,8 +53,8 @@ int reftable_fsck_check(struct reftable_stack *stack,
 			reftable_fsck_verbose_fn verbose_fn,
 			void *cb_data)
 {
+	uint64_t min, max, prev_max = 0;
 	char **names = NULL;
-	uint64_t min, max;
 	int err = 0;
 
 	if (stack == NULL)
@@ -85,9 +85,21 @@ int reftable_fsck_check(struct reftable_stack *stack,
 			err = report_fn(info, cb_data);
 		}
 
+		if (min != (prev_max + 1) || max < min) {
+			struct reftable_fsck_info info = {
+				.error = REFTABLE_FSCK_ERROR_UPDATE_INDEX,
+				.path = names[i],
+				.msg = "incorrect update index in table name"
+			};
+
+			err = report_fn(info, cb_data);
+		}
+
 		if (strcmp(tail, ".ref")) {
 			err = report_fn(info, cb_data);
 		}
+
+		prev_max = max;
 	}
 
 	verbose_fn("Checking trailing newline in stack list", cb_data);
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 8e6cb6c7d2..49437280bb 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -10,6 +10,8 @@ enum reftable_fsck_error {
 	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
 	/* Newline missing at the end of the stack list */
 	REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
+	/* Incorrect update index for table */
+	REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
 };
 
 /* Represents an individual error encounctered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 937c5dd37a..bdcbd65a9f 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -73,4 +73,66 @@ test_expect_success 'stack list must contain trailing newline' '
 	)
 '
 
+test_expect_success 'table update index should be sequential between tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		# Lock the existing table to disable auto-compaction
+		CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+		touch .git/reftable/${CUR_TABLE}.lock &&
+		git update-ref refs/heads/sample @ &&
+		rm .git/reftable/${CUR_TABLE}.lock &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+		NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/0003/0009/g") &&
+
+		sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+		EOF
+		test_cmp expect err
+	)
+'
+
+test_expect_success 'table update index should be sequential within a table' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		# Lock the existing table to disable auto-compaction
+		CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+		touch .git/reftable/${CUR_TABLE}.lock &&
+		git update-ref refs/heads/sample @ &&
+		rm .git/reftable/${CUR_TABLE}.lock &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+		NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/\(.*\)0003/\10002/") &&
+
+		sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
  2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-08-26 16:21   ` shejialuo
  2025-09-01 13:33     ` Karthik Nayak
  0 siblings, 1 reply; 28+ messages in thread
From: shejialuo @ 2025-08-26 16:21 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
> 
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which

A design question here, we name the "fsck.c" for the source code but for
the header, we use "reftable-fsck.h", it is a little strange. Why not
just "fsck.h" instead of "reftable-fsck.h".

> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The callee provides the function with
> callbacks to handle issue and information reporting.
> 
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git' fsck errors.
> 
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
> 
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  Documentation/fsck-msgids.adoc |  3 +++
>  Makefile                       |  1 +
>  fsck.h                         |  1 +
>  meson.build                    |  1 +
>  refs/reftable-backend.c        | 61 +++++++++++++++++++++++++++++++++++++-----
>  reftable/fsck.c                | 50 ++++++++++++++++++++++++++++++++++
>  reftable/reftable-fsck.h       | 38 ++++++++++++++++++++++++++
>  t/meson.build                  |  3 ++-
>  t/t0614-reftable-fsck.sh       | 35 ++++++++++++++++++++++++
>  9 files changed, 186 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
> index 1c912615f9..784ddc0df5 100644
> --- a/Documentation/fsck-msgids.adoc
> +++ b/Documentation/fsck-msgids.adoc
> @@ -38,6 +38,9 @@
>  `badReferentName`::
>  	(ERROR) The referent name of a symref is invalid.
>  
> +`badReftableTableName`::
> +	(ERROR) A reftable table has an invalid name.
> +

When reading this, I feel a little strange. `Reftable` already indicates
it is a table. Should we simply say like the following:

    A reftable has an invalid table name

>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>  REFTABLE_OBJS += reftable/block.o
>  REFTABLE_OBJS += reftable/blocksource.o
>  REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o
>  REFTABLE_OBJS += reftable/merged.o
>  REFTABLE_OBJS += reftable/pq.o
>  REFTABLE_OBJS += reftable/record.o
> diff --git a/fsck.h b/fsck.h
> index 559ad57807..5901f944a1 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>  	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
>  	FUNC(BAD_PARENT_SHA1, ERROR)                               \
>  	FUNC(BAD_REFERENT_NAME, ERROR)                             \
> +	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
>  	FUNC(BAD_REF_CONTENT, ERROR)                               \
>  	FUNC(BAD_REF_FILETYPE, ERROR)                              \
>  	FUNC(BAD_REF_NAME, ERROR)                                  \
> diff --git a/meson.build b/meson.build
> index 5dd299b496..82879fbfaa 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -452,6 +452,7 @@ libgit_sources = [
>    'reftable/error.c',
>    'reftable/block.c',
>    'reftable/blocksource.c',
> +  'reftable/fsck.c',
>    'reftable/iter.c',
>    'reftable/merged.c',
>    'reftable/pq.c',
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..ccd12052f2 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -6,20 +6,21 @@
>  #include "../config.h"
>  #include "../dir.h"
>  #include "../environment.h"
> +#include "../fsck.h"
>  #include "../gettext.h"
>  #include "../hash.h"
>  #include "../hex.h"
>  #include "../iterator.h"
>  #include "../ident.h"
> -#include "../lockfile.h"

Here, we delete this header file. Is the reason that we don't need this
header file anymore?

>  #include "../object.h"
>  #include "../path.h"
>  #include "../refs.h"
>  #include "../reftable/reftable-basics.h"
> -#include "../reftable/reftable-stack.h"
> -#include "../reftable/reftable-record.h"
>  #include "../reftable/reftable-error.h"
> +#include "../reftable/reftable-fsck.h"
>  #include "../reftable/reftable-iterator.h"
> +#include "../reftable/reftable-record.h"
> +#include "../reftable/reftable-stack.h"
>  #include "../repo-settings.h"
>  #include "../setup.h"
>  #include "../strmap.h"
> @@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>  	return ret;
>  }
>  
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> -			    struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> +	struct fsck_options *o = cb_data;
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, "%s", _(msg));
> +}
> +
> +static int reftable_fsck_error_handler(struct reftable_fsck_info info,

A design question: why do we need to pass the value "info" instead of
pointer?

> +				       void *cb_data)
> +{
> +	struct fsck_options *o = cb_data;
> +	struct fsck_ref_report report = { .path = info.path };

Let's make it reverse-christmas-tree ordering.

> +	enum fsck_msg_id msg_id;
> +
> +	switch (info.error) {
> +	case REFTABLE_FSCK_ERROR_TABLE_NAME:
> +		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> +		break;
> +	default:
> +		BUG("unknown fsck error: %d", info.error);
> +	}
> +
> +	return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
> +}
> +
> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>  			    struct worktree *wt UNUSED)
>  {
> -	return 0;
> +	struct reftable_ref_store *refs;
> +	struct strmap_entry *entry;
> +	struct hashmap_iter iter;
> +	int ret = 0;
> +
> +	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, _("Checking references consistency"));
> +
> +	ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> +				  reftable_fsck_verbose_handler, o);
> +	if (!ret)
> +		return ret;
> +

From my understanding, if we find that there is any trouble in the main
worktree reftable backend, we would just abort the check. Should we
continue to check the linked worktrees?

> +	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> +		struct reftable_backend *b = (struct reftable_backend *)entry->value;
> +		ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> +					  reftable_fsck_verbose_handler, o);
> +		if (!ret)
> +			return ret;
> +	}
> +
> +	return ret;
>  }
>  
>  struct ref_storage_be refs_be_reftable = {
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..22ec3c26e9
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,50 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			reftable_fsck_verbose_fn verbose_fn,
> +			void *cb_data)
> +{
> +	char **names = NULL;
> +	uint64_t min, max;
> +	int err = 0;
> +
> +	if (stack == NULL)
> +		goto out;
> +
> +	err = read_lines(stack->list_file, &names);
> +	if (err < 0)
> +		goto out;
> +
> +	verbose_fn("Checking reftable table names", cb_data);
> +
> +	for (size_t i = 0; names[i]; i++) {
> +		struct reftable_fsck_info info = {
> +			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> +			.path = names[i],
> +			.msg = "invalid reftable name"
> +		};

Should we define this data structure outside of the loop? It's
unnecessary here as we could change ".path" and ".msg" dynamically in
the loop.

> +		uint32_t rnd;
> +		/*
> +		 * We want to match the tail '.ref'. One extra byte to ensure
> +		 * that there is no unexpected extra character and one byte for
> +		 * the null terminator added by sscanf.
> +		 */
> +		char tail[6];
> +
> +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> +			   &min, &max, &rnd, tail) != 4) {
> +			err = report_fn(info, cb_data);

I think we could just pass pointer to avoid unnecessary copy operations.
Besides that, I think here we report two different kinds of problem. But
we would give report the user always the same message `invalid reftable
name`. This is too vague.

I think we'd better set different messages for different problems.

> +		}
> +
> +		if (strcmp(tail, ".ref")) {
> +			err = report_fn(info, cb_data);
> +		}
> +	}
> +
> +out:
> +	free_names(names);
> +	return err;
> +}
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..087430d979
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> +	/* Invalid table name */
> +	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};
> +
> +/* Represents an individual error encounctered during the FSCK checks. */
> +struct reftable_fsck_info {
> +	enum reftable_fsck_error error;
> +	const char *msg;
> +	const char *path;
> +};
> +
> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
> +				    void *cb_data);

As I have explained above, we should use `struct reftable_fsck_info
*info` instead of `struct reftable_fsck_info info`.

> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.
> + *
> + * If an issue is encountered, the issue is reported to the callee via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * conitnue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.
> + */
> +int reftable_fsck_check(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			reftable_fsck_verbose_fn verbose_fn,
> +			void *cb_data);
> +
> +#endif /* REFTABLE_FSCK_H */
> diff --git a/t/meson.build b/t/meson.build
> index bbeba1a8d5..a8eb44eb30 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -145,6 +145,7 @@ integration_tests = [
>    't0611-reftable-httpd.sh',
>    't0612-reftable-jgit-compatibility.sh',
>    't0613-reftable-write-options.sh',
> +  't0614-reftable-fsck.sh',
>    't1000-read-tree-m-3way.sh',
>    't1001-read-tree-m-2way.sh',
>    't1002-read-tree-m-u-2way.sh',
> @@ -1214,4 +1215,4 @@ if perl.found() and time.found()
>        timeout: 0,
>      )
>    endforeach
> -endif
> \ No newline at end of file
> +endif
> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..0d11871b1c
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,35 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
> +		sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> +		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> +		mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
> +
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
> +		EOF
> +		test_cmp expect err
> +	)
> +'

We would check two kinds of errors, should we add two tests instead of
only this one.

> +
> +test_done
> 
> -- 
> 2.50.1
> 

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
  2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-08-26 16:33   ` shejialuo
  2025-09-01 13:40     ` Karthik Nayak
  2025-08-26 16:44   ` shejialuo
  1 sibling, 1 reply; 28+ messages in thread
From: shejialuo @ 2025-08-26 16:33 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> index 22ec3c26e9..e92a630276 100644
> --- a/reftable/fsck.c
> +++ b/reftable/fsck.c
> @@ -2,6 +2,28 @@
>  #include "reftable-fsck.h"
>  #include "stack.h"
>  
> +static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
> +{
> +	DIR *dir = opendir(st->reftable_dir);
> +	struct dirent *d = NULL;
> +	unsigned int count = 0;
> +
> +	if (!dir)
> +		return 0;
> +
> +	while ((d = readdir(dir))) {
> +		if (!strcmp(d->d_name, "tables.list"))
> +			continue;
> +
> +		if (d->d_type == DT_REG)
> +			count++;
> +	}
> +
> +	closedir(dir);
> +
> +	return count == st->tables_len;
> +}
> +

The above logic is clear to understand but I think we should our
internal interface in "dir-iterator.h" to implement above logic.

>  int reftable_fsck_check(struct reftable_stack *stack,
>  			reftable_fsck_report_fn report_fn,
>  			reftable_fsck_verbose_fn verbose_fn,
> @@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
>  		}
>  	}
>  
> +	verbose_fn("Checking reftable tables count", cb_data);
> +
> +	if (!reftable_fsck_valid_stack_count(stack)) {
> +		struct reftable_fsck_info info = {
> +			.error = REFTABLE_FSCK_ERROR_STACK_COUNT,
> +			.path = stack->list_file,
> +			.msg = "mismatch in number of tables"
> +		};
> +

When reading here, I somehow understand the reason why you define this
data structure in the loop. But I still think we could just define only
one `info`.

BTY, I wonder whether we should define some auxiliary functions for each
check instead of adding logic directly in `reftable_fsck_check`
function?

> +		err = report_fn(info, cb_data);
> +	}
> +
>  out:
>  	free_names(names);
>  	return err;

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5] refs/reftable: add fsck checks
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
                   ` (4 preceding siblings ...)
  2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
@ 2025-08-26 16:39 ` shejialuo
  2025-09-01 13:52   ` Karthik Nayak
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
  6 siblings, 1 reply; 28+ messages in thread
From: shejialuo @ 2025-08-26 16:39 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

On Tue, Aug 19, 2025 at 02:20:59PM +0200, Karthik Nayak wrote:
> This series adds the required infrastructure and also some fsck checks
> for the reftable backend.
> 
> Since the reftable backend is treated as a library within the Git
> codebase, we don't want to spillover our internal fsck implementation
> into the library. At the same time, the fsck checks need to access
> internal structures of the reftable library which aren't exposed outside
> the library.
> 
> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
> exposes a checker for the reftable library and returns specific errors
> as defined by the library. We then add glue code within
> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
> implementation would understand. This allows us to separate concerns.
> 
> This series then adds some checks on the stack ('reftable/tables.list')
> level of reftable, namely:
> 1. The table name is as per the spec
> 2. The number of tables are consistent
> 3. The tables.list has a newline at the end of file
> 4. The table names follow correct index sequences
> 
> I also plan to send in follow up series's which will implement further
> checks and go into deeper layers (tables, block, references).
> 

Thanks for your patches, it's very nice to see that we begin to
implement the consistency checks for reftable backend. And I have left
some comments.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
  2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
  2025-08-26 16:33   ` shejialuo
@ 2025-08-26 16:44   ` shejialuo
  2025-09-01 13:52     ` Karthik Nayak
  1 sibling, 1 reply; 28+ messages in thread
From: shejialuo @ 2025-08-26 16:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
> +test_expect_success 'table count should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
> +
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables

This is a bad usage, we should just use `reftable/tables.list`. And this
is a common pattern. We would print the relative path against the ".git"
directory.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
  2025-08-26 16:21   ` shejialuo
@ 2025-09-01 13:33     ` Karthik Nayak
  2025-09-03 13:39       ` shejialuo
  0 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:33 UTC (permalink / raw)
  To: shejialuo; +Cc: git, ps

[-- Attachment #1: Type: text/plain, Size: 9908 bytes --]

shejialuo <shejialuo@gmail.com> writes:

> On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
>> The `git refs verify` command is used to run fsck checks on the
>> reference backends. This command is also invoked when users run 'git
>> fsck'. While the files-backend has some fsck checks added, the reftable
>> backend lacks such checks. Let's add the required infrastructure and a
>> check to test for the table names in the 'tables.list' of reftables.
>>
>> For the infrastructure, since the reftable library is treated as an
>> independent library we should ensure that the library code works
>> independently without knowledge about Git's internals. To do this,
>> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
>
> A design question here, we name the "fsck.c" for the source code but for
> the header, we use "reftable-fsck.h", it is a little strange. Why not
> just "fsck.h" instead of "reftable-fsck.h".
>

Since the reftable code is treated as an external library, all
'reftable-.*.h' headers are treated as headers which expose APIs for the
libraries users. We would have defined 'reftable/fsck.h' if there were
internal users of the 'fsck.c' code. But there are none.


>> diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
>> index 1c912615f9..784ddc0df5 100644
>> --- a/Documentation/fsck-msgids.adoc
>> +++ b/Documentation/fsck-msgids.adoc
>> @@ -38,6 +38,9 @@
>>  `badReferentName`::
>>  	(ERROR) The referent name of a symref is invalid.
>>
>> +`badReftableTableName`::
>> +	(ERROR) A reftable table has an invalid name.
>> +
>
> When reading this, I feel a little strange. `Reftable` already indicates
> it is a table. Should we simply say like the following:
>
>     A reftable has an invalid table name
>

I'm not sure about this, since 'reftable' refers to the reference
backend and the 'table' refers to an individual table within the
'reftable' format. I would say both are important.

CC'ing Patrick here for a second opinion.

>>  `badTagName`::
>>  	(INFO) A tag has an invalid format.
>>
>> diff --git a/Makefile b/Makefile
>> index e11340c1ae..f2ddcc8d7c 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>>  REFTABLE_OBJS += reftable/block.o
>>  REFTABLE_OBJS += reftable/blocksource.o
>>  REFTABLE_OBJS += reftable/iter.o
>> +REFTABLE_OBJS += reftable/fsck.o
>>  REFTABLE_OBJS += reftable/merged.o
>>  REFTABLE_OBJS += reftable/pq.o
>>  REFTABLE_OBJS += reftable/record.o
>> diff --git a/fsck.h b/fsck.h
>> index 559ad57807..5901f944a1 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>>  	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
>>  	FUNC(BAD_PARENT_SHA1, ERROR)                               \
>>  	FUNC(BAD_REFERENT_NAME, ERROR)                             \
>> +	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
>>  	FUNC(BAD_REF_CONTENT, ERROR)                               \
>>  	FUNC(BAD_REF_FILETYPE, ERROR)                              \
>>  	FUNC(BAD_REF_NAME, ERROR)                                  \
>> diff --git a/meson.build b/meson.build
>> index 5dd299b496..82879fbfaa 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -452,6 +452,7 @@ libgit_sources = [
>>    'reftable/error.c',
>>    'reftable/block.c',
>>    'reftable/blocksource.c',
>> +  'reftable/fsck.c',
>>    'reftable/iter.c',
>>    'reftable/merged.c',
>>    'reftable/pq.c',
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index 8dae1e1112..ccd12052f2 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -6,20 +6,21 @@
>>  #include "../config.h"
>>  #include "../dir.h"
>>  #include "../environment.h"
>> +#include "../fsck.h"
>>  #include "../gettext.h"
>>  #include "../hash.h"
>>  #include "../hex.h"
>>  #include "../iterator.h"
>>  #include "../ident.h"
>> -#include "../lockfile.h"
>
> Here, we delete this header file. Is the reason that we don't need this
> header file anymore?
>

Yes, it wasn't needed in the first place, let me add a comment in the
commit message.

>>  #include "../object.h"
>>  #include "../path.h"
>>  #include "../refs.h"
>>  #include "../reftable/reftable-basics.h"
>> -#include "../reftable/reftable-stack.h"
>> -#include "../reftable/reftable-record.h"
>>  #include "../reftable/reftable-error.h"
>> +#include "../reftable/reftable-fsck.h"
>>  #include "../reftable/reftable-iterator.h"
>> +#include "../reftable/reftable-record.h"
>> +#include "../reftable/reftable-stack.h"
>>  #include "../repo-settings.h"
>>  #include "../setup.h"
>>  #include "../strmap.h"
>> @@ -2675,11 +2676,59 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>>  	return ret;
>>  }
>>
>> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
>> -			    struct fsck_options *o UNUSED,
>> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
>> +{
>> +	struct fsck_options *o = cb_data;
>> +
>> +	if (o->verbose)
>> +		fprintf_ln(stderr, "%s", _(msg));
>> +}
>> +
>> +static int reftable_fsck_error_handler(struct reftable_fsck_info info,
>
> A design question: why do we need to pass the value "info" instead of
> pointer?
>

I didn't see a reason to make it a pointer. But it does make it more
efficient when the struct size increases. Let me change it!

>
>> +				       void *cb_data)
>> +{
>> +	struct fsck_options *o = cb_data;
>> +	struct fsck_ref_report report = { .path = info.path };
>
> Let's make it reverse-christmas-tree ordering.
>

Will change!

>> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>>  			    struct worktree *wt UNUSED)
>>  {
>> -	return 0;
>> +	struct reftable_ref_store *refs;
>> +	struct strmap_entry *entry;
>> +	struct hashmap_iter iter;
>> +	int ret = 0;
>> +
>> +	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
>> +
>> +	if (o->verbose)
>> +		fprintf_ln(stderr, _("Checking references consistency"));
>> +
>> +	ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
>> +				  reftable_fsck_verbose_handler, o);
>> +	if (!ret)
>> +		return ret;
>> +
>
> From my understanding, if we find that there is any trouble in the main
> worktree reftable backend, we would just abort the check. Should we
> continue to check the linked worktrees?
>

I think that makes sense. Let me make that change.

>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> new file mode 100644
>> index 0000000000..22ec3c26e9
>> --- /dev/null
>> +++ b/reftable/fsck.c
>> @@ -0,0 +1,50 @@
>> +#include "basics.h"
>> +#include "reftable-fsck.h"
>> +#include "stack.h"
>> +
>> +int reftable_fsck_check(struct reftable_stack *stack,
>> +			reftable_fsck_report_fn report_fn,
>> +			reftable_fsck_verbose_fn verbose_fn,
>> +			void *cb_data)
>> +{
>> +	char **names = NULL;
>> +	uint64_t min, max;
>> +	int err = 0;
>> +
>> +	if (stack == NULL)
>> +		goto out;
>> +
>> +	err = read_lines(stack->list_file, &names);
>> +	if (err < 0)
>> +		goto out;
>> +
>> +	verbose_fn("Checking reftable table names", cb_data);
>> +
>> +	for (size_t i = 0; names[i]; i++) {
>> +		struct reftable_fsck_info info = {
>> +			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
>> +			.path = names[i],
>> +			.msg = "invalid reftable name"
>> +		};
>
> Should we define this data structure outside of the loop? It's
> unnecessary here as we could change ".path" and ".msg" dynamically in
> the loop.
>

I don't think it'd make much difference for reftables, since tables are
geometrically packed. But I don't feel strongly, so I'll make the
change.

>> +		uint32_t rnd;
>> +		/*
>> +		 * We want to match the tail '.ref'. One extra byte to ensure
>> +		 * that there is no unexpected extra character and one byte for
>> +		 * the null terminator added by sscanf.
>> +		 */
>> +		char tail[6];
>> +
>> +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
>> +			   &min, &max, &rnd, tail) != 4) {
>> +			err = report_fn(info, cb_data);
>
> I think we could just pass pointer to avoid unnecessary copy operations.
> Besides that, I think here we report two different kinds of problem. But
> we would give report the user always the same message `invalid reftable
> name`. This is too vague.
>

Not sure what you mean by 'unnecessary copy operations', could you
elaborate?

> I think we'd better set different messages for different problems.
>

Fair enough, let me modify that.

[snip]

>> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
>> new file mode 100755
>> index 0000000000..0d11871b1c
>> --- /dev/null
>> +++ b/t/t0614-reftable-fsck.sh
>> @@ -0,0 +1,35 @@
>> +#!/bin/sh
>> +
>> +test_description='Test reftable backend consistency check'
>> +
>> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
>> +export GIT_TEST_DEFAULT_REF_FORMAT
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'table name should be checked' '
>> +	test_when_finished "rm -rf repo" &&
>> +	git init repo &&
>> +	(
>> +		cd repo &&
>> +		git commit --allow-empty -m initial &&
>> +
>> +		git refs verify 2>err &&
>> +		test_must_be_empty err &&
>> +
>> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
>> +		sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
>> +		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
>> +		mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
>> +
>> +		test_must_fail git refs verify 2>err &&
>> +		cat >expect <<-EOF &&
>> +		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
>> +		EOF
>> +		test_cmp expect err
>> +	)
>> +'
>
> We would check two kinds of errors, should we add two tests instead of
> only this one.
>

Yeah, makes sense, will add!

>> +
>> +test_done
>>
>> --
>> 2.50.1
>>
>
> Thanks,
> Jialuo

Thanks for the review.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
  2025-08-26 16:33   ` shejialuo
@ 2025-09-01 13:40     ` Karthik Nayak
  0 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:40 UTC (permalink / raw)
  To: shejialuo; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

shejialuo <shejialuo@gmail.com> writes:

> On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
>> diff --git a/reftable/fsck.c b/reftable/fsck.c
>> index 22ec3c26e9..e92a630276 100644
>> --- a/reftable/fsck.c
>> +++ b/reftable/fsck.c
>> @@ -2,6 +2,28 @@
>>  #include "reftable-fsck.h"
>>  #include "stack.h"
>>
>> +static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
>> +{
>> +	DIR *dir = opendir(st->reftable_dir);
>> +	struct dirent *d = NULL;
>> +	unsigned int count = 0;
>> +
>> +	if (!dir)
>> +		return 0;
>> +
>> +	while ((d = readdir(dir))) {
>> +		if (!strcmp(d->d_name, "tables.list"))
>> +			continue;
>> +
>> +		if (d->d_type == DT_REG)
>> +			count++;
>> +	}
>> +
>> +	closedir(dir);
>> +
>> +	return count == st->tables_len;
>> +}
>> +
>
> The above logic is clear to understand but I think we should our
> internal interface in "dir-iterator.h" to implement above logic.
>

Since the reftable library is treated as external one. We can't add and
rely on code outside of the library. That's why you'll see some
duplication here and there.

>>  int reftable_fsck_check(struct reftable_stack *stack,
>>  			reftable_fsck_report_fn report_fn,
>>  			reftable_fsck_verbose_fn verbose_fn,
>> @@ -44,6 +66,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
>>  		}
>>  	}
>>
>> +	verbose_fn("Checking reftable tables count", cb_data);
>> +
>> +	if (!reftable_fsck_valid_stack_count(stack)) {
>> +		struct reftable_fsck_info info = {
>> +			.error = REFTABLE_FSCK_ERROR_STACK_COUNT,
>> +			.path = stack->list_file,
>> +			.msg = "mismatch in number of tables"
>> +		};
>> +
>
> When reading here, I somehow understand the reason why you define this
> data structure in the loop. But I still think we could just define only
> one `info`.
>

I tried to rewrite it like you suggested, but I think it still makes
sense to keep the error definitions separate. They help provide
localized context. Otherwise, we'd define the error at the start, then
set individual fields later on. This causes some confusion.

>
> BTY, I wonder whether we should define some auxiliary functions for each
> check instead of adding logic directly in `reftable_fsck_check`
> function?
>

Post this patch series we'll dive into block and reference checks, which
will be isolated into individual functions.

>> +		err = report_fn(info, cb_data);
>> +	}
>> +
>>  out:
>>  	free_names(names);
>>  	return err;
>
> Thanks,
> Jialuo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] refs/reftable: add fsck check for number of tables
  2025-08-26 16:44   ` shejialuo
@ 2025-09-01 13:52     ` Karthik Nayak
  0 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:52 UTC (permalink / raw)
  To: shejialuo; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

shejialuo <shejialuo@gmail.com> writes:

> On Tue, Aug 19, 2025 at 02:21:02PM +0200, Karthik Nayak wrote:
>> +test_expect_success 'table count should be checked' '
>> +	test_when_finished "rm -rf repo" &&
>> +	git init repo &&
>> +	(
>> +		cd repo &&
>> +		git commit --allow-empty -m initial &&
>> +
>> +		git refs verify 2>err &&
>> +		test_must_be_empty err &&
>> +
>> +		touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
>> +
>> +		test_must_fail git refs verify 2>err &&
>> +		cat >expect <<-EOF &&
>> +		error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
>
> This is a bad usage, we should just use `reftable/tables.list`. And this
> is a common pattern. We would print the relative path against the ".git"
> directory.
>

Good point, this can be fixed to 'reftable/tables.list', we don't need
to obtain it from the stack.

> Thanks,
> Jialuo

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5] refs/reftable: add fsck checks
  2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
@ 2025-09-01 13:52   ` Karthik Nayak
  0 siblings, 0 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-09-01 13:52 UTC (permalink / raw)
  To: shejialuo; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1573 bytes --]

shejialuo <shejialuo@gmail.com> writes:

> On Tue, Aug 19, 2025 at 02:20:59PM +0200, Karthik Nayak wrote:
>> This series adds the required infrastructure and also some fsck checks
>> for the reftable backend.
>>
>> Since the reftable backend is treated as a library within the Git
>> codebase, we don't want to spillover our internal fsck implementation
>> into the library. At the same time, the fsck checks need to access
>> internal structures of the reftable library which aren't exposed outside
>> the library.
>>
>> So we solve this by adding a 'reftable/fsck.[ch]' which implements and
>> exposes a checker for the reftable library and returns specific errors
>> as defined by the library. We then add glue code within
>> 'refs/reftable-backend.c' to map these errors to errors which Git's fsck
>> implementation would understand. This allows us to separate concerns.
>>
>> This series then adds some checks on the stack ('reftable/tables.list')
>> level of reftable, namely:
>> 1. The table name is as per the spec
>> 2. The number of tables are consistent
>> 3. The tables.list has a newline at the end of file
>> 4. The table names follow correct index sequences
>>
>> I also plan to send in follow up series's which will implement further
>> checks and go into deeper layers (tables, block, references).
>>
>
> Thanks for your patches, it's very nice to see that we begin to
> implement the consistency checks for reftable backend. And I have left
> some comments.
>

Thanks for your comments and the review. I'll send in a new version soon.

> Thanks,
> Jialuo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 0/5] refs/reftable: add fsck checks
  2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
                   ` (5 preceding siblings ...)
  2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
@ 2025-09-02  7:05 ` Karthik Nayak
  2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
                     ` (4 more replies)
  6 siblings, 5 replies; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

This series adds the required infrastructure and also some fsck checks
for the reftable backend.

Since the reftable backend is treated as a library within the Git
codebase, we don't want to spillover our internal fsck implementation
into the library. At the same time, the fsck checks need to access
internal structures of the reftable library which aren't exposed outside
the library.

So we solve this by adding a 'reftable/fsck.[ch]' which implements and
exposes a checker for the reftable library and returns specific errors
as defined by the library. We then add glue code within
'refs/reftable-backend.c' to map these errors to errors which Git's fsck
implementation would understand. This allows us to separate concerns.

This series then adds some checks on the stack ('reftable/tables.list')
level of reftable, namely:
1. The table name is as per the spec
2. The number of tables are consistent
3. The tables.list has a newline at the end of file
4. The table names follow correct index sequences

I also plan to send in follow up series's which will implement further
checks and go into deeper layers (tables, block, references).

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v2:
- Ensured that 'struct reftable_fsck_info' is passed around as a
  pointer, this provides a smaller footprint (pointer size vs struct
  size).
- Run FSCK checks for other worktrees too, even if one of them fails.
- Separate messaging for table name vs table check and add additional
  test.
- Use the relative path in messages used.
- Small style and typo fixes.
- Link to v1: https://lore.kernel.org/r/20250819-228-reftable-introduce-consistency-checks-v1-0-8b8f6879fa9e@gmail.com

---
 Documentation/fsck-msgids.adoc |  15 +++-
 Makefile                       |   1 +
 fsck.h                         | 154 ++++++++++++++++++++-------------------
 meson.build                    |   1 +
 refs/reftable-backend.c        |  66 +++++++++++++++--
 reftable/fsck.c                | 134 ++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  44 +++++++++++
 t/meson.build                  |   3 +-
 t/t0614-reftable-fsck.sh       | 161 +++++++++++++++++++++++++++++++++++++++++
 9 files changed, 494 insertions(+), 85 deletions(-)

Karthik Nayak (5):
      fsck: order 'fsck_msg_type' alphabetically
      refs/reftable: add fsck check for checking the table name
      refs/reftable: add fsck check for number of tables
      refs/reftable: add fsck check for trailing newline
      refs/reftable: add fsck check for incorrect update index

Range-diff versus v1:

1:  d1875fbbc7 = 1:  c049cd428a fsck: order 'fsck_msg_type' alphabetically
2:  b63799aad1 ! 2:  1e46786745 refs/reftable: add fsck check for checking the table name
    @@ Commit message
         Also add 'badReftableTableName' as a corresponding error within Git. Add
         a test to check for this behavior.
     
    +    While here, remove a unused header `#include "../lockfile.h"` from
    +    'refs/reftable-backend.c'.
    +
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## Documentation/fsck-msgids.adoc ##
    @@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
     +		fprintf_ln(stderr, "%s", _(msg));
     +}
     +
    -+static int reftable_fsck_error_handler(struct reftable_fsck_info info,
    ++static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
     +				       void *cb_data)
     +{
    ++	struct fsck_ref_report report = { .path = info->path };
     +	struct fsck_options *o = cb_data;
    -+	struct fsck_ref_report report = { .path = info.path };
     +	enum fsck_msg_id msg_id;
     +
    -+	switch (info.error) {
    ++	switch (info->error) {
     +	case REFTABLE_FSCK_ERROR_TABLE_NAME:
     +		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
     +		break;
     +	default:
    -+		BUG("unknown fsck error: %d", info.error);
    ++		BUG("unknown fsck error: %d", info->error);
     +	}
     +
    -+	return fsck_report_ref(o, &report, msg_id, "%s", info.msg);
    ++	return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
     +}
     +
     +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
    @@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
     +	if (o->verbose)
     +		fprintf_ln(stderr, _("Checking references consistency"));
     +
    -+	ret = reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
    ++	ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
     +				  reftable_fsck_verbose_handler, o);
    -+	if (!ret)
    -+		return ret;
     +
     +	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
     +		struct reftable_backend *b = (struct reftable_backend *)entry->value;
    -+		ret = reftable_fsck_check(b->stack, reftable_fsck_error_handler,
    ++		ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
     +					  reftable_fsck_verbose_handler, o);
    -+		if (!ret)
    -+			return ret;
     +	}
     +
     +	return ret;
    @@ reftable/fsck.c (new)
     +			reftable_fsck_verbose_fn verbose_fn,
     +			void *cb_data)
     +{
    ++
     +	char **names = NULL;
     +	uint64_t min, max;
     +	int err = 0;
    @@ reftable/fsck.c (new)
     +		struct reftable_fsck_info info = {
     +			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
     +			.path = names[i],
    -+			.msg = "invalid reftable name"
     +		};
     +		uint32_t rnd;
     +		/*
    @@ reftable/fsck.c (new)
     +
     +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
     +			   &min, &max, &rnd, tail) != 4) {
    -+			err = report_fn(info, cb_data);
    ++			info.msg = "invalid reftable table name";
    ++			err = report_fn(&info, cb_data);
    ++			continue;
     +		}
     +
     +		if (strcmp(tail, ".ref")) {
    -+			err = report_fn(info, cb_data);
    ++			info.msg = "invalid reftable table extension";
    ++			err = report_fn(&info, cb_data);
     +		}
     +	}
     +
    @@ reftable/reftable-fsck.h (new)
     +	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
     +};
     +
    -+/* Represents an individual error encounctered during the FSCK checks. */
    ++/* Represents an individual error encountered during the FSCK checks. */
     +struct reftable_fsck_info {
     +	enum reftable_fsck_error error;
     +	const char *msg;
     +	const char *path;
     +};
     +
    -+typedef int reftable_fsck_report_fn(struct reftable_fsck_info info,
    ++typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
     +				    void *cb_data);
     +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
     +
    @@ reftable/reftable-fsck.h (new)
     + *
     + * If an issue is encountered, the issue is reported to the callee via the
     + * provided 'report_fn'. If the issue is non-recoverable the flow will not
    -+ * conitnue. If it is recoverable, the flow will continue and further issues
    ++ * continue. If it is recoverable, the flow will continue and further issues
     + * will be reported as identified.
     + *
     + * The 'verbose_fn' will be invoked to provide verbose information about
    @@ t/t0614-reftable-fsck.sh (new)
     +		test_must_be_empty err &&
     +
     +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
    ++		sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
    ++		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
    ++		mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
    ++
    ++		test_must_fail git refs verify 2>err &&
    ++		cat >expect <<-EOF &&
    ++		error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
    ++		EOF
    ++		test_cmp expect err
    ++	)
    ++'
    ++
    ++test_expect_success 'table name should be checked' '
    ++	test_when_finished "rm -rf repo" &&
    ++	git init repo &&
    ++	(
    ++		cd repo &&
    ++		git commit --allow-empty -m initial &&
    ++
    ++		git refs verify 2>err &&
    ++		test_must_be_empty err &&
    ++
    ++		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
     +		sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
     +		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
     +		mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
     +
     +		test_must_fail git refs verify 2>err &&
     +		cat >expect <<-EOF &&
    -+		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable name
    ++		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable table extension
     +		EOF
     +		test_cmp expect err
     +	)
3:  4c6c99ded3 ! 3:  52fc14fdeb refs/reftable: add fsck check for number of tables
    @@ fsck.h: enum fsck_msg_type {
      	FUNC(BAD_REF_FILETYPE, ERROR)                              \
     
      ## refs/reftable-backend.c ##
    -@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
    +@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
      	case REFTABLE_FSCK_ERROR_TABLE_NAME:
      		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
      		break;
    @@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
     +		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
     +		break;
      	default:
    - 		BUG("unknown fsck error: %d", info.error);
    + 		BUG("unknown fsck error: %d", info->error);
      	}
     
      ## reftable/fsck.c ##
    @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
     +	if (!reftable_fsck_valid_stack_count(stack)) {
     +		struct reftable_fsck_info info = {
     +			.error = REFTABLE_FSCK_ERROR_STACK_COUNT,
    -+			.path = stack->list_file,
    ++			.path = "reftable/tables.list",
     +			.msg = "mismatch in number of tables"
     +		};
     +
    -+		err = report_fn(info, cb_data);
    ++		err = report_fn(&info, cb_data);
     +	}
     +
      out:
    @@ reftable/reftable-fsck.h
     +	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
      };
      
    - /* Represents an individual error encounctered during the FSCK checks. */
    + /* Represents an individual error encountered during the FSCK checks. */
     
      ## t/t0614-reftable-fsck.sh ##
     @@ t/t0614-reftable-fsck.sh: test_expect_success 'table name should be checked' '
    @@ t/t0614-reftable-fsck.sh: test_expect_success 'table name should be checked' '
     +
     +		test_must_fail git refs verify 2>err &&
     +		cat >expect <<-EOF &&
    -+		error: $(pwd)/.git/reftable/tables.list: badReftableStackCount: mismatch in number of tables
    ++		error: reftable/tables.list: badReftableStackCount: mismatch in number of tables
     +		EOF
     +		test_cmp expect err
     +	)
4:  7e8a14c77e ! 4:  4099878ceb refs/reftable: add fsck check for trailing newline
    @@ fsck.h: enum fsck_msg_type {
      	FUNC(BAD_REF_FILETYPE, ERROR)                              \
     
      ## refs/reftable-backend.c ##
    -@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
    +@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
      	case REFTABLE_FSCK_ERROR_STACK_COUNT:
      		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
      		break;
    @@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
     +		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
     +		break;
      	default:
    - 		BUG("unknown fsck error: %d", info.error);
    + 		BUG("unknown fsck error: %d", info->error);
      	}
     
      ## reftable/fsck.c ##
    @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
     +	if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
     +		struct reftable_fsck_info info = {
     +			.error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
    -+			.path = stack->list_file,
    ++			.path = "reftable/tables.list",
     +			.msg = "trailing newline missing in stack list"
     +		};
     +
    -+		err = report_fn(info, cb_data);
    ++		err = report_fn(&info, cb_data);
     +	}
     +
      	verbose_fn("Checking reftable tables count", cb_data);
    @@ reftable/reftable-fsck.h: enum reftable_fsck_error {
     +	REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
      };
      
    - /* Represents an individual error encounctered during the FSCK checks. */
    + /* Represents an individual error encountered during the FSCK checks. */
     
      ## t/t0614-reftable-fsck.sh ##
     @@ t/t0614-reftable-fsck.sh: test_expect_success 'table count should be checked' '
    @@ t/t0614-reftable-fsck.sh: test_expect_success 'table count should be checked' '
     +
     +		test_must_fail git refs verify 2>err &&
     +		cat >expect <<-EOF &&
    -+		error: $(pwd)/.git/reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
    ++		error: reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
     +		EOF
     +		test_cmp expect err
     +	)
5:  56ee4348d5 ! 5:  e33345088b refs/reftable: add fsck check for incorrect update index
    @@ fsck.h: enum fsck_msg_type {
      	FUNC(BAD_REF_NAME, ERROR)                                  \
     
      ## refs/reftable-backend.c ##
    -@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info info,
    +@@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
      	case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
      		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
      		break;
    @@ refs/reftable-backend.c: static int reftable_fsck_error_handler(struct reftable_
     +		msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
     +		break;
      	default:
    - 		BUG("unknown fsck error: %d", info.error);
    + 		BUG("unknown fsck error: %d", info->error);
      	}
     
      ## reftable/fsck.c ##
    @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
      			reftable_fsck_verbose_fn verbose_fn,
      			void *cb_data)
      {
    +-
     +	uint64_t min, max, prev_max = 0;
      	char **names = NULL;
     -	uint64_t min, max;
    @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
      
      	if (stack == NULL)
     @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
    - 			err = report_fn(info, cb_data);
    + 			continue;
      		}
      
     +		if (min != (prev_max + 1) || max < min) {
    @@ reftable/fsck.c: int reftable_fsck_check(struct reftable_stack *stack,
     +				.msg = "incorrect update index in table name"
     +			};
     +
    -+			err = report_fn(info, cb_data);
    ++			err = report_fn(&info, cb_data);
     +		}
     +
      		if (strcmp(tail, ".ref")) {
    - 			err = report_fn(info, cb_data);
    + 			info.msg = "invalid reftable table extension";
    + 			err = report_fn(&info, cb_data);
      		}
     +
     +		prev_max = max;
    @@ reftable/reftable-fsck.h: enum reftable_fsck_error {
     +	REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
      };
      
    - /* Represents an individual error encounctered during the FSCK checks. */
    + /* Represents an individual error encountered during the FSCK checks. */
     
      ## t/t0614-reftable-fsck.sh ##
     @@ t/t0614-reftable-fsck.sh: test_expect_success 'stack list must contain trailing newline' '


base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
change-id: 20250714-228-reftable-introduce-consistency-checks-379ded93c544

Thanks
- Karthik


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
@ 2025-09-02  7:05   ` Karthik Nayak
  2025-09-02 22:25     ` Junio C Hamano
  2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

The list of 'fsck_msg_type' seem to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering. Also fix a clang-format issue
where the escaped newlines are not aligned.

While here, remove a duplicate instance of 'gitmodulesLarge' in the
'fsck-msgids' documentation.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |   3 -
 fsck.h                         | 150 ++++++++++++++++++++---------------------
 2 files changed, 75 insertions(+), 78 deletions(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 0ba4f9a27e..1c912615f9 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -104,9 +104,6 @@
 `gitmodulesParse`::
 	(INFO) Could not parse `.gitmodules` blob.
 
-`gitmodulesLarge`;
-	(ERROR) `.gitmodules` blob is too large to parse.
-
 `gitmodulesPath`::
 	(ERROR) `.gitmodules` path is invalid.
 
diff --git a/fsck.h b/fsck.h
index dd7df3d5b3..559ad57807 100644
--- a/fsck.h
+++ b/fsck.h
@@ -20,82 +20,82 @@ enum fsck_msg_type {
  * two in sync.
  */
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
-	FUNC(BAD_PACKED_REF_HEADER, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_REF_CONTENT, ERROR) \
-	FUNC(BAD_REF_FILETYPE, ERROR) \
-	FUNC(BAD_REF_NAME, ERROR) \
-	FUNC(BAD_REFERENT_NAME, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
-	FUNC(PACKED_REF_UNSORTED, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	FUNC(GITATTRIBUTES_MISSING, ERROR) \
-	FUNC(GITATTRIBUTES_LARGE, ERROR) \
-	FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR) \
-	FUNC(GITATTRIBUTES_BLOB, ERROR) \
-	/* warnings */ \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	FUNC(LARGE_PATHNAME, WARN) \
+#define FOREACH_FSCK_MSG_ID(FUNC)                                  \
+	/* fatal errors */                                         \
+	FUNC(NUL_IN_HEADER, FATAL)                                 \
+	FUNC(UNTERMINATED_HEADER, FATAL)                           \
+	/* errors */                                               \
+	FUNC(BAD_DATE, ERROR)                                      \
+	FUNC(BAD_DATE_OVERFLOW, ERROR)                             \
+	FUNC(BAD_EMAIL, ERROR)                                     \
+	FUNC(BAD_NAME, ERROR)                                      \
+	FUNC(BAD_OBJECT_SHA1, ERROR)                               \
+	FUNC(BAD_PACKED_REF_ENTRY, ERROR)                          \
+	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
+	FUNC(BAD_PARENT_SHA1, ERROR)                               \
+	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REF_CONTENT, ERROR)                               \
+	FUNC(BAD_REF_FILETYPE, ERROR)                              \
+	FUNC(BAD_REF_NAME, ERROR)                                  \
+	FUNC(BAD_TIMEZONE, ERROR)                                  \
+	FUNC(BAD_TREE, ERROR)                                      \
+	FUNC(BAD_TREE_SHA1, ERROR)                                 \
+	FUNC(BAD_TYPE, ERROR)                                      \
+	FUNC(DUPLICATE_ENTRIES, ERROR)                             \
+	FUNC(GITATTRIBUTES_BLOB, ERROR)                            \
+	FUNC(GITATTRIBUTES_LARGE, ERROR)                           \
+	FUNC(GITATTRIBUTES_LINE_LENGTH, ERROR)                     \
+	FUNC(GITATTRIBUTES_MISSING, ERROR)                         \
+	FUNC(GITMODULES_BLOB, ERROR)                               \
+	FUNC(GITMODULES_LARGE, ERROR)                              \
+	FUNC(GITMODULES_MISSING, ERROR)                            \
+	FUNC(GITMODULES_NAME, ERROR)                               \
+	FUNC(GITMODULES_PATH, ERROR)                               \
+	FUNC(GITMODULES_SYMLINK, ERROR)                            \
+	FUNC(GITMODULES_UPDATE, ERROR)                             \
+	FUNC(GITMODULES_URL, ERROR)                                \
+	FUNC(MISSING_AUTHOR, ERROR)                                \
+	FUNC(MISSING_COMMITTER, ERROR)                             \
+	FUNC(MISSING_EMAIL, ERROR)                                 \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR)                     \
+	FUNC(MISSING_OBJECT, ERROR)                                \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR)                     \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR)                    \
+	FUNC(MISSING_TAG, ERROR)                                   \
+	FUNC(MISSING_TAG_ENTRY, ERROR)                             \
+	FUNC(MISSING_TREE, ERROR)                                  \
+	FUNC(MISSING_TYPE, ERROR)                                  \
+	FUNC(MISSING_TYPE_ENTRY, ERROR)                            \
+	FUNC(MULTIPLE_AUTHORS, ERROR)                              \
+	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR)               \
+	FUNC(PACKED_REF_UNSORTED, ERROR)                           \
+	FUNC(TREE_NOT_SORTED, ERROR)                               \
+	FUNC(UNKNOWN_TYPE, ERROR)                                  \
+	FUNC(ZERO_PADDED_DATE, ERROR)                              \
+	/* warnings */                                             \
+	FUNC(EMPTY_NAME, WARN)                                     \
+	FUNC(FULL_PATHNAME, WARN)                                  \
+	FUNC(HAS_DOT, WARN)                                        \
+	FUNC(HAS_DOTDOT, WARN)                                     \
+	FUNC(HAS_DOTGIT, WARN)                                     \
+	FUNC(LARGE_PATHNAME, WARN)                                 \
+	FUNC(NULL_SHA1, WARN)                                      \
+	FUNC(NUL_IN_COMMIT, WARN)                                  \
+	FUNC(ZERO_PADDED_FILEMODE, WARN)                           \
 	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(BAD_FILEMODE, INFO) \
-	FUNC(EMPTY_PACKED_REFS_FILE, INFO) \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(GITIGNORE_SYMLINK, INFO) \
-	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
-	FUNC(MAILMAP_SYMLINK, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	FUNC(SYMLINK_REF, INFO) \
-	FUNC(REF_MISSING_NEWLINE, INFO) \
-	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO) \
-	FUNC(TRAILING_REF_CONTENT, INFO) \
-	/* ignored (elevated when requested) */ \
+	FUNC(BAD_FILEMODE, INFO)                                   \
+	FUNC(BAD_TAG_NAME, INFO)                                   \
+	FUNC(EMPTY_PACKED_REFS_FILE, INFO)                         \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO)                          \
+	FUNC(GITIGNORE_SYMLINK, INFO)                              \
+	FUNC(GITMODULES_PARSE, INFO)                               \
+	FUNC(MAILMAP_SYMLINK, INFO)                                \
+	FUNC(MISSING_TAGGER_ENTRY, INFO)                           \
+	FUNC(REF_MISSING_NEWLINE, INFO)                            \
+	FUNC(SYMLINK_REF, INFO)                                    \
+	FUNC(SYMREF_TARGET_IS_NOT_A_REF, INFO)                     \
+	FUNC(TRAILING_REF_CONTENT, INFO)                           \
+	/* ignored (elevated when requested) */                    \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
  2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-02  7:05   ` Karthik Nayak
  2025-09-03  8:07     ` Patrick Steinhardt
  2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

The `git refs verify` command is used to run fsck checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the table names in the 'tables.list' of reftables.

For the infrastructure, since the reftable library is treated as an
independent library we should ensure that the library code works
independently without knowledge about Git's internals. To do this,
add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
provide an entry point 'reftable_fsck_check' for running fsck checks
over a provided reftable stack. The callee provides the function with
callbacks to handle issue and information reporting.

Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git' fsck errors.

Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. Add
a test to check for this behavior.

While here, remove a unused header `#include "../lockfile.h"` from
'refs/reftable-backend.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 Makefile                       |  1 +
 fsck.h                         |  1 +
 meson.build                    |  1 +
 refs/reftable-backend.c        | 57 ++++++++++++++++++++++++++++++++++++-----
 reftable/fsck.c                | 53 ++++++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       | 38 +++++++++++++++++++++++++++
 t/meson.build                  |  3 ++-
 t/t0614-reftable-fsck.sh       | 58 ++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 208 insertions(+), 7 deletions(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1c912615f9..784ddc0df5 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
 `badReferentName`::
 	(ERROR) The referent name of a symref is invalid.
 
+`badReftableTableName`::
+	(ERROR) A reftable table has an invalid name.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/Makefile b/Makefile
index e11340c1ae..f2ddcc8d7c 100644
--- a/Makefile
+++ b/Makefile
@@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
 REFTABLE_OBJS += reftable/block.o
 REFTABLE_OBJS += reftable/blocksource.o
 REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/fsck.o
 REFTABLE_OBJS += reftable/merged.o
 REFTABLE_OBJS += reftable/pq.o
 REFTABLE_OBJS += reftable/record.o
diff --git a/fsck.h b/fsck.h
index 559ad57807..5901f944a1 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
 	FUNC(BAD_REF_NAME, ERROR)                                  \
diff --git a/meson.build b/meson.build
index 5dd299b496..82879fbfaa 100644
--- a/meson.build
+++ b/meson.build
@@ -452,6 +452,7 @@ libgit_sources = [
   'reftable/error.c',
   'reftable/block.c',
   'reftable/blocksource.c',
+  'reftable/fsck.c',
   'reftable/iter.c',
   'reftable/merged.c',
   'reftable/pq.c',
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 8dae1e1112..c38c6422f8 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -6,20 +6,21 @@
 #include "../config.h"
 #include "../dir.h"
 #include "../environment.h"
+#include "../fsck.h"
 #include "../gettext.h"
 #include "../hash.h"
 #include "../hex.h"
 #include "../iterator.h"
 #include "../ident.h"
-#include "../lockfile.h"
 #include "../object.h"
 #include "../path.h"
 #include "../refs.h"
 #include "../reftable/reftable-basics.h"
-#include "../reftable/reftable-stack.h"
-#include "../reftable/reftable-record.h"
 #include "../reftable/reftable-error.h"
+#include "../reftable/reftable-fsck.h"
 #include "../reftable/reftable-iterator.h"
+#include "../reftable/reftable-record.h"
+#include "../reftable/reftable-stack.h"
 #include "../repo-settings.h"
 #include "../setup.h"
 #include "../strmap.h"
@@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	return ret;
 }
 
-static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
-			    struct fsck_options *o UNUSED,
+static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
+{
+	struct fsck_options *o = cb_data;
+
+	if (o->verbose)
+		fprintf_ln(stderr, "%s", _(msg));
+}
+
+static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
+				       void *cb_data)
+{
+	struct fsck_ref_report report = { .path = info->path };
+	struct fsck_options *o = cb_data;
+	enum fsck_msg_id msg_id;
+
+	switch (info->error) {
+	case REFTABLE_FSCK_ERROR_TABLE_NAME:
+		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
+		break;
+	default:
+		BUG("unknown fsck error: %d", info->error);
+	}
+
+	return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
+}
+
+static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
 			    struct worktree *wt UNUSED)
 {
-	return 0;
+	struct reftable_ref_store *refs;
+	struct strmap_entry *entry;
+	struct hashmap_iter iter;
+	int ret = 0;
+
+	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
+
+	if (o->verbose)
+		fprintf_ln(stderr, _("Checking references consistency"));
+
+	ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
+				  reftable_fsck_verbose_handler, o);
+
+	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
+		struct reftable_backend *b = (struct reftable_backend *)entry->value;
+		ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
+					  reftable_fsck_verbose_handler, o);
+	}
+
+	return ret;
 }
 
 struct ref_storage_be refs_be_reftable = {
diff --git a/reftable/fsck.c b/reftable/fsck.c
new file mode 100644
index 0000000000..4282b1413e
--- /dev/null
+++ b/reftable/fsck.c
@@ -0,0 +1,53 @@
+#include "basics.h"
+#include "reftable-fsck.h"
+#include "stack.h"
+
+int reftable_fsck_check(struct reftable_stack *stack,
+			reftable_fsck_report_fn report_fn,
+			reftable_fsck_verbose_fn verbose_fn,
+			void *cb_data)
+{
+
+	char **names = NULL;
+	uint64_t min, max;
+	int err = 0;
+
+	if (stack == NULL)
+		goto out;
+
+	err = read_lines(stack->list_file, &names);
+	if (err < 0)
+		goto out;
+
+	verbose_fn("Checking reftable table names", cb_data);
+
+	for (size_t i = 0; names[i]; i++) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
+			.path = names[i],
+		};
+		uint32_t rnd;
+		/*
+		 * We want to match the tail '.ref'. One extra byte to ensure
+		 * that there is no unexpected extra character and one byte for
+		 * the null terminator added by sscanf.
+		 */
+		char tail[6];
+
+		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
+			   &min, &max, &rnd, tail) != 4) {
+			info.msg = "invalid reftable table name";
+			err = report_fn(&info, cb_data);
+			continue;
+		}
+
+		if (strcmp(tail, ".ref")) {
+			info.msg = "invalid reftable table extension";
+			err = report_fn(&info, cb_data);
+		}
+	}
+
+out:
+	free_names(names);
+	return err;
+}
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
new file mode 100644
index 0000000000..4cf0053234
--- /dev/null
+++ b/reftable/reftable-fsck.h
@@ -0,0 +1,38 @@
+#ifndef REFTABLE_FSCK_H
+#define REFTABLE_FSCK_H
+
+#include "reftable-stack.h"
+
+enum reftable_fsck_error {
+	/* Invalid table name */
+	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+};
+
+/* Represents an individual error encountered during the FSCK checks. */
+struct reftable_fsck_info {
+	enum reftable_fsck_error error;
+	const char *msg;
+	const char *path;
+};
+
+typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
+				    void *cb_data);
+typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
+
+/*
+ * Given a reftable stack, perform FSCK check on the stack.
+ *
+ * If an issue is encountered, the issue is reported to the callee via the
+ * provided 'report_fn'. If the issue is non-recoverable the flow will not
+ * continue. If it is recoverable, the flow will continue and further issues
+ * will be reported as identified.
+ *
+ * The 'verbose_fn' will be invoked to provide verbose information about
+ * the progress and state of the FSCK checks.
+ */
+int reftable_fsck_check(struct reftable_stack *stack,
+			reftable_fsck_report_fn report_fn,
+			reftable_fsck_verbose_fn verbose_fn,
+			void *cb_data);
+
+#endif /* REFTABLE_FSCK_H */
diff --git a/t/meson.build b/t/meson.build
index bbeba1a8d5..a8eb44eb30 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -145,6 +145,7 @@ integration_tests = [
   't0611-reftable-httpd.sh',
   't0612-reftable-jgit-compatibility.sh',
   't0613-reftable-write-options.sh',
+  't0614-reftable-fsck.sh',
   't1000-read-tree-m-3way.sh',
   't1001-read-tree-m-2way.sh',
   't1002-read-tree-m-u-2way.sh',
@@ -1214,4 +1215,4 @@ if perl.found() and time.found()
       timeout: 0,
     )
   endforeach
-endif
\ No newline at end of file
+endif
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
new file mode 100755
index 0000000000..81d30df2d7
--- /dev/null
+++ b/t/t0614-reftable-fsck.sh
@@ -0,0 +1,58 @@
+#!/bin/sh
+
+test_description='Test reftable backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=reftable
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'table name should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+		sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
+		EOF
+		test_cmp expect err
+	)
+'
+
+test_expect_success 'table name should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&
+		sed "1s/$/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${TABLE_NAME}extra &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${TABLE_NAME}extra: badReftableTableName: invalid reftable table extension
+		EOF
+		test_cmp expect err
+	)
+'
+
+test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/5] refs/reftable: add fsck check for number of tables
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
  2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
  2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-02  7:05   ` Karthik Nayak
  2025-09-03  8:07     ` Patrick Steinhardt
  2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
  2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
  4 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

Introduce a reftable fsck check to check that the number of files in the
reftable directory matches the number of files listed in 'tables.list'.
We do this by iterating over the files in the reftable directory and
counting all the files present excluding the 'tables.list'. This is also
exposed over Git's fsck checks as a 'badReftableStackCount' error.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 +++
 reftable/fsck.c                | 34 ++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 20 ++++++++++++++++++++
 6 files changed, 63 insertions(+)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 784ddc0df5..707e2fc50a 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -38,6 +38,9 @@
 `badReferentName`::
 	(ERROR) The referent name of a symref is invalid.
 
+`badReftableStackCount`::
+	(ERROR) Mismatch in number of tables.
+
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
diff --git a/fsck.h b/fsck.h
index 5901f944a1..256effc4f8 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PACKED_REF_HEADER, ERROR)                         \
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
+	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index c38c6422f8..59c39f9b52 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2695,6 +2695,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
 	case REFTABLE_FSCK_ERROR_TABLE_NAME:
 		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
 		break;
+	case REFTABLE_FSCK_ERROR_STACK_COUNT:
+		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info->error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 4282b1413e..20e6bfb0f1 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -2,6 +2,28 @@
 #include "reftable-fsck.h"
 #include "stack.h"
 
+static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
+{
+	DIR *dir = opendir(st->reftable_dir);
+	struct dirent *d = NULL;
+	unsigned int count = 0;
+
+	if (!dir)
+		return 0;
+
+	while ((d = readdir(dir))) {
+		if (!strcmp(d->d_name, "tables.list"))
+			continue;
+
+		if (d->d_type == DT_REG)
+			count++;
+	}
+
+	closedir(dir);
+
+	return count == st->tables_len;
+}
+
 int reftable_fsck_check(struct reftable_stack *stack,
 			reftable_fsck_report_fn report_fn,
 			reftable_fsck_verbose_fn verbose_fn,
@@ -47,6 +69,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
 		}
 	}
 
+	verbose_fn("Checking reftable tables count", cb_data);
+
+	if (!reftable_fsck_valid_stack_count(stack)) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_STACK_COUNT,
+			.path = "reftable/tables.list",
+			.msg = "mismatch in number of tables"
+		};
+
+		err = report_fn(&info, cb_data);
+	}
+
 out:
 	free_names(names);
 	return err;
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 4cf0053234..beba1bdd1a 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -6,6 +6,8 @@
 enum reftable_fsck_error {
 	/* Invalid table name */
 	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
+	/* Incorrect number of tables present */
+	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
 };
 
 /* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 81d30df2d7..3a34a31890 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -55,4 +55,24 @@ test_expect_success 'table name should be checked' '
 	)
 '
 
+test_expect_success 'table count should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		touch .git/reftable/0x000000002812-0x000000002813-c830a596.ref &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: reftable/tables.list: badReftableStackCount: mismatch in number of tables
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
                     ` (2 preceding siblings ...)
  2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-09-02  7:05   ` Karthik Nayak
  2025-09-02 22:38     ` Junio C Hamano
  2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
  4 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

Introduce a fsck check for the reftable backend, which checks if the
'tables.list' contains a newline. The reftable backend writes a trailing
newline when writing the 'tables.list', but it doesn't check for it when
reading the file. A missing newline however indicates that the file was
manually tampered with, so let's raise this as an error to the user.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 +++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 +++
 reftable/fsck.c                | 36 ++++++++++++++++++++++++++++++++++++
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 21 +++++++++++++++++++++
 6 files changed, 66 insertions(+)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 707e2fc50a..1432b1de06 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -41,6 +41,9 @@
 `badReftableStackCount`::
 	(ERROR) Mismatch in number of tables.
 
+`badReftableStackListNewline`::
+	(ERROR) Reftable stack list missing trailing newline.
+
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
diff --git a/fsck.h b/fsck.h
index 256effc4f8..33432bae79 100644
--- a/fsck.h
+++ b/fsck.h
@@ -35,6 +35,7 @@ enum fsck_msg_type {
 	FUNC(BAD_PARENT_SHA1, ERROR)                               \
 	FUNC(BAD_REFERENT_NAME, ERROR)                             \
 	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
+	FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR)               \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 59c39f9b52..7331513b19 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2698,6 +2698,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
 	case REFTABLE_FSCK_ERROR_STACK_COUNT:
 		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_COUNT;
 		break;
+	case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
+		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info->error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 20e6bfb0f1..9a7f22c56b 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -1,7 +1,31 @@
 #include "basics.h"
+#include "reftable-error.h"
 #include "reftable-fsck.h"
 #include "stack.h"
 
+static int reftable_fsck_stack_contains_newline(const char *list_file)
+{
+	FILE *f = fopen(list_file, "r");
+	int c = 0;
+
+	if (f == NULL) {
+		if (errno == ENOENT)
+			return 0;
+		return REFTABLE_IO_ERROR;
+	}
+
+	if (fseek(f, 0, SEEK_END) == 0) {
+		long size = ftell(f);
+		if (size <= 0)
+			return REFTABLE_IO_ERROR;
+		fseek(f, -1, SEEK_END);
+		c = fgetc(f);
+	}
+	fclose(f);
+
+	return c == '\n';
+}
+
 static int reftable_fsck_valid_stack_count(struct reftable_stack *st)
 {
 	DIR *dir = opendir(st->reftable_dir);
@@ -69,6 +93,18 @@ int reftable_fsck_check(struct reftable_stack *stack,
 		}
 	}
 
+	verbose_fn("Checking trailing newline in stack list", cb_data);
+
+	if (!reftable_fsck_stack_contains_newline(stack->list_file)) {
+		struct reftable_fsck_info info = {
+			.error = REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE,
+			.path = "reftable/tables.list",
+			.msg = "trailing newline missing in stack list"
+		};
+
+		err = report_fn(&info, cb_data);
+	}
+
 	verbose_fn("Checking reftable tables count", cb_data);
 
 	if (!reftable_fsck_valid_stack_count(stack)) {
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index beba1bdd1a..17df661da8 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -8,6 +8,8 @@ enum reftable_fsck_error {
 	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
 	/* Incorrect number of tables present */
 	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
+	/* Newline missing at the end of the stack list */
+	REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
 };
 
 /* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 3a34a31890..3b119eae62 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -75,4 +75,25 @@ test_expect_success 'table count should be checked' '
 	)
 '
 
+test_expect_success 'stack list must contain trailing newline' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		printf "%s" "$(cat .git/reftable/tables.list)" >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: reftable/tables.list: badReftableStackListNewline: trailing newline missing in stack list
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index
  2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
                     ` (3 preceding siblings ...)
  2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-09-02  7:05   ` Karthik Nayak
  2025-09-02 22:42     ` Junio C Hamano
  4 siblings, 1 reply; 28+ messages in thread
From: Karthik Nayak @ 2025-09-02  7:05 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, jltobler, shejialuo

Introduce a fsck check for the reftable backend, which checks if the
tables in 'tables.list' contain sequential update index. The tables in
the reftable backend should contain sequential update index. This fsck
check ensures that.

We must note that the reftable backend itself doesn't check to ensure
this and it also doesn't check to ensure that the index in the table
name matches the index in the header or the table. The latter is not
implemented in this fsck check either and will be added in a future
patch where we add fsck checks for internals of a table.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/fsck-msgids.adoc |  3 ++
 fsck.h                         |  1 +
 refs/reftable-backend.c        |  3 ++
 reftable/fsck.c                | 15 ++++++++--
 reftable/reftable-fsck.h       |  2 ++
 t/t0614-reftable-fsck.sh       | 62 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 1432b1de06..982d51876c 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -47,6 +47,9 @@
 `badReftableTableName`::
 	(ERROR) A reftable table has an invalid name.
 
+`badReftableUpdateIndex`::
+	(ERROR) Incorrect update index found for table.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 33432bae79..60e9b84183 100644
--- a/fsck.h
+++ b/fsck.h
@@ -37,6 +37,7 @@ enum fsck_msg_type {
 	FUNC(BAD_REFTABLE_STACK_COUNT, ERROR)                      \
 	FUNC(BAD_REFTABLE_STACK_LIST_NEWLINE, ERROR)               \
 	FUNC(BAD_REFTABLE_TABLE_NAME, ERROR)                       \
+	FUNC(BAD_REFTABLE_UPDATE_INDEX, ERROR)                     \
 	FUNC(BAD_REF_CONTENT, ERROR)                               \
 	FUNC(BAD_REF_FILETYPE, ERROR)                              \
 	FUNC(BAD_REF_NAME, ERROR)                                  \
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 7331513b19..519ade24b8 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -2701,6 +2701,9 @@ static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
 	case REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE:
 		msg_id = FSCK_MSG_BAD_REFTABLE_STACK_LIST_NEWLINE;
 		break;
+	case REFTABLE_FSCK_ERROR_UPDATE_INDEX:
+		msg_id = FSCK_MSG_BAD_REFTABLE_UPDATE_INDEX;
+		break;
 	default:
 		BUG("unknown fsck error: %d", info->error);
 	}
diff --git a/reftable/fsck.c b/reftable/fsck.c
index 9a7f22c56b..5c6d842ac1 100644
--- a/reftable/fsck.c
+++ b/reftable/fsck.c
@@ -53,9 +53,8 @@ int reftable_fsck_check(struct reftable_stack *stack,
 			reftable_fsck_verbose_fn verbose_fn,
 			void *cb_data)
 {
-
+	uint64_t min, max, prev_max = 0;
 	char **names = NULL;
-	uint64_t min, max;
 	int err = 0;
 
 	if (stack == NULL)
@@ -87,10 +86,22 @@ int reftable_fsck_check(struct reftable_stack *stack,
 			continue;
 		}
 
+		if (min != (prev_max + 1) || max < min) {
+			struct reftable_fsck_info info = {
+				.error = REFTABLE_FSCK_ERROR_UPDATE_INDEX,
+				.path = names[i],
+				.msg = "incorrect update index in table name"
+			};
+
+			err = report_fn(&info, cb_data);
+		}
+
 		if (strcmp(tail, ".ref")) {
 			info.msg = "invalid reftable table extension";
 			err = report_fn(&info, cb_data);
 		}
+
+		prev_max = max;
 	}
 
 	verbose_fn("Checking trailing newline in stack list", cb_data);
diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
index 17df661da8..0ab20a99b6 100644
--- a/reftable/reftable-fsck.h
+++ b/reftable/reftable-fsck.h
@@ -10,6 +10,8 @@ enum reftable_fsck_error {
 	REFTABLE_FSCK_ERROR_STACK_COUNT = -2,
 	/* Newline missing at the end of the stack list */
 	REFTABLE_FSCK_ERROR_STACK_LIST_MISSING_NEWLINE = -3,
+	/* Incorrect update index for table */
+	REFTABLE_FSCK_ERROR_UPDATE_INDEX = -4,
 };
 
 /* Represents an individual error encountered during the FSCK checks. */
diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
index 3b119eae62..1f37691b2e 100755
--- a/t/t0614-reftable-fsck.sh
+++ b/t/t0614-reftable-fsck.sh
@@ -96,4 +96,66 @@ test_expect_success 'stack list must contain trailing newline' '
 	)
 '
 
+test_expect_success 'table update index should be sequential between tables' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		# Lock the existing table to disable auto-compaction
+		CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+		touch .git/reftable/${CUR_TABLE}.lock &&
+		git update-ref refs/heads/sample @ &&
+		rm .git/reftable/${CUR_TABLE}.lock &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+		NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/0003/0009/g") &&
+
+		sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+		EOF
+		test_cmp expect err
+	)
+'
+
+test_expect_success 'table update index should be sequential within a table' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+
+		# Lock the existing table to disable auto-compaction
+		CUR_TABLE=$(cat .git/reftable/tables.list | tail -n1) &&
+		touch .git/reftable/${CUR_TABLE}.lock &&
+		git update-ref refs/heads/sample @ &&
+		rm .git/reftable/${CUR_TABLE}.lock &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		TABLE_NAME=$(cat .git/reftable/tables.list | tail -n1) &&
+		NEW_TABLE_NAME=$(echo ${TABLE_NAME} | sed "s/\(.*\)0003/\10002/") &&
+
+		sed "2s/.*/${NEW_TABLE_NAME}/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
+		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
+		mv .git/reftable/${TABLE_NAME} .git/reftable/${NEW_TABLE_NAME} &&
+
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: ${NEW_TABLE_NAME}: badReftableUpdateIndex: incorrect update index in table name
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically
  2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
@ 2025-09-02 22:25     ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:25 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, shejialuo

Karthik Nayak <karthik.188@gmail.com> writes:

> The list of 'fsck_msg_type' seem to be alphabetically ordered, but there
> are a few small misses. Fix this by sorting the sub-sections of the
> list to maintain alphabetical ordering. Also fix a clang-format issue
> where the escaped newlines are not aligned.
>
> While here, remove a duplicate instance of 'gitmodulesLarge' in the
> 'fsck-msgids' documentation.

"A few small misses".

> diff --git a/fsck.h b/fsck.h
> index dd7df3d5b3..559ad57807 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -20,82 +20,82 @@ enum fsck_msg_type {
> ...
> -#define FOREACH_FSCK_MSG_ID(FUNC) \
> -	/* fatal errors */ \
> -	FUNC(NUL_IN_HEADER, FATAL) \
> -	FUNC(UNTERMINATED_HEADER, FATAL) \
> ...
> +#define FOREACH_FSCK_MSG_ID(FUNC)                                  \
> +	/* fatal errors */                                         \
> +	FUNC(NUL_IN_HEADER, FATAL)                                 \
> +	FUNC(UNTERMINATED_HEADER, FATAL)                           \
> ...

Please undo these "pad by spaces before backslash"; otherwise we
cannot tell which ones are "a few small misses".

Thanks.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
  2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
@ 2025-09-02 22:38     ` Junio C Hamano
  2025-09-03  8:07       ` Patrick Steinhardt
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:38 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, shejialuo

Karthik Nayak <karthik.188@gmail.com> writes:

> Introduce a fsck check for the reftable backend, which checks if the
> 'tables.list' contains a newline. The reftable backend writes a trailing
> newline when writing the 'tables.list', but it doesn't check for it when
> reading the file. A missing newline however indicates that the file was
> manually tampered with, so let's raise this as an error to the user.

Hmph, how does the code react to other kinds of "manual tampering"?
For example, if an empty line is inserted between two existing lines
(or at the beginning of the file, for that matter), would the parser
detect it as a corrupt file and die?

If so, it makes me strongly suspect that we are better off enforcing
that the file does not end in an incomplete line at runtime and barf
just the same way, instead of "most of the anomalies that the write
codepath would never produce would cause error on the read codepath,
but only this one that the read codepath is happy with is caught by
the fsck", which does not sound quite right.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index
  2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
@ 2025-09-02 22:42     ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2025-09-02 22:42 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, shejialuo

Karthik Nayak <karthik.188@gmail.com> writes:

> Introduce a fsck check for the reftable backend, which checks if the
> tables in 'tables.list' contain sequential update index. The tables in
> the reftable backend should contain sequential update index. This fsck
> check ensures that.
>
> We must note that the reftable backend itself doesn't check to ensure
> this and it also doesn't check to ensure that the index in the table
> name matches the index in the header or the table. The latter is not
> implemented in this fsck check either and will be added in a future
> patch where we add fsck checks for internals of a table.

Similar to the previous step, I am not sure why this should not be
checked at runtime and is flagged as an error.

In general, we do try to avoid retroactively tightening rules, but
the reftable is so new and not even the default.  If we noticed that
the runtime has been overly loose, the time to tighten it is now,
not after even more installations use it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
  2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
@ 2025-09-03  8:07     ` Patrick Steinhardt
  2025-09-03 16:51       ` shejialuo
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  8:07 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, shejialuo

On Tue, Sep 02, 2025 at 09:05:22AM +0200, Karthik Nayak wrote:
> The `git refs verify` command is used to run fsck checks on the
> reference backends. This command is also invoked when users run 'git
> fsck'. While the files-backend has some fsck checks added, the reftable
> backend lacks such checks. Let's add the required infrastructure and a
> check to test for the table names in the 'tables.list' of reftables.
> 
> For the infrastructure, since the reftable library is treated as an
> independent library we should ensure that the library code works
> independently without knowledge about Git's internals. To do this,
> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
> provide an entry point 'reftable_fsck_check' for running fsck checks
> over a provided reftable stack. The callee provides the function with
> callbacks to handle issue and information reporting.
> 
> Add glue code in 'refs/reftable-backend.c' which calls the reftable
> library to perform the fsck checks. Here we also map the reftable errors
> to Git' fsck errors.
> 
> Introduce a check to validate table names for a given reftable stack.
> Also add 'badReftableTableName' as a corresponding error within Git. Add
> a test to check for this behavior.
> 
> While here, remove a unused header `#include "../lockfile.h"` from
> 'refs/reftable-backend.c'.

It's quite a bunch of changes overall that could've been reasonably
split up into multiple commits. E.g. one to introduce the reftable-side
logic, one to start calling it in Git, and one to drop the superfluous
header.

> diff --git a/Makefile b/Makefile
> index e11340c1ae..f2ddcc8d7c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2733,6 +2733,7 @@ REFTABLE_OBJS += reftable/error.o
>  REFTABLE_OBJS += reftable/block.o
>  REFTABLE_OBJS += reftable/blocksource.o
>  REFTABLE_OBJS += reftable/iter.o
> +REFTABLE_OBJS += reftable/fsck.o

"f" is before "i" in the alphabet I'm accustomed to :) So let's retain
lexicographic ordering here.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 8dae1e1112..c38c6422f8 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -2675,11 +2676,55 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
>  	return ret;
>  }
>  
> -static int reftable_be_fsck(struct ref_store *ref_store UNUSED,
> -			    struct fsck_options *o UNUSED,
> +static void reftable_fsck_verbose_handler(const char *msg, void *cb_data)
> +{
> +	struct fsck_options *o = cb_data;
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, "%s", _(msg));
> +}

Is this `_()` marker correct here? There isn't really any reasonable way
for somebody to translate a variable with unknown contents. So shouldn't
it only be the caller of `reftable_fsck_verbose_handler()` that should
mark the string as translatable?

> +static int reftable_fsck_error_handler(struct reftable_fsck_info *info,
> +				       void *cb_data)
> +{
> +	struct fsck_ref_report report = { .path = info->path };
> +	struct fsck_options *o = cb_data;
> +	enum fsck_msg_id msg_id;
> +
> +	switch (info->error) {
> +	case REFTABLE_FSCK_ERROR_TABLE_NAME:
> +		msg_id = FSCK_MSG_BAD_REFTABLE_TABLE_NAME;
> +		break;
> +	default:
> +		BUG("unknown fsck error: %d", info->error);
> +	}
> +
> +	return fsck_report_ref(o, &report, msg_id, "%s", info->msg);
> +}

I think this function will become a bit unwieldy over time. We might
instead want to have an array that maps from reftable-specific to
fsck-specific error code:

    static const fsck_msg_id[] = {
        [REFTABLE_FSCK_ERROR_TABLE_NAME] = FSCK_MSG_BAD_REFTABLE_TABLE_NAME,
    };

So in that case, all we'd have to do is to perform bounds checking in
the above function. And maybe verify that the developer didn't forget to
fill in a new msg ID by checking that the derived message ID is non-zero.

> +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
>  			    struct worktree *wt UNUSED)
>  {
> -	return 0;
> +	struct reftable_ref_store *refs;
> +	struct strmap_entry *entry;
> +	struct hashmap_iter iter;
> +	int ret = 0;
> +
> +	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> +
> +	if (o->verbose)
> +		fprintf_ln(stderr, _("Checking references consistency"));

This line is duplicate across both backends, right? Maybe it's something
that we can do in the generic logic?

> +	ret |= reftable_fsck_check(refs->main_backend.stack, reftable_fsck_error_handler,
> +				  reftable_fsck_verbose_handler, o);
> +
> +	strmap_for_each_entry(&refs->worktree_backends, &iter, entry) {
> +		struct reftable_backend *b = (struct reftable_backend *)entry->value;
> +		ret |= reftable_fsck_check(b->stack, reftable_fsck_error_handler,
> +					  reftable_fsck_verbose_handler, o);
> +	}
> +
> +	return ret;
>  }
>  
>  struct ref_storage_be refs_be_reftable = {

Looks good.

> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..4282b1413e
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,53 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			reftable_fsck_verbose_fn verbose_fn,
> +			void *cb_data)
> +{
> +
> +	char **names = NULL;
> +	uint64_t min, max;
> +	int err = 0;
> +
> +	if (stack == NULL)
> +		goto out;
> +
> +	err = read_lines(stack->list_file, &names);
> +	if (err < 0)
> +		goto out;
> +
> +	verbose_fn("Checking reftable table names", cb_data);
> +
> +	for (size_t i = 0; names[i]; i++) {
> +		struct reftable_fsck_info info = {
> +			.error = REFTABLE_FSCK_ERROR_TABLE_NAME,
> +			.path = names[i],
> +		};
> +		uint32_t rnd;
> +		/*
> +		 * We want to match the tail '.ref'. One extra byte to ensure
> +		 * that there is no unexpected extra character and one byte for
> +		 * the null terminator added by sscanf.
> +		 */
> +		char tail[6];
> +
> +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> +			   &min, &max, &rnd, tail) != 4) {
> +			info.msg = "invalid reftable table name";

This here is where the string should be translated.

> +			err = report_fn(&info, cb_data);
> +			continue;
> +		}

I think sscanf is quite frowned-upon in the Git codebase. Maybe we
should manually parse through the string instead?

Furthermore, I think we should move every single check into a separate
function, similar to how the files backend does it. This ensures that
checks are self-contained and that it's way easier to add new checks
over time.

Another angle: did you verify that reftables written by JGit follow this
format?

> +		if (strcmp(tail, ".ref")) {
> +			info.msg = "invalid reftable table extension";

Same here, this should be translated.

> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..4cf0053234
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,38 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> +	/* Invalid table name */
> +	REFTABLE_FSCK_ERROR_TABLE_NAME = -1,
> +};

Wouldn't it be more natural to give these positive numbers?

> +/* Represents an individual error encountered during the FSCK checks. */
> +struct reftable_fsck_info {
> +	enum reftable_fsck_error error;
> +	const char *msg;
> +	const char *path;
> +};

I wonder whether it should be the reftable library that decides on the
severity of each generated finding.

> +typedef int reftable_fsck_report_fn(struct reftable_fsck_info *info,
> +				    void *cb_data);
> +typedef void reftable_fsck_verbose_fn(const char *msg, void *cb_data);
> +
> +/*
> + * Given a reftable stack, perform FSCK check on the stack.

s/FSCK check/consistency checks/

> + *
> + * If an issue is encountered, the issue is reported to the callee via the
> + * provided 'report_fn'. If the issue is non-recoverable the flow will not
> + * continue. If it is recoverable, the flow will continue and further issues
> + * will be reported as identified.
> + *
> + * The 'verbose_fn' will be invoked to provide verbose information about
> + * the progress and state of the FSCK checks.

Same here.

> diff --git a/t/t0614-reftable-fsck.sh b/t/t0614-reftable-fsck.sh
> new file mode 100755
> index 0000000000..81d30df2d7
> --- /dev/null
> +++ b/t/t0614-reftable-fsck.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='Test reftable backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

Tests shouldn't define these variables, but should dynamically figure
out what the default branch name is as required, e.g. by using
git-symbolic-ref(1).

> +GIT_TEST_DEFAULT_REF_FORMAT=reftable
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'table name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&

You can drop the cat(1) invocation and directly say `head -n1 file`.

> +		sed "1s/^/extra/" .git/reftable/tables.list >.git/reftable/tables.list.tmp &&
> +		mv .git/reftable/tables.list.tmp .git/reftable/tables.list &&
> +		mv .git/reftable/${TABLE_NAME} .git/reftable/extra${TABLE_NAME} &&

No need for the curly braces around TABLE_NAME here and further down. It
would be nice to quote these strings though.

> +
> +		test_must_fail git refs verify 2>err &&
> +		cat >expect <<-EOF &&
> +		error: extra${TABLE_NAME}: badReftableTableName: invalid reftable table name
> +		EOF
> +		test_cmp expect err
> +	)
> +'
> +
> +test_expect_success 'table name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		TABLE_NAME=$(cat .git/reftable/tables.list | head -n1) &&

Same here wrt the extra invocation of cat(1).

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/5] refs/reftable: add fsck check for number of tables
  2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
@ 2025-09-03  8:07     ` Patrick Steinhardt
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  8:07 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, jltobler, shejialuo

On Tue, Sep 02, 2025 at 09:05:23AM +0200, Karthik Nayak wrote:
> Introduce a reftable fsck check to check that the number of files in the
> reftable directory matches the number of files listed in 'tables.list'.
> We do this by iterating over the files in the reftable directory and
> counting all the files present excluding the 'tables.list'. This is also
> exposed over Git's fsck checks as a 'badReftableStackCount' error.

This feels overly strict, as it can always be the case that a concurrent
process is currently updating the stack. Furthermore, it's expected that
on Windows systems deletion of an old table may not work because the
file is still kept open by another process. The reftable library is
prepared to handle this alright and will re-try deleting the table at a
later point in time.

So maybe a better check would be to verify that there are no files with
unexpected names in the directory?

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline
  2025-09-02 22:38     ` Junio C Hamano
@ 2025-09-03  8:07       ` Patrick Steinhardt
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick Steinhardt @ 2025-09-03  8:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git, jltobler, shejialuo

On Tue, Sep 02, 2025 at 03:38:33PM -0700, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> 
> > Introduce a fsck check for the reftable backend, which checks if the
> > 'tables.list' contains a newline. The reftable backend writes a trailing
> > newline when writing the 'tables.list', but it doesn't check for it when
> > reading the file. A missing newline however indicates that the file was
> > manually tampered with, so let's raise this as an error to the user.
> 
> Hmph, how does the code react to other kinds of "manual tampering"?
> For example, if an empty line is inserted between two existing lines
> (or at the beginning of the file, for that matter), would the parser
> detect it as a corrupt file and die?
> 
> If so, it makes me strongly suspect that we are better off enforcing
> that the file does not end in an incomplete line at runtime and barf
> just the same way, instead of "most of the anomalies that the write
> codepath would never produce would cause error on the read codepath,
> but only this one that the read codepath is happy with is caught by
> the fsck", which does not sound quite right.

Fair, I'm also of the opinion that we should tighten the parser logic to
detect and reject any invalid files. For previous checks where we verify
that the table names are sane I think it's fair to live with it and warn
about those, as the actual names don't really matter. But as soon as we
hit actually-broken formats I also think that we're better of rejecting
those altogether.

Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] refs/reftable: add fsck check for checking the table name
  2025-09-01 13:33     ` Karthik Nayak
@ 2025-09-03 13:39       ` shejialuo
  0 siblings, 0 replies; 28+ messages in thread
From: shejialuo @ 2025-09-03 13:39 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps

On Mon, Sep 01, 2025 at 06:33:24AM -0700, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > On Tue, Aug 19, 2025 at 02:21:01PM +0200, Karthik Nayak wrote:
> >> The `git refs verify` command is used to run fsck checks on the
> >> reference backends. This command is also invoked when users run 'git
> >> fsck'. While the files-backend has some fsck checks added, the reftable
> >> backend lacks such checks. Let's add the required infrastructure and a
> >> check to test for the table names in the 'tables.list' of reftables.
> >>
> >> For the infrastructure, since the reftable library is treated as an
> >> independent library we should ensure that the library code works
> >> independently without knowledge about Git's internals. To do this,
> >> add both 'reftable/fsck.c' and 'reftable/reftable-fsck.h'. Which
> >
> > A design question here, we name the "fsck.c" for the source code but for
> > the header, we use "reftable-fsck.h", it is a little strange. Why not
> > just "fsck.h" instead of "reftable-fsck.h".
> >
> 
> Since the reftable code is treated as an external library, all
> 'reftable-.*.h' headers are treated as headers which expose APIs for the
> libraries users. We would have defined 'reftable/fsck.h' if there were
> internal users of the 'fsck.c' code. But there are none.
> 

I understand the design. Thanks for the explanation.

[snip]

> >> +		uint32_t rnd;
> >> +		/*
> >> +		 * We want to match the tail '.ref'. One extra byte to ensure
> >> +		 * that there is no unexpected extra character and one byte for
> >> +		 * the null terminator added by sscanf.
> >> +		 */
> >> +		char tail[6];
> >> +
> >> +		if (sscanf(names[i], "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x%5s",
> >> +			   &min, &max, &rnd, tail) != 4) {
> >> +			err = report_fn(info, cb_data);
> >
> > I think we could just pass pointer to avoid unnecessary copy operations.
> > Besides that, I think here we report two different kinds of problem. But
> > we would give report the user always the same message `invalid reftable
> > name`. This is too vague.
> >
> 
> Not sure what you mean by 'unnecessary copy operations', could you
> elaborate?
> 

In `report_fn`, we would copy the `info` value for each call. That's my
meaning.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name
  2025-09-03  8:07     ` Patrick Steinhardt
@ 2025-09-03 16:51       ` shejialuo
  0 siblings, 0 replies; 28+ messages in thread
From: shejialuo @ 2025-09-03 16:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, jltobler

On Wed, Sep 03, 2025 at 10:07:13AM +0200, Patrick Steinhardt wrote:

[snip]

> > +static int reftable_be_fsck(struct ref_store *ref_store, struct fsck_options *o,
> >  			    struct worktree *wt UNUSED)
> >  {
> > -	return 0;
> > +	struct reftable_ref_store *refs;
> > +	struct strmap_entry *entry;
> > +	struct hashmap_iter iter;
> > +	int ret = 0;
> > +
> > +	refs = reftable_be_downcast(ref_store, REF_STORE_READ, "fsck");
> > +
> > +	if (o->verbose)
> > +		fprintf_ln(stderr, _("Checking references consistency"));
> 
> This line is duplicate across both backends, right? Maybe it's something
> that we can do in the generic logic?
> 

That's right, it is duplicate. If we want to remove this, we need to do
this in the "builtin/refs.c". But I wonder whether we should do this in
the first place. Should we rather add more detailed information just
like the following code for packed backend?

    if (o->verbose)
        fprintf_ln(stderr, "Checking packed-refs file %s", refs->path);

Instead of just using

    Checking references consistency

Could we use

    Checking reftable references consistency

However, I also feel strange about above, :)

[snip]

> > +/* Represents an individual error encountered during the FSCK checks. */
> > +struct reftable_fsck_info {
> > +	enum reftable_fsck_error error;
> > +	const char *msg;
> > +	const char *path;
> > +};
> 
> I wonder whether it should be the reftable library that decides on the
> severity of each generated finding.
> 

That's an interesting question. Let's inspect how Git handles the
severity. When defining the fsck message id, we need to specify its
severity like the following shows, this happens at compile time:

    FUNC(BAD_REFERENT_NAME, ERROR)

And we could set the configuration "fsck.[message id]=" to change the
fsck message severity.

Then let's think if reftable library decides the severity. It means that
we need to use the API from reftable library to update
"fsck_option->msg_type" at the runtime. And it is bad because the fsck
infrastructure would be highly coupled with the reftable library.

So, I don't think it's a good idea for reftable library to choose the
severity. Instead, reftable library should just provide users with error
types and let the users decide the severity.

Thanks,
Jialuo

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-09-03 16:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-08-26 16:21   ` shejialuo
2025-09-01 13:33     ` Karthik Nayak
2025-09-03 13:39       ` shejialuo
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-08-26 16:33   ` shejialuo
2025-09-01 13:40     ` Karthik Nayak
2025-08-26 16:44   ` shejialuo
2025-09-01 13:52     ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
2025-09-01 13:52   ` Karthik Nayak
2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-02 22:25     ` Junio C Hamano
2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt
2025-09-03 16:51       ` shejialuo
2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt
2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-09-02 22:38     ` Junio C Hamano
2025-09-03  8:07       ` Patrick Steinhardt
2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-09-02 22:42     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).