From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>, Patrick Steinhardt <ps@pks.im>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH/RFC 4/5] test-tool: add a "historian" subcommand for building merge fixtures
Date: Wed, 06 May 2026 22:43:23 +0000
Message-ID: <72c486312cde9a9fd2dedb60bc43c5c3e40a0d64.1778107405.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2106.git.1778107405.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The merge-replay tests added in a follow-up commit need a way to set
up specific topologies with full control over blob contents, parent
order, and per-side trees. Sequencing plumbing commands or driving
plain `git fast-import` from shell quickly becomes unreadable for
the kinds of scenarios that exercise non-trivial merge resolution
(textual conflicts, semantic edits outside the conflict region,
intentional limitations such as new content on one side).

Add a small `test-tool historian` subcommand that reads a tight,
shell-quoted, one-line-per-object DSL and feeds an equivalent stream
to a `git fast-import` child process. Each blob and commit is given
a logical name; the helper allocates fast-import marks on first use
and emits a lightweight tag for every commit so tests can refer to
the resulting object via `refs/tags/<name>`.
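
To illustrate the "allocate a mark on first use" idea in isolation,
here is a minimal standalone sketch. It is not the code in this
patch: `struct name_table` and `MAX_NAMES` are hypothetical stand-ins
for the `strintmap` the helper actually uses, chosen only so the
example compiles without Git's headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's strintmap: a tiny fixed-size table. */
#define MAX_NAMES 64

struct name_table {
	const char *names[MAX_NAMES];
	int nr;
};

/*
 * Return the mark for `name`, allocating the next free one on first
 * use. Marks start at 1, so 0 never names an object. The caller must
 * keep `name` alive for as long as the table is used.
 */
static int resolve_mark(struct name_table *t, const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->names[i], name))
			return i + 1;
	assert(t->nr < MAX_NAMES);
	t->names[t->nr++] = name;
	return t->nr;
}
```

Asking for the same name twice yields the same mark, which is what
lets a `commit` line refer to a blob or parent purely by name.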

The DSL has just two directives:

  blob NAME LINE...
  commit NAME BRANCH SUBJECT [from=NAME] [merge=NAME]... [PATH=BLOB]...

A blob's content is the listed lines joined with `\n` (and a final
`\n`); a commit's tree is exactly the listed PATH=BLOB pairs (the
helper emits a `deleteall` so nothing leaks in from the implicit
parent). Token splitting is delegated to `split_cmdline()` so quoted
arguments work as in shell. Marks for parent references and file
contents go through the same `strintmap`-backed name resolver, which
keeps the helper itself trivially small: blob writing, tree
construction and commit creation are all handled by
`git fast-import`.
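
As a rough sketch of the blob framing described above, the following
standalone function formats the fast-import commands for one blob
into a caller-supplied buffer. The name `format_blob` and the
buffer-based interface are assumptions made for this example; the
helper in this patch writes to the child's stdin with `fprintf()`
instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "blob / mark / data" for one blob whose content is the given
 * lines, each terminated by '\n'. Returns the number of bytes written
 * (excluding the NUL), or -1 if `buf` is too small.
 */
static int format_blob(char *buf, size_t size, int mark,
		       int nr_lines, const char **lines)
{
	size_t len = 0, pos;
	int i, n;

	assert(buf != NULL && size > 0);
	for (i = 0; i < nr_lines; i++)
		len += strlen(lines[i]) + 1;	/* the line plus its '\n' */

	/* fast-import needs the exact byte count before the raw data */
	n = snprintf(buf, size, "blob\nmark :%d\ndata %zu\n", mark, len);
	if (n < 0 || (size_t)n >= size)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < nr_lines; i++) {
		n = snprintf(buf + pos, size - pos, "%s\n", lines[i]);
		if (n < 0 || (size_t)n >= size - pos)
			return -1;
		pos += (size_t)n;
	}

	/* optional LF after the data, mirroring what emit_data() writes */
	n = snprintf(buf + pos, size - pos, "\n");
	if (n < 0 || (size_t)n >= size - pos)
		return -1;
	return (int)(pos + (size_t)n);
}
```

With no lines at all the byte count is 0, matching the documented
"with no LINEs, the blob is empty" behavior.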

Note that the DSL reserves the names `from` and `merge` (with a
trailing `=`) for parent specification; a tree path called `from` or
`merge` cannot be expressed via this helper. That is acceptable here
because every input is a tightly controlled test fixture and the
filenames are chosen by the test author.
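
The reservation can be made concrete with a small standalone
classifier. This is only a sketch: `enum arg_kind` and
`classify_arg()` are hypothetical names that do not appear in the
patch, which performs the equivalent checks inline with
`skip_prefix()` and `strchr()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum arg_kind { ARG_FROM, ARG_MERGE, ARG_FILE, ARG_BAD };

/*
 * Classify one trailing argument of a `commit` line. On success,
 * *value points at the text after '=' and *key_len is the length of
 * the part before it. The prefixes are tested first, which is why a
 * path named "from" or "merge" can never be reached.
 */
static enum arg_kind classify_arg(const char *arg, const char **value,
				  size_t *key_len)
{
	const char *eq;

	assert(arg && value && key_len);
	if (!strncmp(arg, "from=", 5)) {
		*key_len = 4;
		*value = arg + 5;
		return ARG_FROM;
	}
	if (!strncmp(arg, "merge=", 6)) {
		*key_len = 5;
		*value = arg + 6;
		return ARG_MERGE;
	}
	eq = strchr(arg, '=');
	if (!eq)
		return ARG_BAD;		/* neither a parent nor PATH=BLOB */
	*key_len = (size_t)(eq - arg);
	*value = eq + 1;
	return ARG_FILE;
}
```

Under this scheme `from=base` is always read as a parent reference,
never as a file called `from` whose content is the blob `base`.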

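The helper trusts its caller: apart from a few basic checks (an
unknown directive, unparsable quoting, a file argument without `=`),
malformed input results in a fast-import error rather than a
friendly diagnostic.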

Wire the new subcommand into the Makefile and meson build, and
register it in `t/helper/test-tool.{c,h}`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Makefile                  |   1 +
 t/helper/meson.build      |   1 +
 t/helper/test-historian.c | 189 ++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c      |   1 +
 t/helper/test-tool.h      |   1 +
 5 files changed, 193 insertions(+)
 create mode 100644 t/helper/test-historian.c

diff --git a/Makefile b/Makefile
index cedc234173..b38678b484 100644
--- a/Makefile
+++ b/Makefile
@@ -832,6 +832,7 @@ TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-hexdump.o
+TEST_BUILTINS_OBJS += test-historian.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
diff --git a/t/helper/meson.build b/t/helper/meson.build
index 675e64c010..704edd1e1f 100644
--- a/t/helper/meson.build
+++ b/t/helper/meson.build
@@ -29,6 +29,7 @@ test_tool_sources = [
   'test-hash.c',
   'test-hashmap.c',
   'test-hexdump.c',
+  'test-historian.c',
   'test-json-writer.c',
   'test-lazy-init-name-hash.c',
   'test-match-trees.c',
diff --git a/t/helper/test-historian.c b/t/helper/test-historian.c
new file mode 100644
index 0000000000..2250d420c0
--- /dev/null
+++ b/t/helper/test-historian.c
@@ -0,0 +1,189 @@
+/*
+ * Build a small history out of a tiny declarative input. Used by tests
+ * that need specific merge topologies without long sequences of
+ * plumbing commands or fragile shell helpers.
+ *
+ * The historian reads stdin line by line and emits an equivalent
+ * stream to a `git fast-import` child process. It also allocates marks
+ * for named objects so tests can refer to commits and blobs by name.
+ *
+ * Input directives (one per line, shell-style quoting):
+ *
+ *	blob NAME LINE1 LINE2 ...
+ *	    Each LINE becomes a content line in the blob; lines are
+ *	    joined with '\n' and the blob ends with a final '\n'. With
+ *	    no LINEs, the blob is empty.
+ *
+ *	commit NAME BRANCH SUBJECT [from=PARENT] [merge=PARENT]... [PATH=BLOB]...
+ *	    Creates a commit on refs/heads/BRANCH using the listed
+ *	    file=blob mappings as the entire tree (no inheritance from
+ *	    parents). Up to one `from=` and any number of `merge=`
+ *	    parents may be given. `from=` defaults to the current branch
+ *	    tip; if BRANCH has no tip yet, the commit becomes a root.
+ *
+ * Each `commit NAME` directive also creates a lightweight tag
+ * `refs/tags/NAME` so tests can `git rev-parse NAME`.
+ *
+ * This helper trusts its caller; malformed input results in fast-import
+ * errors. That is fine because test scripts feed it tightly controlled
+ * input.
+ */
+
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "alias.h"
+#include "run-command.h"
+#include "setup.h"
+#include "strbuf.h"
+#include "strmap.h"
+#include "strvec.h"
+
+static int next_mark = 1;
+
+static int resolve_mark(struct strintmap *names, const char *name)
+{
+	int n = strintmap_get(names, name);
+	if (!n) {
+		n = next_mark++;
+		strintmap_set(names, name, n);
+	}
+	return n;
+}
+
+static void emit_data(FILE *out, const char *data, size_t len)
+{
+	fprintf(out, "data %"PRIuMAX"\n", (uintmax_t)len);
+	fwrite(data, 1, len, out);
+	fputc('\n', out);
+}
+
+static void emit_blob(FILE *out, struct strintmap *names,
+		      int argc, const char **argv)
+{
+	struct strbuf content = STRBUF_INIT;
+	int n = resolve_mark(names, argv[1]);
+	int i;
+
+	for (i = 2; i < argc; i++) {
+		strbuf_addstr(&content, argv[i]);
+		strbuf_addch(&content, '\n');
+	}
+
+	fprintf(out, "blob\nmark :%d\n", n);
+	emit_data(out, content.buf, content.len);
+	strbuf_release(&content);
+}
+
+static void emit_tag(FILE *out, const char *name, int mark)
+{
+	fprintf(out, "reset refs/tags/%s\nfrom :%d\n\n", name, mark);
+}
+
+static void emit_commit(FILE *out, struct strintmap *names,
+			int argc, const char **argv, int seq)
+{
+	int n = resolve_mark(names, argv[1]);
+	const char *branch = argv[2];
+	const char *subject = argv[3];
+	const char *rest;
+	int i;
+
+	fprintf(out, "commit refs/heads/%s\nmark :%d\n", branch, n);
+	fprintf(out, "author A <a@e> %d +0000\n", 1700000000 + seq);
+	fprintf(out, "committer A <a@e> %d +0000\n", 1700000000 + seq);
+	emit_data(out, subject, strlen(subject));
+
+	/*
+	 * fast-import wants `from`, then `merge`, then file operations;
+	 * hoist both here, but list `from=` before `merge=` in the input.
+	 */
+	for (i = 4; i < argc; i++) {
+		if (skip_prefix(argv[i], "from=", &rest))
+			fprintf(out, "from :%d\n", resolve_mark(names, rest));
+		else if (skip_prefix(argv[i], "merge=", &rest))
+			fprintf(out, "merge :%d\n", resolve_mark(names, rest));
+	}
+
+	/*
+	 * The PATH=BLOB list is the entire tree; wipe whatever the
+	 * implicit parent contributed before re-applying it.
+	 */
+	fprintf(out, "deleteall\n");
+	for (i = 4; i < argc; i++) {
+		const char *eq;
+		size_t key_len;
+		char *path;
+
+		if (skip_prefix(argv[i], "from=", &rest) ||
+		    skip_prefix(argv[i], "merge=", &rest))
+			continue;
+		eq = strchr(argv[i], '=');
+		if (!eq)
+			die("bad commit spec '%s'", argv[i]);
+		key_len = eq - argv[i];
+		path = xmemdupz(argv[i], key_len);
+		fprintf(out, "M 100644 :%d %s\n",
+			resolve_mark(names, eq + 1), path);
+		free(path);
+	}
+
+	fputc('\n', out);
+	emit_tag(out, argv[1], n);
+}
+
+int cmd__historian(int argc, const char **argv UNUSED)
+{
+	struct child_process fi = CHILD_PROCESS_INIT;
+	struct strintmap names = STRINTMAP_INIT;
+	struct strbuf line = STRBUF_INIT;
+	int seq = 0;
+	int ret = 0;
+	FILE *fi_in;
+
+	if (argc != 1)
+		die("usage: test-tool historian <input");
+
+	setup_git_directory();
+
+	strvec_pushl(&fi.args, "fast-import", "--quiet", "--force", NULL);
+	fi.git_cmd = 1;
+	fi.in = -1;
+	fi.no_stdout = 1;
+	if (start_command(&fi))
+		die("failed to start git fast-import");
+	fi_in = xfdopen(fi.in, "w");
+
+	while (strbuf_getline_lf(&line, stdin) != EOF) {
+		const char **a = NULL;
+		int n;
+
+		strbuf_trim(&line);
+		if (!line.len || line.buf[0] == '#')
+			continue;
+
+		n = split_cmdline(line.buf, &a);
+		if (n < 0)
+			die("split_cmdline failed: %s",
+			    split_cmdline_strerror(n));
+
+		if (n >= 2 && !strcmp(a[0], "blob"))
+			emit_blob(fi_in, &names, n, a);
+		else if (n >= 4 && !strcmp(a[0], "commit"))
+			emit_commit(fi_in, &names, n, a, seq++);
+		else
+			die("unknown or incomplete directive: %s", a[0]);
+
+		free(a);
+	}
+
+	if (fclose(fi_in))
+		die_errno("close fast-import stdin");
+	if (finish_command(&fi))
+		ret = 1;
+
+	strbuf_release(&line);
+	strintmap_clear(&names);
+	return ret;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index a7abc618b3..28bde98ce1 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -39,6 +39,7 @@ static struct test_cmd cmds[] = {
 	{ "hashmap", cmd__hashmap },
 	{ "hash-speed", cmd__hash_speed },
 	{ "hexdump", cmd__hexdump },
+	{ "historian", cmd__historian },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
 	{ "match-trees", cmd__match_trees },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7f150fa1eb..78cec8594a 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -32,6 +32,7 @@ int cmd__getcwd(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
 int cmd__hash_speed(int argc, const char **argv);
 int cmd__hexdump(int argc, const char **argv);
+int cmd__historian(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);
 int cmd__match_trees(int argc, const char **argv);
-- 
gitgitgadget


