Git development

Git development
 help / color / mirror / Atom feed

* [PATCH 4/6] t: add lint-style.pl with test_grep negation rule
From: Michael Montalbo via GitGitGadget @ 2026-06-04  7:45 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Eric Sunshine, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add a mechanical lint checker for test scripts, similar in spirit to
check-non-portable-shell.pl but focused on test conventions rather
than portability.

The tool defines LintParser, a subclass of ScriptParser (from the
shared lib-shell-parser.pl module).  ScriptParser's
parse_cmd() finds test_expect_success blocks and calls check_test()
for each body; LintParser overrides check_test() to run lint rules
on the parsed commands.  A "# lint-ok" comment suppresses all
checks for intentional style violations.

The first rule detects '! test_grep' and replaces it with
'test_grep !'.  Shell-level negation suppresses the diagnostic
output that test_grep prints on failure; the built-in negation
preserves it.

Three violations inside test bodies are converted via --fix.  One
additional violation in a helper function outside test_expect_success
(t7900's test_geometric_repack_needed) is converted manually, since
the parser only processes test bodies.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 t/.gitattributes                           |   2 +
 t/Makefile                                 |  32 +++-
 t/lint-style.pl                            | 200 +++++++++++++++++++++
 t/lint-style/heredoc.expect                |   3 +
 t/lint-style/heredoc.test                  |  14 ++
 t/lint-style/test-grep-negation-fix.expect |   4 +
 t/lint-style/test-grep-negation-fix.test   |   4 +
 t/lint-style/test-grep-negation.expect     |   3 +
 t/lint-style/test-grep-negation.test       |   4 +
 t/t0031-lockfile-pid.sh                    |   2 +-
 t/t5300-pack-object.sh                     |   2 +-
 t/t5319-multi-pack-index.sh                |   2 +-
 t/t7900-maintenance.sh                     |   2 +-
 13 files changed, 268 insertions(+), 6 deletions(-)
 create mode 100755 t/lint-style.pl
 create mode 100644 t/lint-style/heredoc.expect
 create mode 100644 t/lint-style/heredoc.test
 create mode 100644 t/lint-style/test-grep-negation-fix.expect
 create mode 100644 t/lint-style/test-grep-negation-fix.test
 create mode 100644 t/lint-style/test-grep-negation.expect
 create mode 100644 t/lint-style/test-grep-negation.test

diff --git a/t/.gitattributes b/t/.gitattributes
index 7664c6e027..aea6889d03 100644
--- a/t/.gitattributes
+++ b/t/.gitattributes
@@ -1,5 +1,7 @@
 t[0-9][0-9][0-9][0-9]/* -whitespace
 /chainlint/*.expect eol=lf -whitespace
+/lint-style/*.expect eol=lf -whitespace
+/lint-style/*.test eol=lf -whitespace
 /t0110/url-* binary
 /t3206/* eol=lf
 /t3900/*.txt eol=lf
diff --git a/t/Makefile b/t/Makefile
index 25f923fed9..3a5fa4ce37 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -46,6 +46,7 @@ TPERF = $(sort $(wildcard perf/p[0-9][0-9][0-9][0-9]-*.sh))
 TINTEROP = $(sort $(wildcard interop/i[0-9][0-9][0-9][0-9]-*.sh))
 CHAINLINTTESTS = $(sort $(patsubst chainlint/%.test,%,$(wildcard chainlint/*.test)))
 CHAINLINT = '$(PERL_PATH_SQ)' chainlint.pl
+LINT_STYLE_TESTS = $(sort $(wildcard lint-style/*.test))
 UNIT_TEST_SOURCES = $(wildcard unit-tests/t-*.c)
 UNIT_TEST_PROGRAMS = $(patsubst unit-tests/%.c,unit-tests/bin/%$(X),$(UNIT_TEST_SOURCES))
 UNIT_TEST_PROGRAMS += unit-tests/bin/unit-tests$(X)
@@ -139,7 +140,7 @@ check-meson:
 test-lint: test-lint-duplicates test-lint-executable \
 	test-lint-filenames
 ifneq ($(PERL_PATH),)
-test-lint: test-lint-shell-syntax check-shell-parser
+test-lint: test-lint-shell-syntax test-lint-style check-lint-style check-shell-parser
 else
 GIT_TEST_CHAIN_LINT = 0
 endif
@@ -162,6 +163,32 @@ test-lint-shell-syntax:
 
 check-shell-parser:
 	@'$(PERL_PATH_SQ)' check-shell-parser.pl
+
+test-lint-style:
+	@'$(PERL_PATH_SQ)' lint-style.pl $(T) $(THELPERS) $(TPERF)
+
+check-lint-style:
+	@rc=0; for t in $(LINT_STYLE_TESTS); do \
+		base=$${t%.test}; \
+		case $$base in \
+		*-fix) \
+			cp "$$t" "$$t.tmp" && \
+			'$(PERL_PATH_SQ)' lint-style.pl --fix "$$t.tmp" >/dev/null 2>&1; \
+			fix_rc=$$?; \
+			if test $$fix_rc != 0; then \
+				echo "FAIL: $$t (--fix exit code $$fix_rc)"; rc=1; \
+			elif ! diff -u "$$base.expect" "$$t.tmp"; then \
+				echo "FAIL: $$t (--fix output)"; rc=1; \
+			fi; \
+			rm -f "$$t.tmp" ;; \
+		*) \
+			if ! '$(PERL_PATH_SQ)' lint-style.pl "$$t" 2>&1 | \
+				diff -u "$$base.expect" -; then \
+				echo "FAIL: $$t"; rc=1; \
+			fi ;; \
+		esac; \
+	done; test $$rc = 0
+
 test-lint-filenames:
 	@# We do *not* pass a glob to ls-files but use grep instead, to catch
 	@# non-ASCII characters (which are quoted within double-quotes)
@@ -188,7 +215,8 @@ perf:
 
 .PHONY: pre-clean $(T) aggregate-results clean valgrind perf \
 	check-chainlint clean-chainlint test-chainlint \
-	check-shell-parser $(UNIT_TESTS)
+	check-shell-parser \
+	check-lint-style test-lint-style $(UNIT_TESTS)
 
 .PHONY: libgit-sys-test libgit-rs-test
 libgit-sys-test:
diff --git a/t/lint-style.pl b/t/lint-style.pl
new file mode 100755
index 0000000000..9268577f9b
--- /dev/null
+++ b/t/lint-style.pl
@@ -0,0 +1,200 @@
+#!/usr/bin/perl
+
+# Check test scripts for style violations that can be detected
+# mechanically, such as using bare 'grep' where test_grep should
+# be used.  Use --fix to automatically apply suggested replacements.
+#
+# Detection uses parsed tokens from the shared shell parser for
+# correct handling of heredocs, $(...), pipes, and quoting.
+# Fixes modify the original file text to preserve formatting.
+
+use strict;
+use warnings;
+use File::Basename;
+# Force LF output so check-lint-style's diff against the
+# pre-committed .expect files works on Windows.
+binmode(STDOUT, ':unix');
+binmode(STDERR, ':unix');
+
+my $fix_mode = 0;
+if (@ARGV && $ARGV[0] eq '--fix') {
+	$fix_mode = 1;
+	shift @ARGV;
+}
+
+# Load the shared shell parser (Lexer, ShellParser, ScriptParser).
+my $_lib = dirname($0) . "/lib-shell-parser.pl";
+$_lib = "./$_lib" unless $_lib =~ m{^/};
+do $_lib or die "$0: failed to load $_lib: $@$!\n";
+
+# LintParser is a subclass of ScriptParser which runs lint rules
+# on each test body.  Per-file state (file name, raw lines, dirty
+# flag) is stored on the instance before calling parse().
+#
+# Subroutines defined below (parse_commands, check_test_grep_negation,
+# etc.) are in package main and called with the main:: prefix.
+# File-scoped lexicals ($fix_mode, $has_fixable, etc.) are visible
+# across packages since 'package' does not introduce a new scope.
+package LintParser;
+our @ISA = ('ScriptParser');
+
+package main;
+
+my $exit_code = 0;
+my $has_fixable = 0;
+
+sub err {
+	my ($file, $lineno, $line, $msg, %opts) = @_;
+	$line =~ s/^\s+//;
+	$line =~ s/\s+$//;
+	$line =~ s/\s+/ /g;
+	my $prefix = ($fix_mode && $opts{fixable}) ? 'fixed' : 'error';
+	print "$file:$lineno: $prefix: $msg: $line\n";
+	$exit_code = 1 unless $fix_mode && $opts{fixable};
+}
+
+# Report a lint violation found by a rule.  In --fix mode, apply
+# the regex substitution on the raw line and report success.
+# Otherwise just report.  Returns 1 if the line was modified.
+sub report_violation {
+	my ($file, $cmd, $line_ref, $match, $fix, $from) = @_;
+	my $lineno = $cmd->{lineno};
+	my $display = join(' ', @{$cmd->{tokens}});
+	$has_fixable++;  # count for the "--fix" hint
+	if ($fix_mode) {
+		if ($$line_ref =~ s/$match/$fix/) {
+			err $file, $lineno, $display,
+				"replace '$from' with '$fix'",
+				fixable => 1;
+			return 1;
+		}
+		err $file, $lineno, $display,
+			"replace '$from' with '$fix' (could not auto-fix)";
+	} else {
+		err $file, $lineno, $display,
+			"replace '$from' with '$fix'";
+	}
+	return 0;
+}
+
+# Split a token stream into commands at &&, ||, ;;, and \n.
+sub parse_commands {
+	my ($content) = @_;
+	my $parser = ShellParser->new(\$content);
+	my @all_tokens = $parser->parse();
+
+	my @commands;
+	my @current;
+	my $lineno = 1;
+
+	for (my $ti = 0; $ti < @all_tokens; $ti++) {
+		my $text = $all_tokens[$ti]->[0];
+		if ($text =~ /^(?:&&|\|\||;;|\n)$/) {
+			if (@current) {
+				push @commands, {
+					tokens => [@current],
+					lineno => $lineno,
+				};
+				@current = ();
+			}
+		} else {
+			$lineno = $all_tokens[$ti]->[3]
+				if !@current && defined $all_tokens[$ti]->[3];
+			push @current, $text;
+		}
+	}
+	if (@current) {
+		push @commands, {
+			tokens => [@current],
+			lineno => $lineno,
+		};
+	}
+	return @commands;
+}
+
+# --- Rule: '! test_grep' should be 'test_grep !' ---
+# Shell-level negation suppresses test_grep's diagnostic output
+# on failure.  Built-in negation preserves it.
+sub check_test_grep_negation {
+	my ($cmd, $file, $line_ref) = @_;
+	my @tokens = @{$cmd->{tokens}};
+	return unless @tokens >= 2 && $tokens[0] eq '!' && $tokens[1] eq 'test_grep';
+
+	return report_violation($file, $cmd, $line_ref,
+		qr/!\s*test_grep/, 'test_grep !', '! test_grep');
+}
+
+# Map parsed commands back to raw file lines for --fix.
+# Detection uses parsed tokens (correct handling of quoting,
+# heredocs, pipes) but fixes must modify the original text
+# to preserve formatting.
+package LintParser;
+
+sub check_test {
+	# Called by ScriptParser::parse_cmd for each test_expect_success
+	# or test_expect_failure block.
+	my $self = shift @_;
+	my $title = ScriptParser::unwrap(shift @_);
+
+	# Two test body formats:
+	#   Quoted:  test_expect_success 'title' '..body..'
+	#   Heredoc: test_expect_success 'title' - <<\EOF
+	#              ..body..
+	#            EOF
+	# For quoted, the body token is the quoted string.
+	# For heredoc, the body token is '-' and the actual
+	# code arrives as the next argument from the Lexer.
+	my $body_token = shift @_;
+	my $lineno_base = $body_token->[3] || 1;
+	my $body = ScriptParser::unwrap($body_token);
+
+	if ($body eq '-') {
+		my $herebody = shift @_;
+		if ($herebody) {
+			$body = $herebody->{content};
+			$lineno_base = $herebody->{start_line} || 1;
+		}
+	}
+	return unless $body;
+
+	# Map each command back to its file line number.
+	# $lineno_base is where the body starts in the file;
+	# $cmd->{lineno} is relative to the body (starting at 1).
+	my $raw_lines = $self->{raw_lines};
+	for my $cmd (main::parse_commands($body)) {
+		my $ln = ($cmd->{lineno} || 0) + $lineno_base - 1;
+		$cmd->{lineno} = $ln;
+		next unless $ln >= 1 && $ln <= @$raw_lines;
+		next if $raw_lines->[$ln - 1] =~ /#.*lint-ok/;
+
+		if (main::check_test_grep_negation($cmd, $self->{file}, \$raw_lines->[$ln - 1])) {
+			$self->{dirty} = 1;
+		}
+	}
+}
+
+package main;
+
+for my $file (@ARGV) {
+	# :unix:crlf strips \r on Windows (same as chainlint.pl)
+	open(my $fh, '<:unix:crlf', $file) or die "$0: $file: $!\n";
+	my @raw_lines = <$fh>;
+	close $fh;
+
+	my $parser = LintParser->new(\join('', @raw_lines));
+	$parser->{file} = $file;
+	$parser->{raw_lines} = \@raw_lines;
+	$parser->{dirty} = 0;
+	$parser->parse();
+
+	if ($fix_mode && $parser->{dirty}) {
+		open(my $out, '>', $file) or die "$0: $file: $!\n";
+		print $out @{$parser->{raw_lines}};
+		close $out;
+	}
+}
+
+if ($has_fixable && !$fix_mode) {
+	print "hint: run with --fix to apply the suggested replacements.\n";
+}
+exit $exit_code;
diff --git a/t/lint-style/heredoc.expect b/t/lint-style/heredoc.expect
new file mode 100644
index 0000000000..7ff6d4a52d
--- /dev/null
+++ b/t/lint-style/heredoc.expect
@@ -0,0 +1,3 @@
+lint-style/heredoc.test:8: error: replace '! test_grep' with 'test_grep !': ! test_grep "after-heredoc-is-caught" actual
+lint-style/heredoc.test:13: error: replace '! test_grep' with 'test_grep !': ! test_grep "not-inside-sed-heredoc" actual
+hint: run with --fix to apply the suggested replacements.
diff --git a/t/lint-style/heredoc.test b/t/lint-style/heredoc.test
new file mode 100644
index 0000000000..4c05831cfb
--- /dev/null
+++ b/t/lint-style/heredoc.test
@@ -0,0 +1,14 @@
+test_expect_success 'greps inside heredocs are skipped' '
+	cat <<-EOF &&
+	grep "inside-strip-tabs" file
+	EOF
+	cat <<-\EOF &&
+	grep "inside-no-expand" file
+	EOF
+	! test_grep "after-heredoc-is-caught" actual
+'
+
+test_expect_success 'sed with << does not start a heredoc' '
+	sed "s/<< foo/bar/" file &&
+	! test_grep "not-inside-sed-heredoc" actual
+'
diff --git a/t/lint-style/test-grep-negation-fix.expect b/t/lint-style/test-grep-negation-fix.expect
new file mode 100644
index 0000000000..28ecde1073
--- /dev/null
+++ b/t/lint-style/test-grep-negation-fix.expect
@@ -0,0 +1,4 @@
+test_expect_success 'negated test_grep' '
+	test_grep ! "pattern" actual &&
+	test_grep ! -i "insensitive" actual
+'
diff --git a/t/lint-style/test-grep-negation-fix.test b/t/lint-style/test-grep-negation-fix.test
new file mode 100644
index 0000000000..571c150031
--- /dev/null
+++ b/t/lint-style/test-grep-negation-fix.test
@@ -0,0 +1,4 @@
+test_expect_success 'negated test_grep' '
+	! test_grep "pattern" actual &&
+	! test_grep -i "insensitive" actual
+'
diff --git a/t/lint-style/test-grep-negation.expect b/t/lint-style/test-grep-negation.expect
new file mode 100644
index 0000000000..1fa9e124aa
--- /dev/null
+++ b/t/lint-style/test-grep-negation.expect
@@ -0,0 +1,3 @@
+lint-style/test-grep-negation.test:2: error: replace '! test_grep' with 'test_grep !': ! test_grep "pattern" actual
+lint-style/test-grep-negation.test:3: error: replace '! test_grep' with 'test_grep !': ! test_grep -i "insensitive" actual
+hint: run with --fix to apply the suggested replacements.
diff --git a/t/lint-style/test-grep-negation.test b/t/lint-style/test-grep-negation.test
new file mode 100644
index 0000000000..571c150031
--- /dev/null
+++ b/t/lint-style/test-grep-negation.test
@@ -0,0 +1,4 @@
+test_expect_success 'negated test_grep' '
+	! test_grep "pattern" actual &&
+	! test_grep -i "insensitive" actual
+'
diff --git a/t/t0031-lockfile-pid.sh b/t/t0031-lockfile-pid.sh
index 8ef87addf5..e9e2f04049 100755
--- a/t/t0031-lockfile-pid.sh
+++ b/t/t0031-lockfile-pid.sh
@@ -29,7 +29,7 @@ test_expect_success 'PID info not shown by default' '
 		test_must_fail git add . 2>err &&
 		# Should not crash, just show normal error without PID
 		test_grep "Unable to create" err &&
-		! test_grep "is held by process" err
+		test_grep ! "is held by process" err
 	)
 '
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 73445782e7..3179b4963e 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -720,7 +720,7 @@ test_expect_success '--name-hash-version=2 and --write-bitmap-index are incompat
 
 	# --stdout option silently removes --write-bitmap-index
 	git pack-objects --stdout --all --name-hash-version=2 --write-bitmap-index >out 2>err &&
-	! test_grep "currently, --write-bitmap-index requires --name-hash-version=1" err
+	test_grep ! "currently, --write-bitmap-index requires --name-hash-version=1" err
 '
 
 test_expect_success '--path-walk pack everything' '
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index fa0d4046f7..9154d9795f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -1175,7 +1175,7 @@ test_expect_success 'load reverse index when missing .idx, .pack' '
 
 test_expect_success 'usage shown without sub-command' '
 	test_expect_code 129 git multi-pack-index 2>err &&
-	! test_grep "unrecognized subcommand" err
+	test_grep ! "unrecognized subcommand" err
 '
 
 test_expect_success 'complains when run outside of a repository' '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index d7f82e1bec..9db4a76f67 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -664,7 +664,7 @@ test_geometric_repack_needed () {
 	true)
 		test_grep "\[\"git\",\"repack\"," trace2.txt;;
 	false)
-		! test_grep "\[\"git\",\"repack\"," trace2.txt;;
+		test_grep ! "\[\"git\",\"repack\"," trace2.txt;;
 	*)
 		BUG "invalid parameter: $NEEDED";;
 	esac
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/6] t: extract chainlint's parser into shared module
From: Michael Montalbo via GitGitGadget @ 2026-06-04  7:45 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Eric Sunshine, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Move the Lexer, ShellParser, and ScriptParser packages from
chainlint.pl into t/lib-shell-parser.pl so they can be reused by
other tools.  ScriptParser's check_test() is a no-op in the shared
module; callers subclass ScriptParser and override it.

chainlint.pl defines TestParser (&&-chain detection) and
ChainlintParser (a ScriptParser subclass whose check_test runs
TestParser and formats the results).  The shared module is loaded
via do() for portability with minimal Perl installations.

A subsequent commit introduces lint-style.pl which needs the same
shell parser to properly tokenize test scripts.  Sharing the parser
avoids reimplementing heredoc handling, $(...) nesting, pipe
tracking, quoting, and test body extraction.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 t/chainlint.pl        | 521 ++----------------------------------------
 t/lib-shell-parser.pl | 517 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 530 insertions(+), 508 deletions(-)
 create mode 100644 t/lib-shell-parser.pl

diff --git a/t/chainlint.pl b/t/chainlint.pl
index f0598e3934..49b7cc6cb8 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -23,458 +23,10 @@ my $jobs = -1;
 my $show_stats;
 my $emit_all;
 
-# Lexer tokenizes POSIX shell scripts. It is roughly modeled after section 2.3
-# "Token Recognition" of POSIX chapter 2 "Shell Command Language". Although
-# similar to lexical analyzers for other languages, this one differs in a few
-# substantial ways due to quirks of the shell command language.
-#
-# For instance, in many languages, newline is just whitespace like space or
-# TAB, but in shell a newline is a command separator, thus a distinct lexical
-# token. A newline is significant and returned as a distinct token even at the
-# end of a shell comment.
-#
-# In other languages, `1+2` would typically be scanned as three tokens
-# (`1`, `+`, and `2`), but in shell it is a single token. However, the similar
-# `1 + 2`, which embeds whitepace, is scanned as three token in shell, as well.
-# In shell, several characters with special meaning lose that meaning when not
-# surrounded by whitespace. For instance, the negation operator `!` is special
-# when standing alone surrounded by whitespace; whereas in `foo!uucp` it is
-# just a plain character in the longer token "foo!uucp". In many other
-# languages, `"string"/foo:'string'` might be scanned as five tokens ("string",
-# `/`, `foo`, `:`, and 'string'), but in shell, it is just a single token.
-#
-# The lexical analyzer for the shell command language is also somewhat unusual
-# in that it recursively invokes the parser to handle the body of `$(...)`
-# expressions which can contain arbitrary shell code. Such expressions may be
-# encountered both inside and outside of double-quoted strings.
-#
-# The lexical analyzer is responsible for consuming shell here-doc bodies which
-# extend from the line following a `<<TAG` operator until a line consisting
-# solely of `TAG`. Here-doc consumption begins when a newline is encountered.
-# It is legal for multiple here-doc `<<TAG` operators to be present on a single
-# line, in which case their bodies must be present one following the next, and
-# are consumed in the (left-to-right) order the `<<TAG` operators appear on the
-# line. A special complication is that the bodies of all here-docs must be
-# consumed when the newline is encountered even if the parse context depth has
-# changed. For instance, in `cat <<A && x=$(cat <<B &&\n`, bodies of here-docs
-# "A" and "B" must be consumed even though "A" was introduced outside the
-# recursive parse context in which "B" was introduced and in which the newline
-# is encountered.
-package Lexer;
-
-sub new {
-	my ($class, $parser, $s) = @_;
-	bless {
-		parser => $parser,
-		buff => $s,
-		lineno => 1,
-		heretags => []
-	} => $class;
-}
-
-sub scan_heredoc_tag {
-	my $self = shift @_;
-	${$self->{buff}} =~ /\G(-?)/gc;
-	my $indented = $1;
-	my $token = $self->scan_token();
-	return "<<$indented" unless $token;
-	my $tag = $token->[0];
-	$tag =~ s/['"\\]//g;
-	$$token[0] = $indented ? "\t$tag" : "$tag";
-	push(@{$self->{heretags}}, $token);
-	return "<<$indented$tag";
-}
-
-sub scan_op {
-	my ($self, $c) = @_;
-	my $b = $self->{buff};
-	return $c unless $$b =~ /\G(.)/sgc;
-	my $cc = $c . $1;
-	return scan_heredoc_tag($self) if $cc eq '<<';
-	return $cc if $cc =~ /^(?:&&|\|\||>>|;;|<&|>&|<>|>\|)$/;
-	pos($$b)--;
-	return $c;
-}
-
-sub scan_sqstring {
-	my $self = shift @_;
-	${$self->{buff}} =~ /\G([^']*'|.*\z)/sgc;
-	my $s = $1;
-	$self->{lineno} += () = $s =~ /\n/sg;
-	return "'" . $s;
-}
-
-sub scan_dqstring {
-	my $self = shift @_;
-	my $b = $self->{buff};
-	my $s = '"';
-	while (1) {
-		# slurp up non-special characters
-		$s .= $1 if $$b =~ /\G([^"\$\\]+)/gc;
-		# handle special characters
-		last unless $$b =~ /\G(.)/sgc;
-		my $c = $1;
-		$s .= '"', last if $c eq '"';
-		$s .= '$' . $self->scan_dollar(), next if $c eq '$';
-		if ($c eq '\\') {
-			$s .= '\\', last unless $$b =~ /\G(.)/sgc;
-			$c = $1;
-			$self->{lineno}++, next if $c eq "\n"; # line splice
-			# backslash escapes only $, `, ", \ in dq-string
-			$s .= '\\' unless $c =~ /^[\$`"\\]$/;
-			$s .= $c;
-			next;
-		}
-		die("internal error scanning dq-string '$c'\n");
-	}
-	$self->{lineno} += () = $s =~ /\n/sg;
-	return $s;
-}
-
-sub scan_balanced {
-	my ($self, $c1, $c2) = @_;
-	my $b = $self->{buff};
-	my $depth = 1;
-	my $s = $c1;
-	while ($$b =~ /\G([^\Q$c1$c2\E]*(?:[\Q$c1$c2\E]|\z))/gc) {
-		$s .= $1;
-		$depth++, next if $s =~ /\Q$c1\E$/;
-		$depth--;
-		last if $depth == 0;
-	}
-	$self->{lineno} += () = $s =~ /\n/sg;
-	return $s;
-}
-
-sub scan_subst {
-	my $self = shift @_;
-	my @tokens = $self->{parser}->parse(qr/^\)$/);
-	$self->{parser}->next_token(); # closing ")"
-	return @tokens;
-}
-
-sub scan_dollar {
-	my $self = shift @_;
-	my $b = $self->{buff};
-	return $self->scan_balanced('(', ')') if $$b =~ /\G\((?=\()/gc; # $((...))
-	return '(' . join(' ', map {$_->[0]} $self->scan_subst()) . ')' if $$b =~ /\G\(/gc; # $(...)
-	return $self->scan_balanced('{', '}') if $$b =~ /\G\{/gc; # ${...}
-	return $1 if $$b =~ /\G(\w+)/gc; # $var
-	return $1 if $$b =~ /\G([@*#?$!0-9-])/gc; # $*, $1, $$, etc.
-	return '';
-}
-
-sub swallow_heredocs {
-	my $self = shift @_;
-	my $b = $self->{buff};
-	my $tags = $self->{heretags};
-	while (my $tag = shift @$tags) {
-		my $start = pos($$b);
-		my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
-		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
-		if (pos($$b) > $start) {
-			my $body = substr($$b, $start, pos($$b) - $start);
-			$self->{parser}->{heredocs}->{$$tag[0]} = {
-				content => substr($body, 0, length($body) - length($&)),
-				start_line => $self->{lineno},
-		        };
-			$self->{lineno} += () = $body =~ /\n/sg;
-			next;
-		}
-		push(@{$self->{parser}->{problems}}, ['HEREDOC', $tag]);
-		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
-		my $body = substr($$b, $start, pos($$b) - $start);
-		$self->{lineno} += () = $body =~ /\n/sg;
-		last;
-	}
-}
-
-sub scan_token {
-	my $self = shift @_;
-	my $b = $self->{buff};
-	my $token = '';
-	my ($start, $startln);
-RESTART:
-	$startln = $self->{lineno};
-	$$b =~ /\G[ \t]+/gc; # skip whitespace (but not newline)
-	$start = pos($$b) || 0;
-	$self->{lineno}++, return ["\n", $start, pos($$b), $startln, $startln] if $$b =~ /\G#[^\n]*(?:\n|\z)/gc; # comment
-	while (1) {
-		# slurp up non-special characters
-		$token .= $1 if $$b =~ /\G([^\\;&|<>(){}'"\$\s]+)/gc;
-		# handle special characters
-		last unless $$b =~ /\G(.)/sgc;
-		my $c = $1;
-		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
-		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
-		$token .= $self->scan_sqstring(), next if $c eq "'";
-		$token .= $self->scan_dqstring(), next if $c eq '"';
-		$token .= $c . $self->scan_dollar(), next if $c eq '$';
-		$self->{lineno}++, $self->swallow_heredocs(), $token = $c, last if $c eq "\n";
-		$token = $self->scan_op($c), last if $c =~ /^[;&|<>]$/;
-		$token = $c, last if $c =~ /^[(){}]$/;
-		if ($c eq '\\') {
-			$token .= '\\', last unless $$b =~ /\G(.)/sgc;
-			$c = $1;
-			$self->{lineno}++, next if $c eq "\n" && length($token); # line splice
-			$self->{lineno}++, goto RESTART if $c eq "\n"; # line splice
-			$token .= '\\' . $c;
-			next;
-		}
-		die("internal error scanning character '$c'\n");
-	}
-	return length($token) ? [$token, $start, pos($$b), $startln, $self->{lineno}] : undef;
-}
-
-# ShellParser parses POSIX shell scripts (with minor extensions for Bash). It
-# is a recursive descent parser very roughly modeled after section 2.10 "Shell
-# Grammar" of POSIX chapter 2 "Shell Command Language".
-package ShellParser;
-
-sub new {
-	my ($class, $s) = @_;
-	my $self = bless {
-		buff => [],
-		stop => [],
-		output => [],
-		heredocs => {},
-		insubshell => 0,
-	} => $class;
-	$self->{lexer} = Lexer->new($self, $s);
-	return $self;
-}
-
-sub next_token {
-	my $self = shift @_;
-	return pop(@{$self->{buff}}) if @{$self->{buff}};
-	return $self->{lexer}->scan_token();
-}
-
-sub untoken {
-	my $self = shift @_;
-	push(@{$self->{buff}}, @_);
-}
-
-sub peek {
-	my $self = shift @_;
-	my $token = $self->next_token();
-	return undef unless defined($token);
-	$self->untoken($token);
-	return $token;
-}
-
-sub stop_at {
-	my ($self, $token) = @_;
-	return 1 unless defined($token);
-	my $stop = ${$self->{stop}}[-1] if @{$self->{stop}};
-	return defined($stop) && $token->[0] =~ $stop;
-}
-
-sub expect {
-	my ($self, $expect) = @_;
-	my $token = $self->next_token();
-	return $token if defined($token) && $token->[0] eq $expect;
-	push(@{$self->{output}}, "?!ERR?! expected '$expect' but found '" . (defined($token) ? $token->[0] : "<end-of-input>") . "'\n");
-	$self->untoken($token) if defined($token);
-	return ();
-}
-
-sub optional_newlines {
-	my $self = shift @_;
-	my @tokens;
-	while (my $token = $self->peek()) {
-		last unless $token->[0] eq "\n";
-		push(@tokens, $self->next_token());
-	}
-	return @tokens;
-}
-
-sub parse_group {
-	my $self = shift @_;
-	return ($self->parse(qr/^}$/),
-		$self->expect('}'));
-}
-
-sub parse_subshell {
-	my $self = shift @_;
-	$self->{insubshell}++;
-	my @tokens = ($self->parse(qr/^\)$/),
-		      $self->expect(')'));
-	$self->{insubshell}--;
-	return @tokens;
-}
-
-sub parse_case_pattern {
-	my $self = shift @_;
-	my @tokens;
-	while (defined(my $token = $self->next_token())) {
-		push(@tokens, $token);
-		last if $token->[0] eq ')';
-	}
-	return @tokens;
-}
-
-sub parse_case {
-	my $self = shift @_;
-	my @tokens;
-	push(@tokens,
-	     $self->next_token(), # subject
-	     $self->optional_newlines(),
-	     $self->expect('in'),
-	     $self->optional_newlines());
-	while (1) {
-		my $token = $self->peek();
-		last unless defined($token) && $token->[0] ne 'esac';
-		push(@tokens,
-		     $self->parse_case_pattern(),
-		     $self->optional_newlines(),
-		     $self->parse(qr/^(?:;;|esac)$/)); # item body
-		$token = $self->peek();
-		last unless defined($token) && $token->[0] ne 'esac';
-		push(@tokens,
-		     $self->expect(';;'),
-		     $self->optional_newlines());
-	}
-	push(@tokens, $self->expect('esac'));
-	return @tokens;
-}
-
-sub parse_for {
-	my $self = shift @_;
-	my @tokens;
-	push(@tokens,
-	     $self->next_token(), # variable
-	     $self->optional_newlines());
-	my $token = $self->peek();
-	if (defined($token) && $token->[0] eq 'in') {
-		push(@tokens,
-		     $self->expect('in'),
-		     $self->optional_newlines());
-	}
-	push(@tokens,
-	     $self->parse(qr/^do$/), # items
-	     $self->expect('do'),
-	     $self->optional_newlines(),
-	     $self->parse_loop_body(),
-	     $self->expect('done'));
-	return @tokens;
-}
-
-sub parse_if {
-	my $self = shift @_;
-	my @tokens;
-	while (1) {
-		push(@tokens,
-		     $self->parse(qr/^then$/), # if/elif condition
-		     $self->expect('then'),
-		     $self->optional_newlines(),
-		     $self->parse(qr/^(?:elif|else|fi)$/)); # if/elif body
-		my $token = $self->peek();
-		last unless defined($token) && $token->[0] eq 'elif';
-		push(@tokens, $self->expect('elif'));
-	}
-	my $token = $self->peek();
-	if (defined($token) && $token->[0] eq 'else') {
-		push(@tokens,
-		     $self->expect('else'),
-		     $self->optional_newlines(),
-		     $self->parse(qr/^fi$/)); # else body
-	}
-	push(@tokens, $self->expect('fi'));
-	return @tokens;
-}
-
-sub parse_loop_body {
-	my $self = shift @_;
-	return $self->parse(qr/^done$/);
-}
-
-sub parse_loop {
-	my $self = shift @_;
-	return ($self->parse(qr/^do$/), # condition
-		$self->expect('do'),
-		$self->optional_newlines(),
-		$self->parse_loop_body(),
-		$self->expect('done'));
-}
-
-sub parse_func {
-	my $self = shift @_;
-	return ($self->expect('('),
-		$self->expect(')'),
-		$self->optional_newlines(),
-		$self->parse_cmd()); # body
-}
-
-sub parse_bash_array_assignment {
-	my $self = shift @_;
-	my @tokens = $self->expect('(');
-	while (defined(my $token = $self->next_token())) {
-		push(@tokens, $token);
-		last if $token->[0] eq ')';
-	}
-	return @tokens;
-}
-
-my %compound = (
-	'{' => \&parse_group,
-	'(' => \&parse_subshell,
-	'case' => \&parse_case,
-	'for' => \&parse_for,
-	'if' => \&parse_if,
-	'until' => \&parse_loop,
-	'while' => \&parse_loop);
-
-sub parse_cmd {
-	my $self = shift @_;
-	my $cmd = $self->next_token();
-	return () unless defined($cmd);
-	return $cmd if $cmd->[0] eq "\n";
-
-	my $token;
-	my @tokens = $cmd;
-	if ($cmd->[0] eq '!') {
-		push(@tokens, $self->parse_cmd());
-		return @tokens;
-	} elsif (my $f = $compound{$cmd->[0]}) {
-		push(@tokens, $self->$f());
-	} elsif (defined($token = $self->peek()) && $token->[0] eq '(') {
-		if ($cmd->[0] !~ /\w=$/) {
-			push(@tokens, $self->parse_func());
-			return @tokens;
-		}
-		my @array = $self->parse_bash_array_assignment();
-		$tokens[-1]->[0] .= join(' ', map {$_->[0]} @array);
-		$tokens[-1]->[2] = $array[$#array][2] if @array;
-	}
-
-	while (defined(my $token = $self->next_token())) {
-		$self->untoken($token), last if $self->stop_at($token);
-		push(@tokens, $token);
-		last if $token->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
-	}
-	push(@tokens, $self->next_token()) if $tokens[-1]->[0] ne "\n" && defined($token = $self->peek()) && $token->[0] eq "\n";
-	return @tokens;
-}
-
-sub accumulate {
-	my ($self, $tokens, $cmd) = @_;
-	push(@$tokens, @$cmd);
-}
-
-sub parse {
-	my ($self, $stop) = @_;
-	push(@{$self->{stop}}, $stop);
-	goto DONE if $self->stop_at($self->peek());
-	my @tokens;
-	while (my @cmd = $self->parse_cmd()) {
-		$self->accumulate(\@tokens, \@cmd);
-		last if $self->stop_at($self->peek());
-	}
-DONE:
-	pop(@{$self->{stop}});
-	return @tokens;
-}
+use File::Basename;
+my $_lib = dirname($0) . "/lib-shell-parser.pl";
+$_lib = "./$_lib" unless $_lib =~ m{^/};
+do $_lib or die "failed to load $_lib: $@$!\n";
 
 # TestParser is a subclass of ShellParser which, beyond parsing shell script
 # code, is also imbued with semantic knowledge of test construction, and checks
@@ -484,7 +36,7 @@ DONE:
 # scripts in which the tests are defined.
 package TestParser;
 
-use base 'ShellParser';
+our @ISA = ('ShellParser');
 
 sub new {
 	my $class = shift @_;
@@ -578,14 +130,12 @@ DONE:
 	$self->SUPER::accumulate($tokens, $cmd);
 }
 
-# ScriptParser is a subclass of ShellParser which identifies individual test
-# definitions within test scripts, and passes each test body through TestParser
-# to identify possible problems. ShellParser detects test definitions not only
-# at the top-level of test scripts but also within compound commands such as
-# loops and function definitions.
-package ScriptParser;
+# ChainlintParser is a subclass of ScriptParser which checks each test
+# body for broken &&-chains via TestParser, then formats and collects
+# the results.
+package ChainlintParser;
 
-use base 'ShellParser';
+our @ISA = ('ScriptParser');
 
 sub new {
 	my $class = shift @_;
@@ -595,35 +145,6 @@ sub new {
 	return $self;
 }
 
-# extract the raw content of a token, which may be a single string or a
-# composition of multiple strings and non-string character runs; for instance,
-# `"test body"` unwraps to `test body`; `word"a b"42'c d'` to `worda b42c d`
-sub unwrap {
-	my $token = (@_ ? shift @_ : $_)->[0];
-	# simple case: 'sqstring' or "dqstring"
-	return $token if $token =~ s/^'([^']*)'$/$1/;
-	return $token if $token =~ s/^"([^"]*)"$/$1/;
-
-	# composite case
-	my ($s, $q, $escaped);
-	while (1) {
-		# slurp up non-special characters
-		$s .= $1 if $token =~ /\G([^\\'"]*)/gc;
-		# handle special characters
-		last unless $token =~ /\G(.)/sgc;
-		my $c = $1;
-		$q = undef, next if defined($q) && $c eq $q;
-		$q = $c, next if !defined($q) && $c =~ /^['"]$/;
-		if ($c eq '\\') {
-			last unless $token =~ /\G(.)/sgc;
-			$c = $1;
-			$s .= '\\' if $c eq "\n"; # preserve line splice
-		}
-		$s .= $c;
-	}
-	return $s
-}
-
 sub format_problem {
 	local $_ = shift;
 	/^AMP$/ && return "missing '&&'";
@@ -635,10 +156,10 @@ sub format_problem {
 
 sub check_test {
 	my $self = shift @_;
-	my $title = unwrap(shift @_);
+	my $title = ScriptParser::unwrap(shift @_);
 	my $body = shift @_;
 	my $lineno = $body->[3];
-	$body = unwrap($body);
+	$body = ScriptParser::unwrap($body);
 	if ($body eq '-') {
 		my $herebody = shift @_;
 		$body = $herebody->{content};
@@ -673,22 +194,6 @@ sub check_test {
 	push(@{$self->{output}}, "$c->{blue}# chainlint: $title$c->{reset}\n$checked");
 }
 
-sub parse_cmd {
-	my $self = shift @_;
-	my @tokens = $self->SUPER::parse_cmd();
-	return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/;
-	my $n = $#tokens;
-	$n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
-	my $herebody;
-	if ($n >= 2 && $tokens[$n-1]->[0] eq '-' && $tokens[$n]->[0] =~ /^<<-?(.+)$/) {
-		$herebody = $self->{heredocs}->{$1};
-		$n--;
-	}
-	$self->check_test($tokens[1], $tokens[2], $herebody) if $n == 2; # title body
-	$self->check_test($tokens[2], $tokens[3], $herebody) if $n > 2;  # prereq title body
-	return @tokens;
-}
-
 # main contains high-level functionality for processing command-line switches,
 # feeding input test scripts to ScriptParser, and reporting results.
 package main;
@@ -803,7 +308,7 @@ sub check_script {
 		}
 		my $s = do { local $/; <$fh> };
 		close($fh);
-		my $parser = ScriptParser->new(\$s);
+		my $parser = ChainlintParser->new(\$s);
 		1 while $parser->parse_cmd();
 		if (@{$parser->{output}}) {
 			my $c = fd_colors(1);
diff --git a/t/lib-shell-parser.pl b/t/lib-shell-parser.pl
new file mode 100644
index 0000000000..1e521a94f8
--- /dev/null
+++ b/t/lib-shell-parser.pl
@@ -0,0 +1,517 @@
+use strict;
+use warnings;
+
+# Copyright (c) 2021-2022 Eric Sunshine <sunshine@sunshineco.com>
+#
+# Lexer tokenizes POSIX shell scripts. It is roughly modeled after section 2.3
+# "Token Recognition" of POSIX chapter 2 "Shell Command Language". Although
+# similar to lexical analyzers for other languages, this one differs in a few
+# substantial ways due to quirks of the shell command language.
+#
+# For instance, in many languages, newline is just whitespace like space or
+# TAB, but in shell a newline is a command separator, thus a distinct lexical
+# token. A newline is significant and returned as a distinct token even at the
+# end of a shell comment.
+#
+# In other languages, `1+2` would typically be scanned as three tokens
+# (`1`, `+`, and `2`), but in shell it is a single token. However, the similar
+# `1 + 2`, which embeds whitepace, is scanned as three token in shell, as well.
+# In shell, several characters with special meaning lose that meaning when not
+# surrounded by whitespace. For instance, the negation operator `!` is special
+# when standing alone surrounded by whitespace; whereas in `foo!uucp` it is
+# just a plain character in the longer token "foo!uucp". In many other
+# languages, `"string"/foo:'string'` might be scanned as five tokens ("string",
+# `/`, `foo`, `:`, and 'string'), but in shell, it is just a single token.
+#
+# The lexical analyzer for the shell command language is also somewhat unusual
+# in that it recursively invokes the parser to handle the body of `$(...)`
+# expressions which can contain arbitrary shell code. Such expressions may be
+# encountered both inside and outside of double-quoted strings.
+#
+# The lexical analyzer is responsible for consuming shell here-doc bodies which
+# extend from the line following a `<<TAG` operator until a line consisting
+# solely of `TAG`. Here-doc consumption begins when a newline is encountered.
+# It is legal for multiple here-doc `<<TAG` operators to be present on a single
+# line, in which case their bodies must be present one following the next, and
+# are consumed in the (left-to-right) order the `<<TAG` operators appear on the
+# line. A special complication is that the bodies of all here-docs must be
+# consumed when the newline is encountered even if the parse context depth has
+# changed. For instance, in `cat <<A && x=$(cat <<B &&\n`, bodies of here-docs
+# "A" and "B" must be consumed even though "A" was introduced outside the
+# recursive parse context in which "B" was introduced and in which the newline
+# is encountered.
+package Lexer;
+
+sub new {
+	my ($class, $parser, $s) = @_;
+	bless {
+		parser => $parser,
+		buff => $s,
+		lineno => 1,
+		heretags => []
+	} => $class;
+}
+
+sub scan_heredoc_tag {
+	my $self = shift @_;
+	${$self->{buff}} =~ /\G(-?)/gc;
+	my $indented = $1;
+	my $token = $self->scan_token();
+	return "<<$indented" unless $token;
+	my $tag = $token->[0];
+	$tag =~ s/['"\\]//g;
+	$$token[0] = $indented ? "\t$tag" : "$tag";
+	push(@{$self->{heretags}}, $token);
+	return "<<$indented$tag";
+}
+
+sub scan_op {
+	my ($self, $c) = @_;
+	my $b = $self->{buff};
+	return $c unless $$b =~ /\G(.)/sgc;
+	my $cc = $c . $1;
+	return scan_heredoc_tag($self) if $cc eq '<<';
+	return $cc if $cc =~ /^(?:&&|\|\||>>|;;|<&|>&|<>|>\|)$/;
+	pos($$b)--;
+	return $c;
+}
+
+sub scan_sqstring {
+	my $self = shift @_;
+	${$self->{buff}} =~ /\G([^']*'|.*\z)/sgc;
+	my $s = $1;
+	$self->{lineno} += () = $s =~ /\n/sg;
+	return "'" . $s;
+}
+
+sub scan_dqstring {
+	my $self = shift @_;
+	my $b = $self->{buff};
+	my $s = '"';
+	while (1) {
+		# slurp up non-special characters
+		$s .= $1 if $$b =~ /\G([^"\$\\]+)/gc;
+		# handle special characters
+		last unless $$b =~ /\G(.)/sgc;
+		my $c = $1;
+		$s .= '"', last if $c eq '"';
+		$s .= '$' . $self->scan_dollar(), next if $c eq '$';
+		if ($c eq '\\') {
+			$s .= '\\', last unless $$b =~ /\G(.)/sgc;
+			$c = $1;
+			$self->{lineno}++, next if $c eq "\n"; # line splice
+			# backslash escapes only $, `, ", \ in dq-string
+			$s .= '\\' unless $c =~ /^[\$`"\\]$/;
+			$s .= $c;
+			next;
+		}
+		die("internal error scanning dq-string '$c'\n");
+	}
+	$self->{lineno} += () = $s =~ /\n/sg;
+	return $s;
+}
+
+sub scan_balanced {
+	my ($self, $c1, $c2) = @_;
+	my $b = $self->{buff};
+	my $depth = 1;
+	my $s = $c1;
+	while ($$b =~ /\G([^\Q$c1$c2\E]*(?:[\Q$c1$c2\E]|\z))/gc) {
+		$s .= $1;
+		$depth++, next if $s =~ /\Q$c1\E$/;
+		$depth--;
+		last if $depth == 0;
+	}
+	$self->{lineno} += () = $s =~ /\n/sg;
+	return $s;
+}
+
+sub scan_subst {
+	my $self = shift @_;
+	my @tokens = $self->{parser}->parse(qr/^\)$/);
+	$self->{parser}->next_token(); # closing ")"
+	return @tokens;
+}
+
+sub scan_dollar {
+	my $self = shift @_;
+	my $b = $self->{buff};
+	return $self->scan_balanced('(', ')') if $$b =~ /\G\((?=\()/gc; # $((...))
+	return '(' . join(' ', map {$_->[0]} $self->scan_subst()) . ')' if $$b =~ /\G\(/gc; # $(...)
+	return $self->scan_balanced('{', '}') if $$b =~ /\G\{/gc; # ${...}
+	return $1 if $$b =~ /\G(\w+)/gc; # $var
+	return $1 if $$b =~ /\G([@*#?$!0-9-])/gc; # $*, $1, $$, etc.
+	return '';
+}
+
+sub swallow_heredocs {
+	my $self = shift @_;
+	my $b = $self->{buff};
+	my $tags = $self->{heretags};
+	while (my $tag = shift @$tags) {
+		my $start = pos($$b);
+		my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
+		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
+		if (pos($$b) > $start) {
+			my $body = substr($$b, $start, pos($$b) - $start);
+			$self->{parser}->{heredocs}->{$$tag[0]} = {
+				content => substr($body, 0, length($body) - length($&)),
+				start_line => $self->{lineno},
+		        };
+			$self->{lineno} += () = $body =~ /\n/sg;
+			next;
+		}
+		push(@{$self->{parser}->{problems}}, ['HEREDOC', $tag]);
+		$$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
+		my $body = substr($$b, $start, pos($$b) - $start);
+		$self->{lineno} += () = $body =~ /\n/sg;
+		last;
+	}
+}
+
+sub scan_token {
+	my $self = shift @_;
+	my $b = $self->{buff};
+	my $token = '';
+	my ($start, $startln);
+RESTART:
+	$startln = $self->{lineno};
+	$$b =~ /\G[ \t]+/gc; # skip whitespace (but not newline)
+	$start = pos($$b) || 0;
+	$self->{lineno}++, return ["\n", $start, pos($$b), $startln, $startln] if $$b =~ /\G#[^\n]*(?:\n|\z)/gc; # comment
+	while (1) {
+		# slurp up non-special characters
+		$token .= $1 if $$b =~ /\G([^\\;&|<>(){}'"\$\s]+)/gc;
+		# handle special characters
+		last unless $$b =~ /\G(.)/sgc;
+		my $c = $1;
+		pos($$b)--, last if $c =~ /^[ \t]$/; # whitespace ends token
+		pos($$b)--, last if length($token) && $c =~ /^[;&|<>(){}\n]$/;
+		$token .= $self->scan_sqstring(), next if $c eq "'";
+		$token .= $self->scan_dqstring(), next if $c eq '"';
+		$token .= $c . $self->scan_dollar(), next if $c eq '$';
+		$self->{lineno}++, $self->swallow_heredocs(), $token = $c, last if $c eq "\n";
+		$token = $self->scan_op($c), last if $c =~ /^[;&|<>]$/;
+		$token = $c, last if $c =~ /^[(){}]$/;
+		if ($c eq '\\') {
+			$token .= '\\', last unless $$b =~ /\G(.)/sgc;
+			$c = $1;
+			$self->{lineno}++, next if $c eq "\n" && length($token); # line splice
+			$self->{lineno}++, goto RESTART if $c eq "\n"; # line splice
+			$token .= '\\' . $c;
+			next;
+		}
+		die("internal error scanning character '$c'\n");
+	}
+	return length($token) ? [$token, $start, pos($$b), $startln, $self->{lineno}] : undef;
+}
+
+# ShellParser parses POSIX shell scripts (with minor extensions for Bash). It
+# is a recursive descent parser very roughly modeled after section 2.10 "Shell
+# Grammar" of POSIX chapter 2 "Shell Command Language".
+package ShellParser;
+
+sub new {
+	my ($class, $s) = @_;
+	my $self = bless {
+		buff => [],
+		stop => [],
+		output => [],
+		heredocs => {},
+		insubshell => 0,
+	} => $class;
+	$self->{lexer} = Lexer->new($self, $s);
+	return $self;
+}
+
+sub next_token {
+	my $self = shift @_;
+	return pop(@{$self->{buff}}) if @{$self->{buff}};
+	return $self->{lexer}->scan_token();
+}
+
+sub untoken {
+	my $self = shift @_;
+	push(@{$self->{buff}}, @_);
+}
+
+sub peek {
+	my $self = shift @_;
+	my $token = $self->next_token();
+	return undef unless defined($token);
+	$self->untoken($token);
+	return $token;
+}
+
+sub stop_at {
+	my ($self, $token) = @_;
+	return 1 unless defined($token);
+	my $stop = ${$self->{stop}}[-1] if @{$self->{stop}};
+	return defined($stop) && $token->[0] =~ $stop;
+}
+
+sub expect {
+	my ($self, $expect) = @_;
+	my $token = $self->next_token();
+	return $token if defined($token) && $token->[0] eq $expect;
+	push(@{$self->{output}}, "?!ERR?! expected '$expect' but found '" . (defined($token) ? $token->[0] : "<end-of-input>") . "'\n");
+	$self->untoken($token) if defined($token);
+	return ();
+}
+
+sub optional_newlines {
+	my $self = shift @_;
+	my @tokens;
+	while (my $token = $self->peek()) {
+		last unless $token->[0] eq "\n";
+		push(@tokens, $self->next_token());
+	}
+	return @tokens;
+}
+
+sub parse_group {
+	my $self = shift @_;
+	return ($self->parse(qr/^}$/),
+		$self->expect('}'));
+}
+
+sub parse_subshell {
+	my $self = shift @_;
+	$self->{insubshell}++;
+	my @tokens = ($self->parse(qr/^\)$/),
+		      $self->expect(')'));
+	$self->{insubshell}--;
+	return @tokens;
+}
+
+sub parse_case_pattern {
+	my $self = shift @_;
+	my @tokens;
+	while (defined(my $token = $self->next_token())) {
+		push(@tokens, $token);
+		last if $token->[0] eq ')';
+	}
+	return @tokens;
+}
+
+sub parse_case {
+	my $self = shift @_;
+	my @tokens;
+	push(@tokens,
+	     $self->next_token(), # subject
+	     $self->optional_newlines(),
+	     $self->expect('in'),
+	     $self->optional_newlines());
+	while (1) {
+		my $token = $self->peek();
+		last unless defined($token) && $token->[0] ne 'esac';
+		push(@tokens,
+		     $self->parse_case_pattern(),
+		     $self->optional_newlines(),
+		     $self->parse(qr/^(?:;;|esac)$/)); # item body
+		$token = $self->peek();
+		last unless defined($token) && $token->[0] ne 'esac';
+		push(@tokens,
+		     $self->expect(';;'),
+		     $self->optional_newlines());
+	}
+	push(@tokens, $self->expect('esac'));
+	return @tokens;
+}
+
+sub parse_for {
+	my $self = shift @_;
+	my @tokens;
+	push(@tokens,
+	     $self->next_token(), # variable
+	     $self->optional_newlines());
+	my $token = $self->peek();
+	if (defined($token) && $token->[0] eq 'in') {
+		push(@tokens,
+		     $self->expect('in'),
+		     $self->optional_newlines());
+	}
+	push(@tokens,
+	     $self->parse(qr/^do$/), # items
+	     $self->expect('do'),
+	     $self->optional_newlines(),
+	     $self->parse_loop_body(),
+	     $self->expect('done'));
+	return @tokens;
+}
+
+sub parse_if {
+	my $self = shift @_;
+	my @tokens;
+	while (1) {
+		push(@tokens,
+		     $self->parse(qr/^then$/), # if/elif condition
+		     $self->expect('then'),
+		     $self->optional_newlines(),
+		     $self->parse(qr/^(?:elif|else|fi)$/)); # if/elif body
+		my $token = $self->peek();
+		last unless defined($token) && $token->[0] eq 'elif';
+		push(@tokens, $self->expect('elif'));
+	}
+	my $token = $self->peek();
+	if (defined($token) && $token->[0] eq 'else') {
+		push(@tokens,
+		     $self->expect('else'),
+		     $self->optional_newlines(),
+		     $self->parse(qr/^fi$/)); # else body
+	}
+	push(@tokens, $self->expect('fi'));
+	return @tokens;
+}
+
+sub parse_loop_body {
+	my $self = shift @_;
+	return $self->parse(qr/^done$/);
+}
+
+sub parse_loop {
+	my $self = shift @_;
+	return ($self->parse(qr/^do$/), # condition
+		$self->expect('do'),
+		$self->optional_newlines(),
+		$self->parse_loop_body(),
+		$self->expect('done'));
+}
+
+sub parse_func {
+	my $self = shift @_;
+	return ($self->expect('('),
+		$self->expect(')'),
+		$self->optional_newlines(),
+		$self->parse_cmd()); # body
+}
+
+sub parse_bash_array_assignment {
+	my $self = shift @_;
+	my @tokens = $self->expect('(');
+	while (defined(my $token = $self->next_token())) {
+		push(@tokens, $token);
+		last if $token->[0] eq ')';
+	}
+	return @tokens;
+}
+
+my %compound = (
+	'{' => \&parse_group,
+	'(' => \&parse_subshell,
+	'case' => \&parse_case,
+	'for' => \&parse_for,
+	'if' => \&parse_if,
+	'until' => \&parse_loop,
+	'while' => \&parse_loop);
+
+sub parse_cmd {
+	my $self = shift @_;
+	my $cmd = $self->next_token();
+	return () unless defined($cmd);
+	return $cmd if $cmd->[0] eq "\n";
+
+	my $token;
+	my @tokens = $cmd;
+	if ($cmd->[0] eq '!') {
+		push(@tokens, $self->parse_cmd());
+		return @tokens;
+	} elsif (my $f = $compound{$cmd->[0]}) {
+		push(@tokens, $self->$f());
+	} elsif (defined($token = $self->peek()) && $token->[0] eq '(') {
+		if ($cmd->[0] !~ /\w=$/) {
+			push(@tokens, $self->parse_func());
+			return @tokens;
+		}
+		my @array = $self->parse_bash_array_assignment();
+		$tokens[-1]->[0] .= join(' ', map {$_->[0]} @array);
+		$tokens[-1]->[2] = $array[$#array][2] if @array;
+	}
+
+	while (defined(my $token = $self->next_token())) {
+		$self->untoken($token), last if $self->stop_at($token);
+		push(@tokens, $token);
+		last if $token->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
+	}
+	push(@tokens, $self->next_token()) if $tokens[-1]->[0] ne "\n" && defined($token = $self->peek()) && $token->[0] eq "\n";
+	return @tokens;
+}
+
+sub accumulate {
+	my ($self, $tokens, $cmd) = @_;
+	push(@$tokens, @$cmd);
+}
+
+sub parse {
+	my ($self, $stop) = @_;
+	push(@{$self->{stop}}, $stop);
+	goto DONE if $self->stop_at($self->peek());
+	my @tokens;
+	while (my @cmd = $self->parse_cmd()) {
+		$self->accumulate(\@tokens, \@cmd);
+		last if $self->stop_at($self->peek());
+	}
+DONE:
+	pop(@{$self->{stop}});
+	return @tokens;
+}
+
+# ScriptParser is a subclass of ShellParser which identifies individual test
+# definitions within test scripts and calls check_test() for each test body
+# found.  Callers subclass ScriptParser and override check_test() to
+# implement specific checks (e.g. chainlint checks &&-chains, lint-style
+# checks grep usage).
+package ScriptParser;
+
+our @ISA = ('ShellParser');
+
+# extract the raw content of a token, which may be a single string or a
+# composition of multiple strings and non-string character runs; for instance,
+# `"test body"` unwraps to `test body`; `word"a b"42'c d'` to `worda b42c d`
+sub unwrap {
+	my $token = (@_ ? shift @_ : $_)->[0];
+	# simple case: 'sqstring' or "dqstring"
+	return $token if $token =~ s/^'([^']*)'$/$1/;
+	return $token if $token =~ s/^"([^"]*)"$/$1/;
+
+	# composite case
+	my ($s, $q, $escaped);
+	while (1) {
+		# slurp up non-special characters
+		$s .= $1 if $token =~ /\G([^\\'"]*)/gc;
+		# handle special characters
+		last unless $token =~ /\G(.)/sgc;
+		my $c = $1;
+		$q = undef, next if defined($q) && $c eq $q;
+		$q = $c, next if !defined($q) && $c =~ /^['"]$/;
+		if ($c eq '\\') {
+			last unless $token =~ /\G(.)/sgc;
+			$c = $1;
+			$s .= '\\' if $c eq "\n"; # preserve line splice
+		}
+		$s .= $c;
+	}
+	return $s
+}
+
+sub check_test {
+	# no-op; subclasses override to implement specific checks
+}
+
+sub parse_cmd {
+	my $self = shift @_;
+	my @tokens = $self->SUPER::parse_cmd();
+	return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/;
+	my $n = $#tokens;
+	$n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
+	my $herebody;
+	if ($n >= 2 && $tokens[$n-1]->[0] eq '-' && $tokens[$n]->[0] =~ /^<<-?(.+)$/) {
+		$herebody = $self->{heredocs}->{$1};
+		$n--;
+	}
+	$self->check_test($tokens[1], $tokens[2], $herebody) if $n == 2; # title body
+	$self->check_test($tokens[2], $tokens[3], $herebody) if $n > 2;  # prereq title body
+	return @tokens;
+}
+
+1;
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 3/6] t: fix Lexer line count for $() inside double-quoted strings
From: Michael Montalbo via GitGitGadget @ 2026-06-04  7:45 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Eric Sunshine, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

scan_dqstring's post-loop newline counter re-counts newlines that
were already counted during recursive parsing of $() bodies.  This
happens because scan_dollar's returned text can contain newlines
(from token text of multi-line strings and from \n command separator
tokens), and the catch-all counter at the end of scan_dqstring
counts all of them again.

Fix this by counting newlines inline as non-special characters are
consumed, and removing the post-loop catch-all.  Each newline is
now counted exactly once: literal newlines at the inline match,
line splices at the \<newline> handler, and $() newlines by
scan_token during the recursive parse.

This does not affect chainlint's output because chainlint annotates
the original body text using byte offsets, not token line numbers.
It does matter for tools like lint-style.pl (introduced in a
subsequent commit) that use token line numbers to locate and fix
specific lines in the original file.

Add check-shell-parser.pl to verify that the Lexer reports correct
line numbers after multi-line $() in double-quoted strings.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 t/Makefile              |  7 +++--
 t/check-shell-parser.pl | 58 +++++++++++++++++++++++++++++++++++++++++
 t/lib-shell-parser.pl   | 11 +++++---
 3 files changed, 71 insertions(+), 5 deletions(-)
 create mode 100644 t/check-shell-parser.pl

diff --git a/t/Makefile b/t/Makefile
index ab8a5b54aa..25f923fed9 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -139,7 +139,7 @@ check-meson:
 test-lint: test-lint-duplicates test-lint-executable \
 	test-lint-filenames
 ifneq ($(PERL_PATH),)
-test-lint: test-lint-shell-syntax
+test-lint: test-lint-shell-syntax check-shell-parser
 else
 GIT_TEST_CHAIN_LINT = 0
 endif
@@ -160,6 +160,8 @@ test-lint-executable:
 test-lint-shell-syntax:
 	@'$(PERL_PATH_SQ)' check-non-portable-shell.pl $(T) $(THELPERS) $(TPERF)
 
+check-shell-parser:
+	@'$(PERL_PATH_SQ)' check-shell-parser.pl
 test-lint-filenames:
 	@# We do *not* pass a glob to ls-files but use grep instead, to catch
 	@# non-ASCII characters (which are quoted within double-quotes)
@@ -185,7 +187,8 @@ perf:
 	$(MAKE) -C perf/ all
 
 .PHONY: pre-clean $(T) aggregate-results clean valgrind perf \
-	check-chainlint clean-chainlint test-chainlint $(UNIT_TESTS)
+	check-chainlint clean-chainlint test-chainlint \
+	check-shell-parser $(UNIT_TESTS)
 
 .PHONY: libgit-sys-test libgit-rs-test
 libgit-sys-test:
diff --git a/t/check-shell-parser.pl b/t/check-shell-parser.pl
new file mode 100644
index 0000000000..7d4ba6da7f
--- /dev/null
+++ b/t/check-shell-parser.pl
@@ -0,0 +1,58 @@
+#!/usr/bin/perl
+
+# Tests for the shared shell parser (lib-shell-parser.pl).
+
+use strict;
+use warnings;
+use File::Basename;
+
+my $_lib = dirname($0) . "/lib-shell-parser.pl";
+$_lib = "./$_lib" unless $_lib =~ m{^/};
+do $_lib or die "$0: failed to load $_lib: $@$!\n";
+
+my $rc = 0;
+
+sub check {
+	my ($desc, $body, $want_token, $want_line) = @_;
+	my $parser = ShellParser->new(\$body);
+	my @tokens = $parser->parse();
+	for my $t (reverse @tokens) {
+		next unless $t->[0] eq $want_token && defined $t->[3];
+		if ($t->[3] != $want_line) {
+			print STDERR "FAIL: $desc: " .
+				"'$want_token' at line $t->[3], " .
+				"expected line $want_line\n";
+			$rc = 1;
+		}
+		return;
+	}
+	print STDERR "FAIL: $desc: token '$want_token' not found\n";
+	$rc = 1;
+}
+
+# Multi-line $() inside a dq-string: MARKER should be at line 3.
+check('dq-string with multi-line $()', <<'BODY', 'MARKER', 3);
+	x="$(echo one
+	echo two)" &&
+	MARKER here
+BODY
+
+# Two multi-line $() substitutions: verifies drift does not accumulate.
+# MARKER should be at line 5.
+check('two dq-string $()', <<'BODY', 'MARKER', 5);
+	x="$(echo a
+	b)" &&
+	y="$(echo c
+	d)" &&
+	MARKER here
+BODY
+
+# $() outside a dq-string: no double-counting either way.
+# MARKER should be at line 3.
+check('bare $() spanning lines', <<'BODY', 'MARKER', 3);
+	x=$(echo one
+	echo two) &&
+	MARKER here
+BODY
+
+exit $rc;
diff --git a/t/lib-shell-parser.pl b/t/lib-shell-parser.pl
index 1e521a94f8..fa9b44d6ec 100644
--- a/t/lib-shell-parser.pl
+++ b/t/lib-shell-parser.pl
@@ -89,8 +89,14 @@ sub scan_dqstring {
 	my $b = $self->{buff};
 	my $s = '"';
 	while (1) {
-		# slurp up non-special characters
-		$s .= $1 if $$b =~ /\G([^"\$\\]+)/gc;
+		# slurp up non-special characters; count newlines
+		# inline so we don't need a catch-all counter that
+		# would miscount newlines from recursive $() parsing
+		if ($$b =~ /\G([^"\$\\]+)/gc) {
+			my $chunk = $1;
+			$self->{lineno} += () = $chunk =~ /\n/sg;
+			$s .= $chunk;
+		}
 		# handle special characters
 		last unless $$b =~ /\G(.)/sgc;
 		my $c = $1;
@@ -107,7 +113,6 @@ sub scan_dqstring {
 		}
 		die("internal error scanning dq-string '$c'\n");
 	}
-	$self->{lineno} += () = $s =~ /\n/sg;
 	return $s;
 }
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 1/6] t/README: document test_grep helper
From: Michael Montalbo via GitGitGadget @ 2026-06-04  7:45 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Eric Sunshine, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

test_grep is a wrapper around grep for test assertions that prints
the file contents on failure for easier debugging.  It also accepts
'!' as its first argument for negation, which preserves the
diagnostic output that '! test_grep' would suppress.

Despite being widely used (and the preferred replacement for bare
grep in assertions), test_grep has no entry in t/README alongside
the other documented helpers like test_cmp and test_line_count.
Add one.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 t/README | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/README b/t/README
index adbbd9acf4..c12a1c317a 100644
--- a/t/README
+++ b/t/README
@@ -1039,6 +1039,27 @@ see test-lib-functions.sh for the full list and their options.

    Check whether a file has the length it is expected to.

+ - test_grep [!] [<grep-options>] <pattern> <file>
+
+   Check whether <file> contains a line matching <pattern>, or
+   with '!' that no line matches.  Use this instead of bare
+   'grep <pattern> <file>' in test assertions.  On failure,
+   test_grep prints the contents of <file> for easier debugging,
+   whereas a bare 'grep' would fail silently.
+
+   For negation, pass '!' as the first argument:
+
+	test_grep ! "^diff --git" actual
+
+   Do not negate by writing '! test_grep', as that suppresses the
+   diagnostic output.
+
+   test_grep should only be used as a test assertion.  When grep
+   is used as a data filter (e.g. 'grep -v "^index" actual >filtered')
+   or inside a command substitution (e.g. '$(grep -c ...)'), plain
+   'grep' is the right choice because the exit code is not the
+   assertion itself.
+
  - test_path_is_file <path>
    test_path_is_dir <path>
    test_path_is_missing <path>
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/6] t: add lint-style.pl and convert grep to test_grep
From: Michael Montalbo via GitGitGadget @ 2026-06-04  7:45 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Eric Sunshine, Michael Montalbo

The test suite has a test_grep wrapper that prints file contents on
assertion failure, making debugging easier. Many tests still use bare 'grep'
for assertions, which silently swallows context on failure.

This series adds a lint tool (lint-style.pl) to mechanically detect and
convert these, then applies it across the test suite.

The tool reuses the shared shell parser (t/lib-shell-parser.pl) rather than
reimplementing shell parsing. This gives us proper handling of heredocs,
$(...), pipes, quoting, case patterns, and control flow structures for free.
The only new parsing logic is the grep assertion classifier, which works on
the already-parsed token stream.

The classifier distinguishes grep used as an assertion (checking exit status
against PATTERN + FILE) from grep used as a filter (pipes, redirects, $(),
control flow conditions). Only assertions are converted. Greps at assertion
level with no file argument are flagged as likely bugs; three pre-existing
instances were found and fixed in patch 5.

The --fix mode converts mechanically: 'grep' becomes 'test_grep', '! grep'
becomes 'test_grep !'.

Structure:

1/6 t/README: document test_grep helper 2/6 t: extract chainlint's parser
into shared module 3/6 t: fix Lexer line count for $() inside double-quoted
strings 4/6 t: add lint-style.pl with test_grep negation rule 5/6 t: fix
grep assertions missing file arguments 6/6 t: lint and convert grep
assertions to test_grep

Patch 2 extracts chainlint.pl's Lexer, ShellParser, and ScriptParser into
t/lib-shell-parser.pl. ScriptParser's check_test() becomes a no-op in the
module; chainlint.pl defines ChainlintParser (a ScriptParser subclass that
runs TestParser for &&-chain detection), and lint-style.pl defines
LintParser (a ScriptParser subclass that runs grep lint rules).

Patch 3 fixes a pre-existing bug in scan_dqstring where the post-loop
newline counter re-counted newlines that were already counted during
recursive $() parsing. The fix counts newlines inline as non-special
characters are consumed, removing the catch-all counter entirely. chainlint
is unaffected (it uses byte offsets), but lint-style.pl needs accurate token
line numbers to locate and fix specific lines in the original file.

Patch 4 introduces the lint-style.pl framework (LintParser subclass,
Makefile targets, fixture infrastructure) with a small, complete rule (!
test_grep -> test_grep !) so reviewers can see the machinery in action
before the bulk conversion.

Patch 5 fixes three test bugs where grep assertions were missing their file
arguments, causing them to pass vacuously (all three pass with the corrected
arguments). These were independently discovered by the missing-file
detection rule introduced in patch 6.

Patch 6 adds the main rule, its fixtures, and the mechanical conversion of
~2800 assertions across ~340 files, including sourced test fragments in
t/t5411/ and heredoc test bodies in t/t5564/.

To verify the conversion (patch 6 adds both the rule and the mechanical
conversion in the same commit, so apply the full series and re-run --fix to
confirm it produces no further changes):

git checkout && perl t/lint-style.pl --fix t/t*.sh t/test-lib*.sh t/lib-.sh
t/-tests.sh t/perf/.sh t/t5411/.sh

As an independent completeness check:

grep -rn '^\s*!\sgrep\s' t/t*.sh t/lib-.sh t/-tests.sh
t/test-lib*.sh t/perf/*.sh | grep -v test_grep | grep -v lint-ok | grep -v
'#.*grep'

Future rules:

The framework is designed to make adding new rules cheap. Each rule is a
function that receives parsed commands and the token stream. The harness
handles tokenization, line mapping, --fix, and fixture testing. Three
natural follow-ups:

 * 'test_must_fail grep': test_must_fail distinguishes expected failures
   from crashes (signals), which only makes sense for git commands. Using it
   with grep or test_grep should be '! grep' or 'test_grep !' instead.

 * '! git cmd': should use test_must_fail, which distinguishes controlled
   failures from crashes. The README explicitly documents this as a "don't".

 * 'test -f' / 'test -d' / 'test -e': should use test_path_is_file,
   test_path_is_dir, test_path_exists. The helpers print the actual
   directory listing on failure; bare 'test' just says "failed".

Known limitations:

 * One grep in t/t1400 asserts against .git/packed-refs which does not exist
   on the reftable backend; suppressed with '# lint-ok'.

 * One grep in t/t7450 checks a path inside a clone that may have failed
   (MINGW-only test); the file may not exist, so test_grep's existence check
   would trip; suppressed with '# lint-ok'.

 * Two greps in t/t3901 inside case branches that inherit piped stdin from
   two lines above are suppressed with '# lint-ok'.

 * One grep in t/t6437 uses glob expansion (grep -q content *) which breaks
   test_grep's file check; suppressed with '# lint-ok'.

 * One grep in t/t7527 captures $? for later use rather than asserting
   inline; suppressed with '# lint-ok'.

Michael Montalbo (6):
  t/README: document test_grep helper
  t: extract chainlint's parser into shared module
  t: fix Lexer line count for $() inside double-quoted strings
  t: add lint-style.pl with test_grep negation rule
  t: fix grep assertions missing file arguments
  t: lint and convert grep assertions to test_grep

 t/.gitattributes                             |   2 +
 t/Makefile                                   |  37 +-
 t/README                                     |  21 +
 t/chainlint.pl                               | 521 +-----------------
 t/check-shell-parser.pl                      |  58 +++
 t/for-each-ref-tests.sh                      |  12 +-
 t/lib-bitmap.sh                              |  12 +-
 t/lib-bundle-uri-protocol.sh                 |  26 +-
 t/lib-httpd.sh                               |   2 +-
 t/lib-shell-parser.pl                        | 522 +++++++++++++++++++
 t/lint-style.pl                              | 425 +++++++++++++++
 t/lint-style/grep-assert.expect              |  13 +
 t/lint-style/grep-assert.test                |  24 +
 t/lint-style/grep-fix.expect                 |  16 +
 t/lint-style/grep-fix.test                   |  16 +
 t/lint-style/grep-missing-file.expect        |   4 +
 t/lint-style/grep-missing-file.test          |   6 +
 t/lint-style/grep-negated.expect             |   5 +
 t/lint-style/grep-negated.test               |   9 +
 t/lint-style/grep-not-assert.expect          |   0
 t/lint-style/grep-not-assert.test            |  43 ++
 t/lint-style/heredoc.expect                  |   3 +
 t/lint-style/heredoc.test                    |  14 +
 t/lint-style/test-grep-negation-fix.expect   |   4 +
 t/lint-style/test-grep-negation-fix.test     |   4 +
 t/lint-style/test-grep-negation.expect       |   3 +
 t/lint-style/test-grep-negation.test         |   4 +
 t/pack-refs-tests.sh                         |   2 +-
 t/show-ref-exists-tests.sh                   |   2 +-
 t/t0000-basic.sh                             |  16 +-
 t/t0001-init.sh                              |  18 +-
 t/t0008-ignores.sh                           |   8 +-
 t/t0009-git-dir-validation.sh                |   6 +-
 t/t0012-help.sh                              |   4 +-
 t/t0013-sha1dc.sh                            |   2 +-
 t/t0017-env-helper.sh                        |   4 +-
 t/t0021-conversion.sh                        |  18 +-
 t/t0029-core-unsetenvvars.sh                 |   4 +-
 t/t0030-stripspace.sh                        |   4 +-
 t/t0031-lockfile-pid.sh                      |   2 +-
 t/t0040-parse-options.sh                     |  52 +-
 t/t0041-usage.sh                             |   2 +-
 t/t0052-simple-ipc.sh                        |  10 +-
 t/t0061-run-command.sh                       |   2 +-
 t/t0066-dir-iterator.sh                      |   2 +-
 t/t0068-for-each-repo.sh                     |  16 +-
 t/t0070-fundamental.sh                       |   6 +-
 t/t0081-find-pack.sh                         |  12 +-
 t/t0091-bugreport.sh                         |  18 +-
 t/t0092-diagnose.sh                          |  12 +-
 t/t0100-previous.sh                          |   2 +-
 t/t0200-gettext-basic.sh                     |  14 +-
 t/t0203-gettext-setlocale-sanity.sh          |   4 +-
 t/t0204-gettext-reencode-sanity.sh           |   8 +-
 t/t0210-trace2-normal.sh                     |   6 +-
 t/t0211-trace2-perf.sh                       |  80 +--
 t/t0212-trace2-event.sh                      |   8 +-
 t/t0300-credentials.sh                       |   4 +-
 t/t0410-partial-clone.sh                     |  82 +--
 t/t0450-txt-doc-vs-help.sh                   |   2 +-
 t/t0500-progress-display.sh                  |  18 +-
 t/t0610-reftable-basics.sh                   |   8 +-
 t/t1004-read-tree-m-u-wf.sh                  |   8 +-
 t/t1006-cat-file.sh                          |  18 +-
 t/t1007-hash-object.sh                       |   8 +-
 t/t1011-read-tree-sparse-checkout.sh         |  10 +-
 t/t1050-large.sh                             |   6 +-
 t/t1091-sparse-checkout-builtin.sh           |  24 +-
 t/t1092-sparse-checkout-compatibility.sh     |  44 +-
 t/t1300-config.sh                            |  16 +-
 t/t1305-config-include.sh                    |   2 +-
 t/t1308-config-set.sh                        |   6 +-
 t/t1400-update-ref.sh                        | 170 +++---
 t/t1403-show-ref.sh                          |  18 +-
 t/t1410-reflog.sh                            |   4 +-
 t/t1415-worktree-refs.sh                     |   4 +-
 t/t1430-bad-ref-name.sh                      |  56 +-
 t/t1450-fsck.sh                              |  12 +-
 t/t1451-fsck-buffer.sh                       |   6 +-
 t/t1460-refs-migrate.sh                      |   2 +-
 t/t1500-rev-parse.sh                         |   6 +-
 t/t1502-rev-parse-parseopt.sh                |   2 +-
 t/t1503-rev-parse-verify.sh                  |  10 +-
 t/t1510-repo-setup.sh                        |  10 +-
 t/t1512-rev-parse-disambiguation.sh          |   4 +-
 t/t1515-rev-parse-outside-repo.sh            |   2 +-
 t/t1800-hook.sh                              |  18 +-
 t/t2004-checkout-cache-temp.sh               |   4 +-
 t/t2019-checkout-ambiguous-ref.sh            |   4 +-
 t/t2024-checkout-dwim.sh                     |   8 +-
 t/t2030-unresolve-info.sh                    |   6 +-
 t/t2060-switch.sh                            |   6 +-
 t/t2070-restore.sh                           |   2 +-
 t/t2080-parallel-checkout-basics.sh          |  14 +-
 t/t2081-parallel-checkout-collisions.sh      |  24 +-
 t/t2082-parallel-checkout-attributes.sh      |  12 +-
 t/t2103-update-index-ignore-missing.sh       |   6 +-
 t/t2200-add-update.sh                        |   2 +-
 t/t2203-add-intent.sh                        |   6 +-
 t/t2400-worktree-add.sh                      |  24 +-
 t/t2402-worktree-list.sh                     |  16 +-
 t/t2403-worktree-move.sh                     |   6 +-
 t/t2405-worktree-submodule.sh                |   6 +-
 t/t2407-worktree-heads.sh                    |  26 +-
 t/t2500-untracked-overwriting.sh             |   8 +-
 t/t2501-cwd-empty.sh                         |   4 +-
 t/t3001-ls-files-others-exclude.sh           |   6 +-
 t/t3007-ls-files-recurse-submodules.sh       |   6 +-
 t/t3200-branch.sh                            |  12 +-
 t/t3202-show-branch.sh                       |  10 +-
 t/t3203-branch-output.sh                     |   4 +-
 t/t3206-range-diff.sh                        |  78 +--
 t/t3207-branch-submodule.sh                  |   4 +-
 t/t3301-notes.sh                             |  32 +-
 t/t3310-notes-merge-manual-resolve.sh        |  16 +-
 t/t3320-notes-merge-worktrees.sh             |   2 +-
 t/t3400-rebase.sh                            |  16 +-
 t/t3402-rebase-merge.sh                      |  16 +-
 t/t3404-rebase-interactive.sh                |  72 +--
 t/t3406-rebase-message.sh                    |   6 +-
 t/t3415-rebase-autosquash.sh                 |  10 +-
 t/t3416-rebase-onto-threedots.sh             |   4 +-
 t/t3418-rebase-continue.sh                   |  10 +-
 t/t3420-rebase-autostash.sh                  |  25 +-
 t/t3422-rebase-incompatible-options.sh       |   4 +-
 t/t3429-rebase-edit-todo.sh                  |   2 +-
 t/t3430-rebase-merges.sh                     |  32 +-
 t/t3500-cherry.sh                            |   4 +-
 t/t3501-revert-cherry-pick.sh                |   6 +-
 t/t3504-cherry-pick-rerere.sh                |   6 +-
 t/t3510-cherry-pick-sequence.sh              |  24 +-
 t/t3602-rm-sparse-checkout.sh                |   4 +-
 t/t3705-add-sparse-checkout.sh               |  10 +-
 t/t3800-mktag.sh                             |   4 +-
 t/t3901-i18n-patch.sh                        |  20 +-
 t/t3903-stash.sh                             |  28 +-
 t/t3904-stash-patch.sh                       |   4 +-
 t/t3908-stash-in-worktree.sh                 |   2 +-
 t/t4000-diff-format.sh                       |   2 +-
 t/t4001-diff-rename.sh                       |   4 +-
 t/t4011-diff-symlink.sh                      |   2 +-
 t/t4013-diff-various.sh                      |   2 +-
 t/t4014-format-patch.sh                      | 344 ++++++------
 t/t4015-diff-whitespace.sh                   |  16 +-
 t/t4017-diff-retval.sh                       |   2 +-
 t/t4018-diff-funcname.sh                     |   2 +-
 t/t4019-diff-wserror.sh                      |   8 +-
 t/t4020-diff-external.sh                     |  18 +-
 t/t4021-format-patch-numbered.sh             |   4 +-
 t/t4022-diff-rewrite.sh                      |  14 +-
 t/t4028-format-patch-mime-headers.sh         |   6 +-
 t/t4031-diff-rewrite-binary.sh               |  18 +-
 t/t4033-diff-patience.sh                     |   2 +-
 t/t4036-format-patch-signer-mime.sh          |   6 +-
 t/t4038-diff-combined.sh                     |   6 +-
 t/t4051-diff-function-context.sh             |  38 +-
 t/t4053-diff-no-index.sh                     |   4 +-
 t/t4063-diff-blobs.sh                        |   2 +-
 t/t4065-diff-anchored.sh                     |  26 +-
 t/t4067-diff-partial-clone.sh                |  12 +-
 t/t4073-diff-stat-name-width.sh              |  24 +-
 t/t4103-apply-binary.sh                      |   2 +-
 t/t4120-apply-popt.sh                        |   2 +-
 t/t4124-apply-ws-rule.sh                     |  10 +-
 t/t4128-apply-root.sh                        |   2 +-
 t/t4140-apply-ita.sh                         |   4 +-
 t/t4141-apply-too-large.sh                   |   2 +-
 t/t4150-am.sh                                |  48 +-
 t/t4200-rerere.sh                            |   6 +-
 t/t4201-shortlog.sh                          |   2 +-
 t/t4202-log.sh                               |  84 +--
 t/t4204-patch-id.sh                          |   2 +-
 t/t4205-log-pretty-formats.sh                |   2 +-
 t/t4209-log-pickaxe.sh                       |  10 +-
 t/t4211-line-log.sh                          |  72 +--
 t/t4216-log-bloom.sh                         |  18 +-
 t/t4252-am-options.sh                        |  22 +-
 t/t4254-am-corrupt.sh                        |   6 +-
 t/t4258-am-quoted-cr.sh                      |   2 +-
 t/t4301-merge-tree-write-tree.sh             |  18 +-
 t/t5000-tar-tree.sh                          |  10 +-
 t/t5004-archive-corner-cases.sh              |   2 +-
 t/t5100-mailinfo.sh                          |   2 +-
 t/t5150-request-pull.sh                      |  18 +-
 t/t5300-pack-object.sh                       |  22 +-
 t/t5302-pack-index.sh                        |   6 +-
 t/t5304-prune.sh                             |   8 +-
 t/t5310-pack-bitmaps.sh                      |  14 +-
 t/t5317-pack-objects-filter-objects.sh       |  12 +-
 t/t5318-commit-graph.sh                      |   8 +-
 t/t5319-multi-pack-index.sh                  |  16 +-
 t/t5324-split-commit-graph.sh                |  10 +-
 t/t5325-reverse-index.sh                     |   2 +-
 t/t5326-multi-pack-bitmaps.sh                |  28 +-
 t/t5328-commit-graph-64bit-time.sh           |   2 +-
 t/t5329-pack-objects-cruft.sh                |   8 +-
 t/t5334-incremental-multi-pack-index.sh      |   2 +-
 t/t5335-compact-multi-pack-index.sh          |   4 +-
 t/t5351-unpack-large-objects.sh              |   2 +-
 t/t5402-post-merge-hook.sh                   |   4 +-
 t/t5403-post-checkout-hook.sh                |   2 +-
 t/t5404-tracking-branches.sh                 |   2 +-
 t/t5406-remote-rejects.sh                    |   2 +-
 t/t5407-post-rewrite-hook.sh                 |   8 +-
 t/t5409-colorize-remote-messages.sh          |  36 +-
 t/t5411/test-0013-bad-protocol.sh            |  14 +-
 t/t5411/test-0014-bad-protocol--porcelain.sh |  14 +-
 t/t5500-fetch-pack.sh                        |  38 +-
 t/t5504-fetch-receive-strict.sh              |  14 +-
 t/t5505-remote.sh                            |  20 +-
 t/t5510-fetch.sh                             |  10 +-
 t/t5512-ls-remote.sh                         |   8 +-
 t/t5514-fetch-multiple.sh                    |   2 +-
 t/t5516-fetch-push.sh                        |  20 +-
 t/t5520-pull.sh                              |   4 +-
 t/t5524-pull-msg.sh                          |   6 +-
 t/t5526-fetch-submodules.sh                  |  16 +-
 t/t5529-push-errors.sh                       |   4 +-
 t/t5530-upload-pack-error.sh                 |  18 +-
 t/t5531-deep-submodule-push.sh               |   2 +-
 t/t5532-fetch-proxy.sh                       |   2 +-
 t/t5533-push-cas.sh                          |  12 +-
 t/t5534-push-signed.sh                       |  22 +-
 t/t5537-fetch-shallow.sh                     |   2 +-
 t/t5538-push-shallow.sh                      |   2 +-
 t/t5539-fetch-http-shallow.sh                |   4 +-
 t/t5541-http-push-smart.sh                   |  32 +-
 t/t5544-pack-objects-hook.sh                 |  12 +-
 t/t5550-http-fetch-dumb.sh                   |   4 +-
 t/t5551-http-fetch-smart.sh                  |  46 +-
 t/t5552-skipping-fetch-negotiator.sh         |   6 +-
 t/t5554-noop-fetch-negotiator.sh             |   4 +-
 t/t5557-http-get.sh                          |   2 +-
 t/t5558-clone-bundle-uri.sh                  |  38 +-
 t/t5562-http-backend-content-length.sh       |   2 +-
 t/t5564-http-proxy.sh                        |  10 +-
 t/t5581-http-curl-verbose.sh                 |   2 +-
 t/t5583-push-branches.sh                     |   8 +-
 t/t5601-clone.sh                             |  28 +-
 t/t5604-clone-reference.sh                   |   8 +-
 t/t5605-clone-local.sh                       |   2 +-
 t/t5606-clone-options.sh                     |   6 +-
 t/t5612-clone-refspec.sh                     |   2 +-
 t/t5616-partial-clone.sh                     |  60 +--
 t/t5619-clone-local-ambiguous-transport.sh   |   2 +-
 t/t5620-backfill.sh                          |  12 +-
 t/t5700-protocol-v1.sh                       |  46 +-
 t/t5701-git-serve.sh                         |  14 +-
 t/t5702-protocol-v2.sh                       | 152 +++---
 t/t5703-upload-pack-ref-in-want.sh           |  22 +-
 t/t5705-session-id-in-capabilities.sh        |  12 +-
 t/t5750-bundle-uri-parse.sh                  |   8 +-
 t/t5801-remote-helpers.sh                    |   4 +-
 t/t5810-proto-disable-local.sh               |   2 +-
 t/t5813-proto-disable-ssh.sh                 |   4 +-
 t/t6000-rev-list-misc.sh                     |  26 +-
 t/t6005-rev-list-count.sh                    |   8 +-
 t/t6006-rev-list-format.sh                   |   4 +-
 t/t6009-rev-list-parent.sh                   |   4 +-
 t/t6020-bundle-misc.sh                       |  12 +-
 t/t6022-rev-list-missing.sh                  |   4 +-
 t/t6030-bisect-porcelain.sh                  | 150 +++---
 t/t6040-tracking-info.sh                     |   2 +-
 t/t6112-rev-list-filters-objects.sh          |  24 +-
 t/t6115-rev-list-du.sh                       |   4 +-
 t/t6120-describe.sh                          |  14 +-
 t/t6200-fmt-merge-msg.sh                     |  82 +--
 t/t6402-merge-rename.sh                      |   4 +-
 t/t6403-merge-file.sh                        |   6 +-
 t/t6404-recursive-merge.sh                   |   2 +-
 t/t6406-merge-attr.sh                        |  20 +-
 t/t6417-merge-ours-theirs.sh                 |  30 +-
 t/t6418-merge-text-auto.sh                   |   2 +-
 t/t6422-merge-rename-corner-cases.sh         |   8 +-
 t/t6423-merge-rename-directories.sh          |  72 +--
 t/t6424-merge-unrelated-index-changes.sh     |   6 +-
 t/t6427-diff3-conflict-markers.sh            |  10 +-
 t/t6432-merge-recursive-space-options.sh     |   4 +-
 t/t6436-merge-overwrite.sh                   |   6 +-
 t/t6437-submodule-merge.sh                   |  12 +-
 t/t6500-gc.sh                                |   8 +-
 t/t6600-test-reach.sh                        |   4 +-
 t/t7001-mv.sh                                |  16 +-
 t/t7002-mv-sparse-checkout.sh                |  38 +-
 t/t7003-filter-branch.sh                     |  16 +-
 t/t7004-tag.sh                               |   2 +-
 t/t7006-pager.sh                             |  16 +-
 t/t7012-skip-worktree-writing.sh             |   6 +-
 t/t7030-verify-tag.sh                        |  52 +-
 t/t7031-verify-tag-signed-ssh.sh             |  46 +-
 t/t7102-reset.sh                             |   2 +-
 t/t7110-reset-merge.sh                       |  40 +-
 t/t7201-co.sh                                |   6 +-
 t/t7300-clean.sh                             |   2 +-
 t/t7301-clean-interactive.sh                 |   2 +-
 t/t7400-submodule-basic.sh                   |  32 +-
 t/t7402-submodule-rebase.sh                  |   2 +-
 t/t7406-submodule-update.sh                  |  26 +-
 t/t7416-submodule-dash-url.sh                |  20 +-
 t/t7417-submodule-path-url.sh                |   2 +-
 t/t7450-bad-git-dotfiles.sh                  |  14 +-
 t/t7501-commit-basic-functionality.sh        |  14 +-
 t/t7502-commit-porcelain.sh                  |   2 +-
 t/t7507-commit-verbose.sh                    |   6 +-
 t/t7508-status.sh                            |   6 +-
 t/t7510-signed-commit.sh                     |  68 +--
 t/t7516-commit-races.sh                      |   4 +-
 t/t7519-status-fsmonitor.sh                  |  14 +-
 t/t7527-builtin-fsmonitor.sh                 |  84 +--
 t/t7528-signed-commit-ssh.sh                 |  68 +--
 t/t7600-merge.sh                             |  10 +-
 t/t7603-merge-reduce-heads.sh                |  20 +-
 t/t7606-merge-custom.sh                      |   2 +-
 t/t7607-merge-state.sh                       |   4 +-
 t/t7610-mergetool.sh                         |  18 +-
 t/t7700-repack.sh                            |  14 +-
 t/t7703-repack-geometric.sh                  |   4 +-
 t/t7704-repack-cruft.sh                      |  12 +-
 t/t7800-difftool.sh                          |  26 +-
 t/t7810-grep.sh                              |  22 +-
 t/t7814-grep-recurse-submodules.sh           |   2 +-
 t/t7900-maintenance.sh                       |  36 +-
 t/t8008-blame-formats.sh                     |   2 +-
 t/t8010-cat-file-filters.sh                  |   2 +-
 t/t8012-blame-colors.sh                      |   2 +-
 t/t9001-send-email.sh                        | 190 +++----
 t/t9003-help-autocorrect.sh                  |   6 +-
 t/t9106-git-svn-commit-diff-clobber.sh       |   2 +-
 t/t9107-git-svn-migrate.sh                   |  30 +-
 t/t9110-git-svn-use-svm-props.sh             |  20 +-
 t/t9111-git-svn-use-svnsync-props.sh         |  18 +-
 t/t9114-git-svn-dcommit-merge.sh             |   6 +-
 t/t9116-git-svn-log.sh                       |   8 +-
 t/t9117-git-svn-init-clone.sh                |  12 +-
 t/t9119-git-svn-info.sh                      |  16 +-
 t/t9122-git-svn-author.sh                    |   8 +-
 t/t9130-git-svn-authors-file.sh              |   8 +-
 t/t9138-git-svn-authors-prog.sh              |  14 +-
 t/t9140-git-svn-reset.sh                     |   4 +-
 t/t9153-git-svn-rewrite-uuid.sh              |   4 +-
 t/t9200-git-cvsexportcommit.sh               |   2 +-
 t/t9210-scalar.sh                            |  34 +-
 t/t9211-scalar-clone.sh                      |  16 +-
 t/t9300-fast-import.sh                       |  10 +-
 t/t9350-fast-export.sh                       |  54 +-
 t/t9351-fast-export-anonymize.sh             |  36 +-
 t/t9400-git-cvsserver-server.sh              |   4 +-
 t/t9501-gitweb-standalone-http-status.sh     |  58 +--
 t/t9502-gitweb-standalone-parse-output.sh    |  38 +-
 t/t9800-git-p4-basic.sh                      |  10 +-
 t/t9801-git-p4-branch.sh                     |  48 +-
 t/t9806-git-p4-options.sh                    |  10 +-
 t/t9807-git-p4-submit.sh                     |   2 +-
 t/t9810-git-p4-rcs.sh                        |   8 +-
 t/t9813-git-p4-preserve-users.sh             |   8 +-
 t/t9814-git-p4-rename.sh                     |   8 +-
 t/t9827-git-p4-change-filetype.sh            |   4 +-
 t/t9832-unshelve.sh                          |   6 +-
 t/t9833-errors.sh                            |   4 +-
 t/t9835-git-p4-metadata-encoding-python2.sh  |  36 +-
 t/t9836-git-p4-metadata-encoding-python3.sh  |  38 +-
 t/t9850-shell.sh                             |   2 +-
 t/t9902-completion.sh                        |  26 +-
 363 files changed, 4067 insertions(+), 3334 deletions(-)
 create mode 100644 t/check-shell-parser.pl
 create mode 100644 t/lib-shell-parser.pl
 create mode 100755 t/lint-style.pl
 create mode 100644 t/lint-style/grep-assert.expect
 create mode 100644 t/lint-style/grep-assert.test
 create mode 100644 t/lint-style/grep-fix.expect
 create mode 100644 t/lint-style/grep-fix.test
 create mode 100644 t/lint-style/grep-missing-file.expect
 create mode 100644 t/lint-style/grep-missing-file.test
 create mode 100644 t/lint-style/grep-negated.expect
 create mode 100644 t/lint-style/grep-negated.test
 create mode 100644 t/lint-style/grep-not-assert.expect
 create mode 100644 t/lint-style/grep-not-assert.test
 create mode 100644 t/lint-style/heredoc.expect
 create mode 100644 t/lint-style/heredoc.test
 create mode 100644 t/lint-style/test-grep-negation-fix.expect
 create mode 100644 t/lint-style/test-grep-negation-fix.test
 create mode 100644 t/lint-style/test-grep-negation.expect
 create mode 100644 t/lint-style/test-grep-negation.test


base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2135%2Fmmontalbo%2Fmm%2Ftest-grep-docs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2135/mmontalbo/mm/test-grep-docs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2135
-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH 0/2] parse-options: introduce die_for_required_opt() helper
From: Christian Couder @ 2026-06-04  7:45 UTC (permalink / raw)
  To: Siddharth Shrimali; +Cc: git, gitster, toon, jn.avila
In-Reply-To: <20260603111044.39116-1-r.siddharth.shrimali@gmail.com>

On Wed, Jun 3, 2026 at 1:11 PM Siddharth Shrimali
<r.siddharth.shrimali@gmail.com> wrote:
>
> Many built-in commands in Git manually check for option prerequisites
> (i.e., option X relies on option Y being present) using explicit
> conditional blocks and duplicated error message strings.
>
> This short series comes out of a discussion with Christian about
> localization and code duplication. To address these issues, it
> introduces a centralized API helper that handles simple option
> prerequisites safely.

I think it would be nice to mention around here that the new function
was inspired by die_for_incompatible_opt2() and similar functions.

> - Patch 1 introduces the `die_for_required_opt()` helper function
>   inside parse-options.
>
> - Patch 2 cleans up `builtin/add.c` as a proof-of-concept by migrating
>   its manual prerequisite checks for '--ignore-missing' and
>   '--pathspec-file-nul' over to the new helper.
>
> If this initial approach looks good, we can later extend the helper
> to handle more complex multi-option dependencies.

Yeah, for functions with more arguments to address cases like "option
X requires both options Y and Z" or "option X requires either option Y
or option Z", I think it's not clear yet what would be the most useful
and what's the best name for such functions.

^ permalink raw reply

* Re: [PATCH] Makefile: drop duplicate %.a from link recipes
From: Harald Nordgren @ 2026-06-04  7:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Harald Nordgren via GitGitGadget, git
In-Reply-To: <CAHwyqnV6uh_yyO9FcUiXKfKPt15ojR3GOmRC06pW55f=KRu=Zw@mail.gmail.com>

Maybe we can do this to get around the brittleness for all ~10 places:

```
-LIBS = $(filter-out %.o, $(GITLIBS)) $(EXTLIBS)
+LIBS = $(filter %.a,$^) $(filter-out $(filter %.a,$^),$(filter-out
%.o,$(GITLIBS)) $(EXTLIBS))

 BASIC_CFLAGS += $(COMPAT_CFLAGS)
 LIB_OBJS += $(COMPAT_OBJS)
@@ -3392,7 +3395,7 @@ perf: all
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
$(UNIT_TEST_DIR)/test-lib.o

 t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
- $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter
%.o,$^) $(filter %.a,$^) $(LIBS)
+ $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)

 check-sha1:: t/helper/test-tool$X
  t/helper/test-sha1.sh
@@ -4015,13 +4018,13 @@ fuzz-all: $(FUZZ_PROGRAMS)
 $(FUZZ_PROGRAMS): %: %.o oss-fuzz/dummy-cmd-main.o $(GITLIBS) GIT-LDFLAGS
  $(QUIET_LINK)$(FUZZ_CXX) $(FUZZ_CXXFLAGS) -o $@ $(ALL_LDFLAGS) \
  -Wl,--allow-multiple-definition \
- $(filter %.o,$^) $(filter %.a,$^) $(LIBS) $(LIB_FUZZING_ENGINE)
+ $(filter %.o,$^) $(LIBS) $(LIB_FUZZING_ENGINE)

 $(UNIT_TEST_PROGS): $(UNIT_TEST_BIN)/%$X: $(UNIT_TEST_DIR)/%.o
$(UNIT_TEST_OBJS) \
  $(GITLIBS) GIT-LDFLAGS
  $(call mkdir_p_parent_template)
  $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) \
- $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
+ $(filter %.o,$^) $(LIBS)

 GIT-TEST-SUITES: FORCE
  @FLAGS='$(CLAR_TEST_SUITES)'; \
```


Harald

On Thu, Jun 4, 2026 at 9:06 AM Harald Nordgren <haraldnordgren@gmail.com> wrote:
>
> On Thu, Jun 4, 2026 at 2:33 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > >  t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
> > > -     $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
> > > +     $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
> >
> > I think the reason why the pattern to use only the .o files among
> > the prerequisites and then use only the .a files among the same
> > prerequisites (both filters $^) is used here is to make sure that the
> > linker sees object files first before library archives, so that by
> > the time its left-to-right scan sees the first library archive, all
> > the missing symbols in the object files are known.  The above change
> > depends on LIBS being a strict superset of all the library archive
> > files ($GITLIBS in the current code, but that can be updated in the
> > future) listed as prerequisites for the rule, but there is nothing to
> > guarantee that, so it looks brittle.
> >
> > Exact same comment applies to the other two rules touched by this patch.
>
> Hmm, there are other constructs like this that rely on $(LIBS) being a
> superset of the archives, so the three rules changed here align with
> the trend rather than introduce a new trend.
>
> Not saying we shouldn't find a way to handle the overall brittleness.
>
>
> Harald

^ permalink raw reply

* Re: [PATCH v3] index-pack: retain child bases in delta cache
From: Jeff King @ 2026-06-04  7:12 UTC (permalink / raw)
  To: Arijit Banerjee via GitGitGadget
  Cc: git, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Arijit Banerjee, Arijit Banerjee
In-Reply-To: <pull.2131.v3.git.1780445118653.gitgitgadget@gmail.com>

On Wed, Jun 03, 2026 at 12:05:17AM +0000, Arijit Banerjee via GitGitGadget wrote:

>      * Addressed Jeff King's review question by releasing cached base data
>        after all direct children have been dispatched, while keeping the
>        existing subtree bookkeeping intact.
>      * Re-ran t/t5302-pack-index.sh, p5302-pack-index.sh, and end-to-end
>        full clone spot checks with the precise-release version.

Thanks for humoring me. I fully expected the answer to be "it is hard to
do and doesn't show much improvement, so let's not bother". ;)

It was hard to see the difference between v2 and v3 performance (which I
tried to dig out from the range-diff below), but it looks like it was
basically none. I did my own run of p5302 between the two versions using
both git.git and linux.git, and likewise didn't find anything.

I guess it would make a difference only if we were routinely expiring
useful items out of the cache due to the limit. And even though
linux.git is a "large" repo compared to git.git, cache locality here is
mostly based on how wide the delta tree for a file gets (that is, how
often we go down one chain, caching bases, while still finding it useful
to keep earlier parts of the chain to go down a parallel path).

And that probably has less to do with overall repo size rather than with
how we tend to pack things. Though I guess a repo with a lot of large
files might see more cache pressure (just because each single entry
"costs" more). We could simulate that by dropping the cache size in
p5302, but I still couldn't find any effect even with a tiny cache.

(Actually, with a tiny cache it looked like things got ~1% slower; maybe
noise, but maybe extra thread contention due to the release code?).

So I am happy with either v2 or v3.

-Peff

^ permalink raw reply

* Re: [PATCH] read_gitfile_gently(): return non-repo path on error
From: Patrick Steinhardt @ 2026-06-04  7:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Tian Yuchen
In-Reply-To: <20260604062720.GA3195904@coredump.intra.peff.net>

On Thu, Jun 04, 2026 at 02:27:20AM -0400, Jeff King wrote:
> On Tue, Jun 02, 2026 at 10:36:34AM +0200, Patrick Steinhardt wrote:
> 
> > > The correct output (which this patch produces) is:
> > > 
> > >   fatal: not a git repository: /home/peff/compile/git/.git/worktrees/worktree3
> > > 
> > > And indeed, that path is missing. But why? I feel like I've run into
> > > this same problem occasionally over the last year or so, but never
> > > before. Did we get more aggressive about removing worktrees at some
> > > point? I haven't been able to reproduce whatever is killing off the
> > > worktree directory, and by the time I see the error it is long gone.
> > 
> > Both git-gc(1) and git-maintenance(1) prune orphaned worktrees that are
> > older than three months by default, which can be configured via
> > "gc.worktreePruneExpire". That logic has changed in 4dda60c9df (Merge
> > branch 'ps/maintenance-missing-tasks', 2025-05-15), which would kind of
> > match your timeline.
> > 
> > But rereading that patch series I cannot really see how it could result
> > in more aggressive pruning of worktrees. We used `git worktree prune
> > --expire <expiry>` before that series, and we still use that logic now.
> 
> Yeah, but this .git/worktrees/ directory shouldn't be pruned _at all_.
> The worktree itself is still there (which is why I'm getting the error).
> So perhaps there's a bug in checking that things are still there, or
> perhaps something is corrupting .git/worktrees/*/gitdir.

Oh, that sounds somewhat scary.

> Another option is "I moved my git checkout and the worktree prune
> couldn't find the directory as an absolute path", but I'm sure I didn't
> do that.
> 
> An even more exotic option is that I run Git's test suite a lot, and
> very occasionally bugs in the test suite cause the script to escape the
> trash directory. And some scripts do run "rm -r .git/worktrees". I find
> it pretty unlikely for that to be the culprit though.
> 
> Oh well. I don't have any good leads, so I guess I'll see if it happens
> again. But maybe now if somebody else sees it we can commiserate. :)

I'll certainly be on the watchout.

Patrick

^ permalink raw reply

* Re: [PATCH] Makefile: drop duplicate %.a from link recipes
From: Harald Nordgren @ 2026-06-04  7:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Harald Nordgren via GitGitGadget, git
In-Reply-To: <xmqqik7zqh4p.fsf@gitster.g>

On Thu, Jun 4, 2026 at 2:33 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >  t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
> > -     $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
> > +     $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS)
>
> I think the reason why the pattern to use only the .o files among
> the prerequisites and then use only the .a files among the same
> prerequisites (both filters $^) is used here is to make sure that the
> linker sees object files first before library archives, so that by
> the time its left-to-right scan sees the first library archive, all
> the missing symbols in the object files are known.  The above change
> depends on LIBS being a strict superset of all the library archive
> files ($GITLIBS in the current code, but that can be updated in the
> future) listed as prerequisites for the rule, but there is nothing to
> guarantee that, so it looks brittle.
>
> Exact same comment applies to the other two rules touched by this patch.

Hmm, there are other constructs like this that rely on $(LIBS) being a
superset of the archives, so the three rules changed here align with
the trend rather than introduce a new trend.

Not saying we shouldn't find a way to handle the overall brittleness.


Harald

^ permalink raw reply

* Re: What's cooking in git.git (Jun 2026, #02)
From: Patrick Steinhardt @ 2026-06-04  6:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20260604062122.GB3194609@coredump.intra.peff.net>

On Thu, Jun 04, 2026 at 02:21:22AM -0400, Jeff King wrote:
> On Thu, Jun 04, 2026 at 11:35:50AM +0900, Junio C Hamano wrote:
> 
> > * ps/t7527-fix-tap-output (2026-06-02) 4 commits
> >  - t: let prove fail when parsing invalid TAP output
> >  - t/lib-git-p4: silence output when killing p4d and its watchdog
> >  - t/test-lib: silence EBUSY errors on Windows during test cleanup
> >  - t7527: fix broken TAP output
> > 
> >  A recent regression in t7527 that broke TAP output has been fixed,
> >  some other test noise that also broke TAP output has been silenced,
> >  and 'prove' is now configured to fail on invalid TAP output to
> >  prevent future regressions.
> > 
> >  Expecting a (small and hopefully final) reroll.
> >  cf. <xmqqtsrlw09t.fsf@gitster.g>
> >  source: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>
> 
> I guess this is the source of several CI failures on jch/seen that look
> like:
> 
>     Test Summary Report
>     -------------------
>     t7810-grep.sh                                    (Wstat: 0 Tests: 264 Failed: 0)
>     Parse errors: Unknown TAP token: "/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)"
> 
> This happens in the dockerized jobs whose images presumably have a very
> minimal locale setup. I don't know if this is a sign that the tests have
> never been doing quite what we expect on those platforms, or if it's
> simply noise that is now being caught.

Ah, thanks for flagging. I'll investigate.

Patrick

^ permalink raw reply

* Re: [PATCH v2 0/2] Small updates to SubmittingPatches
From: Patrick Steinhardt @ 2026-06-04  6:50 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git
In-Reply-To: <c54f3571-ff7b-4caa-b75d-a739ed87ec9d@gmail.com>

On Tue, Jun 02, 2026 at 11:24:48AM -0400, Derrick Stolee wrote:
> On 6/2/2026 10:43 AM, Junio C Hamano wrote:
> > Recently I gave some advice on how a cover letter should
> > try to sell the idea to widest possible audience, and then
> > I realized that we do not seem to teach how in our guides.
> > 
> > Here is a small series to do so.
> > 
> > In this round, a few typos have been corrected, and improvements are
> > made thanks to help from Christian, Stolee, and Patrick.
> This version LGTM.

Agreed, I'm happy with this version.

Patrick

^ permalink raw reply

* [PATCH v3] git-gui: silence install recipes under "make -s"
From: Harald Nordgren via GitGitGadget @ 2026-06-04  6:48 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2318.v2.git.git.1780510415838.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

Several install and uninstall recipes embed "echo" calls that fire as
part of the recipe itself, so the install banners (DEST, INSTALL,
LINK, REMOVE) were visible whenever the variables expand non-empty.

Guard the whole "ifndef V" block on "-s" so the loud variants are
selected only when "-s" is absent and V=1 is unset. The existing
"-s" check also had its findstring arguments in the wrong order
(needle "-s" never fit in haystack "s"), so swap them while moving
the check to wrap the block.

Signed-off-by: Harald Nordgren <harald.nordgren@kostdoktorn.se>
---
    git-gui: silence install recipes under "make -s"
    
    Added sentences to the commit message noting that the old findstring arg
    order was broken (needle never fit haystack).

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2318%2FHaraldNordgren%2Fgit-gui-respect-silent-flag-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2318/HaraldNordgren/git-gui-respect-silent-flag-v3
Pull-Request: https://github.com/git/git/pull/2318

Range-diff vs v2:

 1:  4e4029c8e8 ! 1:  1375fdc1aa git-gui: silence install recipes under "make -s"
     @@ Commit message
          LINK, REMOVE) were visible whenever the variables expand non-empty.
      
          Guard the whole "ifndef V" block on "-s" so the loud variants are
     -    selected only when "-s" is absent and V=1 is unset.
     +    selected only when "-s" is absent and V=1 is unset. The existing
     +    "-s" check also had its findstring arguments in the wrong order
     +    (needle "-s" never fit in haystack "s"), so swap them while moving
     +    the check to wrap the block.
      
          Signed-off-by: Harald Nordgren <harald.nordgren@kostdoktorn.se>
      


 git-gui/Makefile | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/git-gui/Makefile b/git-gui/Makefile
index ca01068810..d33204e875 100644
--- a/git-gui/Makefile
+++ b/git-gui/Makefile
@@ -64,6 +64,7 @@ REMOVE_F0  = $(RM_RF) # space is required here
 REMOVE_F1  =
 CLEAN_DST  = true
 
+ifneq ($(findstring s,$(firstword -$(MAKEFLAGS))),s)
 ifndef V
 	QUIET          = @
 	QUIET_GEN      = $(QUIET)echo '   ' GEN '$@' &&
@@ -89,6 +90,7 @@ ifndef V
 	REMOVE_F0 = dst=
 	REMOVE_F1 = && echo '   ' REMOVE `basename "$$dst"` && $(RM_RF) "$$dst"
 endif
+endif
 
 TCLTK_PATH ?= wish
 ifeq (./,$(dir $(TCLTK_PATH)))
@@ -97,10 +99,6 @@ else
 	TCL_PATH ?= $(dir $(TCLTK_PATH))$(notdir $(subst wish,tclsh,$(TCLTK_PATH)))
 endif
 
-ifeq ($(findstring $(firstword -$(MAKEFLAGS)),s),s)
-QUIET_GEN =
-endif
-
 -include config.mak
 
 DESTDIR_SQ = $(subst ','\'',$(DESTDIR))

base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v1 2/4] read-cache: move 'ce_mode_from_stat()' to 'read-cache.c'
From: Patrick Steinhardt @ 2026-06-04  6:47 UTC (permalink / raw)
  To: Tian Yuchen; +Cc: git, christian.couder, Ayush Chandekar, Olamide Caleb Bello
In-Reply-To: <20260530160520.77859-3-cat@malon.dev>

On Sun, May 31, 2026 at 12:05:17AM +0800, Tian Yuchen wrote:
> diff --git a/read-cache.h b/read-cache.h
> index 043da1f1aa..3c4af2faeb 100644
> --- a/read-cache.h
> +++ b/read-cache.h
> @@ -5,20 +5,8 @@
>  #include "object.h"
>  #include "pathspec.h"
>  
> -static inline unsigned int ce_mode_from_stat(const struct cache_entry *ce,
> -					     unsigned int mode)
> -{
> -	extern int trust_executable_bit, has_symlinks;
> -	if (!has_symlinks && S_ISREG(mode) &&
> -	    ce && S_ISLNK(ce->ce_mode))
> -		return ce->ce_mode;
> -	if (!trust_executable_bit && S_ISREG(mode)) {
> -		if (ce && S_ISREG(ce->ce_mode))
> -			return ce->ce_mode;
> -		return create_ce_mode(0666);
> -	}
> -	return create_ce_mode(mode);
> -}
> +unsigned int ce_mode_from_stat(const struct cache_entry *ce,
> +				unsigned int mode);

This is moving goalposts a bit, so please feel free to ignore: should we
maybe add a small comment what the function does while at it?

Patrick

^ permalink raw reply

* Re: [PATCH v2 2/4] doc: replay: improve config description
From: Kristoffer Haugsbakk @ 2026-06-04  6:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, Siddharth Asthana, git
In-Reply-To: <aiEa5EWeAaaMsqRR@pks.im>

On Thu, Jun 4, 2026, at 08:27, Patrick Steinhardt wrote:
> On Wed, Jun 03, 2026 at 06:04:23PM +0200,
> kristofferhaugsbakk@fastmail.com wrote:
>> From: Kristoffer Haugsbakk <code@khaugsbakk.name>
>>
>> First of all, this bullet list for `--ref-action` introduces a term with
>> a colon.  This is exactly what a description list is, structurally. Let’s
>> be sylistically consistent and use the description list markup
>
> s/sylistically/stylistically/

Thanks, I’ll make the correction.

>
>> diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc
>> index f9ca2db2833..4de85088d6c 100644
>> --- a/Documentation/git-replay.adoc
>> +++ b/Documentation/git-replay.adoc
>> @@ -211,6 +211,7 @@ to use bare commit IDs instead of branch names.
>>
>>  CONFIGURATION
>>  -------------
>> +:git-replay: 1
>>  include::config/replay.adoc[]
>
> Not quite sure, but was this change supposed to be part of the preceding
> commit, where you also added the include?

No, because the conditional is only being put to use now. That was the
intention anyway. Maybe there is some reason to put it in the first
commit?

Thanks!

^ permalink raw reply

* Re: [PATCH v2 2/4] doc: replay: improve config description
From: Patrick Steinhardt @ 2026-06-04  6:27 UTC (permalink / raw)
  To: kristofferhaugsbakk
  Cc: Junio C Hamano, Kristoffer Haugsbakk, Siddharth Asthana, git
In-Reply-To: <V2_doc_replay_improve_config.769@msgid.xyz>

On Wed, Jun 03, 2026 at 06:04:23PM +0200, kristofferhaugsbakk@fastmail.com wrote:
> From: Kristoffer Haugsbakk <code@khaugsbakk.name>
> 
> First of all, this bullet list for `--ref-action` introduces a term with
> a colon.  This is exactly what a description list is, structurally. Let’s
> be sylistically consistent and use the description list markup

s/sylistically/stylistically/

> diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc
> index f9ca2db2833..4de85088d6c 100644
> --- a/Documentation/git-replay.adoc
> +++ b/Documentation/git-replay.adoc
> @@ -211,6 +211,7 @@ to use bare commit IDs instead of branch names.
>  
>  CONFIGURATION
>  -------------
> +:git-replay: 1
>  include::config/replay.adoc[]

Not quite sure, but was this change supposed to be part of the preceding
commit, where you also added the include?

Patrick

^ permalink raw reply

* Re: [PATCH] read_gitfile_gently(): return non-repo path on error
From: Jeff King @ 2026-06-04  6:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Tian Yuchen
In-Reply-To: <ah6WEtk2pXyViEQA@pks.im>

On Tue, Jun 02, 2026 at 10:36:34AM +0200, Patrick Steinhardt wrote:

> > The correct output (which this patch produces) is:
> > 
> >   fatal: not a git repository: /home/peff/compile/git/.git/worktrees/worktree3
> > 
> > And indeed, that path is missing. But why? I feel like I've run into
> > this same problem occasionally over the last year or so, but never
> > before. Did we get more aggressive about removing worktrees at some
> > point? I haven't been able to reproduce whatever is killing off the
> > worktree directory, and by the time I see the error it is long gone.
> 
> Both git-gc(1) and git-maintenance(1) prune orphaned worktrees that are
> older than three months by default, which can be configured via
> "gc.worktreePruneExpire". That logic has changed in 4dda60c9df (Merge
> branch 'ps/maintenance-missing-tasks', 2025-05-15), which would kind of
> match your timeline.
> 
> But rereading that patch series I cannot really see how it could result
> in more aggressive pruning of worktrees. We used `git worktree prune
> --expire <expiry>` before that series, and we still use that logic now.

Yeah, but this .git/worktrees/ directory shouldn't be pruned _at all_.
The worktree itself is still there (which is why I'm getting the error).
So perhaps there's a bug in checking that things are still there, or
perhaps something is corrupting .git/worktrees/*/gitdir.

Another option is "I moved my git checkout and the worktree prune
couldn't find the directory as an absolute path", but I'm sure I didn't
do that.

An even more exotic option is that I run Git's test suite a lot, and
very occasionally bugs in the test suite cause the script to escape the
trash directory. And some scripts do run "rm -r .git/worktrees". I find
it pretty unlikely for that to be the culprit though.

Oh well. I don't have any good leads, so I guess I'll see if it happens
again. But maybe now if somebody else sees it we can commiserate. :)

-Peff

^ permalink raw reply

* Re: Mirror repositories for submodules
From: Jeff King @ 2026-06-04  6:16 UTC (permalink / raw)
  To: Simon Richter; +Cc: Junio C Hamano, Benson Muite, git
In-Reply-To: <d64e7f31-4e00-478c-ab31-b671242865fb@hogyros.de>

On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:

> Cloning from our server will, depending on what upstream uses, either a
> relative URL (which will go to our server, but we have little control over
> what the name part of the repository base URL is going to be), or an
> absolute URL that instructs clients to pull from another place, which
> conflicts with our goal to have a self-contained archive.
> 
> The idea posited earlier, to have a "repository identity" that remains the
> same across forks and clones, is somewhat appealing, but the best idea I can
> come up with is generating some kind of repository UUID, and adding a
> symlink -- not a great design because it pollutes outside the repo:
> 
>     $ mkdir myproject
>     $ cd myproject
>     $ git init
>     $ ls -l ..
>     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
> 
> On the other hand, this can be used to construct a stable relative submodule
> URL.

Here's a thought experiment. What if you put the UUID into a URL, like:

  repoid://123456789.git

Then your in-repo .gitconfig would point to that repo id and be
consistent. Of course you need some way to tell Git how to retrieve
repoid:// URLs. You could do so with a custom remote helper
(git-remote-repoid), but presumably that helper is eventually going to
end up going over one of the normal Git protocols.

So we just need to tell Git how to resolve repo id URLs into concrete
URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
for example, you can add a submodule but convert it into a uuid like
this:

  $ git submodule add https://github.com/git/git.git
  $ git config -f .gitmodules submodule.git.url
  https://github.com/git/git.git
  $ git config -f .gitmodules submodule.git.url repoid://123456789.git
  $ git commit -am 'add submodule with magic repoid'

Now if somebody else comes along and clones it naively, the repo uuid is
not useful to git by itself:

  $ git clone --recurse-submodules repo
  Submodule 'git' (repoid://123456789.git) registered for path 'git'
  Cloning into '/home/peff/tmp/repo/git'...
  fatal: transport 'repoid' not allowed
  fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed

But imagine that "somehow" they have learned that 123456789.git can be
found at some URL. You can do this:

  git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
      clone --recurse-submodules repo.git

which would clone from the original URL. Or you could even imagine that
they have a cache of repositories named by uuid, and then:

  git -c url.https://my/cache/.insteadOf=repoid:// ...

would rewrite all repoid://'s automatically.

The use of "-c" here is mostly for illustration. It is a per-command
config, so when you later try to update the submodule, you'd run into
the same problem. Probably you'd want to stuff your mapping into on-disk
config (either ~/.gitconfig, or if you have a lot of them, perhaps some
file included from there).

It would be nice if you could use "git clone -c" (note "-c" as an option
to "clone", not to "git") to set a permanent per-repo config variable.
But sadly the URL rewriting happens in the submodule repository, not the
parent. So it has to be a per-user setting.

Now, all of that said, do we still need uuids at all? If the canonical
submodule name is https://github.com/git/git.git, then anybody can just
rewrite that locally in the same way using url.*.insteadOf config. And I
think this is a pretty standard way of using submodules. E.g., you might
rewrite https:// into ssh:// if you prefer that protocol. Or point to a
local server if it's faster for you.

Which makes me wonder if I am missing something about the original
request that started this thread. But it sounds to me like it is just
asking for the existing URL-rewriting feature.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jun 2026, #02)
From: Jeff King @ 2026-06-04  6:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git
In-Reply-To: <xmqq8q8vowvt.fsf@gitster.g>

On Thu, Jun 04, 2026 at 11:35:50AM +0900, Junio C Hamano wrote:

> * ps/t7527-fix-tap-output (2026-06-02) 4 commits
>  - t: let prove fail when parsing invalid TAP output
>  - t/lib-git-p4: silence output when killing p4d and its watchdog
>  - t/test-lib: silence EBUSY errors on Windows during test cleanup
>  - t7527: fix broken TAP output
> 
>  A recent regression in t7527 that broke TAP output has been fixed,
>  some other test noise that also broke TAP output has been silenced,
>  and 'prove' is now configured to fail on invalid TAP output to
>  prevent future regressions.
> 
>  Expecting a (small and hopefully final) reroll.
>  cf. <xmqqtsrlw09t.fsf@gitster.g>
>  source: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

I guess this is the source of several CI failures on jch/seen that look
like:

    Test Summary Report
    -------------------
    t7810-grep.sh                                    (Wstat: 0 Tests: 264 Failed: 0)
    Parse errors: Unknown TAP token: "/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)"

This happens in the dockerized jobs whose images presumably have a very
minimal locale setup. I don't know if this is a sign that the tests have
never been doing quite what we expect on those platforms, or if it's
simply noise that is now being caught.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Jun 2026, #02)
From: Patrick Steinhardt @ 2026-06-04  6:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq8q8vowvt.fsf@gitster.g>

On Thu, Jun 04, 2026 at 11:35:50AM +0900, Junio C Hamano wrote:
> * ps/t7527-fix-tap-output (2026-06-02) 4 commits
>  - t: let prove fail when parsing invalid TAP output
>  - t/lib-git-p4: silence output when killing p4d and its watchdog
>  - t/test-lib: silence EBUSY errors on Windows during test cleanup
>  - t7527: fix broken TAP output
> 
>  A recent regression in t7527 that broke TAP output has been fixed,
>  some other test noise that also broke TAP output has been silenced,
>  and 'prove' is now configured to fail on invalid TAP output to
>  prevent future regressions.
> 
>  Expecting a (small and hopefully final) reroll.
>  cf. <xmqqtsrlw09t.fsf@gitster.g>
>  source: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

I think this status here is stale -- v2 didn't have any comments yet as
far as I can see.

Patrick

^ permalink raw reply

* Re: [PATCH v2 0/8] setup: centralize object database creation
From: Patrick Steinhardt @ 2026-06-04  6:08 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <CAOLa=ZS4mSHEThYD0GKFXxqDf1Yz9U7pQkXYQJ+54V5C2FPBOg@mail.gmail.com>

On Wed, Jun 03, 2026 at 06:04:01AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Hi,
> >
> > this small patch series refactors the logic for how we discover and
> > configure repositories. Most importantly, this involves the following
> > two steps:
> >
> >   1. We unify the logic to apply the repository format, which is
> >      currently open-coded across multiple sites. These sites have
> >      already diverged, where some repository extensions are not
> >      consistently applied.
> >
> >   2. We then centralize creation of the object database to happen at the
> >      same time we apply the repository format.
> >
> > The end result is that we apply the repository format exactly once, and
> > that's also the point in time where we can finalize the setup of the
> > repo's data structures as we know about all details of the repo at that
> > time. Ultimately, this makes it trivial to introduce the "objectStorage"
> > extension, even though that's not part of this patch series.
> >
> > The series is built on top of aec3f58750 (Sync with 'maint', 2026-05-21)
> > with ps/setup-wo-the-repository at df69f40c34 (setup: stop using
> > `the_repository` in `init_db()`, 2026-05-19) merged into it.
> >
> 
> Apart from some questions/comments, the series looks good. Thanks

Thanks for your review! Will send v3 in a bit.

Patrick

^ permalink raw reply

* Re: [PATCH v2 4/8] repository: stop initializing the object database in `repo_set_gitdir()`
From: Patrick Steinhardt @ 2026-06-04  6:08 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <CAOLa=ZQ5u+J-f=xS7RDym0cwt+=R2dzMFo5P34cp-CBbza7NRg@mail.gmail.com>

On Wed, Jun 03, 2026 at 05:49:50AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/repository.c b/repository.c
> > index 58a13f7c4f..2c2395105f 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -181,12 +181,6 @@ void repo_set_gitdir(struct repository *repo,
> >  	free(old_gitdir);
> >
> >  	repo_set_commondir(repo, o->commondir);
> > -
> > -	if (!repo->objects)
> > -		repo->objects = odb_new(repo, o->object_dir, o->alternate_db);
> > -	else if (!o->skip_initializing_odb)
> > -		BUG("cannot reinitialize an already-initialized object directory");
> > -
> 
> This always confuses me, so we were creating the odb even if
> `o->skip_initializing_odb` was set to true, if `repo->objects` didn't
> exist. Weird.

Agreed, it was weird. It was my first iteration towards centralizing
`odb_new()`: before we had the above logic we were basically recreating
the ODB multiple times, which was even more weird. At least things are
getting somewhat sensible with this patch now.

Patrick

^ permalink raw reply

* Re: [PATCH v2 3/8] setup: deduplicate logic to apply repository format
From: Patrick Steinhardt @ 2026-06-04  6:08 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Kristoffer Haugsbakk, Junio C Hamano
In-Reply-To: <CAOLa=ZSnDz1+C8y7ozFDdv68vqLFk-E+FsXhAnhwbm2D6a1Fng@mail.gmail.com>

On Wed, Jun 03, 2026 at 05:43:34AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/repository.c b/repository.c
> > index db57b8308b..58a13f7c4f 100644
> > --- a/repository.c
> > +++ b/repository.c
> > @@ -262,8 +262,8 @@ void repo_set_worktree(struct repository *repo, const char *path)
> >  	trace2_def_repo(repo);
> >  }
> >
> > -static int read_and_verify_repository_format(struct repository_format *format,
> > -					     const char *commondir)
> > +static int read_repository_format_from_commondir(struct repository_format *format,
> > +						 const char *commondir)
> 
> Nit: The commit explicitly calls out one rename, but this one wasn't.

Fair. I'll add a sentence or two about this.

> > @@ -272,11 +272,6 @@ static int read_and_verify_repository_format(struct repository_format *format,
> >  	read_repository_format(format, sb.buf);
> >  	strbuf_reset(&sb);
> >
> > -	if (verify_repository_format(format, &sb) < 0) {
> > -		warning("%s", sb.buf);
> > -		ret = -1;
> > -	}
> > -
> 
> So we remove this, so that the callee would independently verify the
> format I assume.
> 
> Edit: seems like we call verify_repository_format() within
> apply_repository_format() and the latter is called by the callee.
> 
> >  	strbuf_release(&sb);
> >  	return ret;
> >  }

Yeah. I guess this could be explained a bit better.

> > @@ -290,6 +285,8 @@ int repo_init(struct repository *repo,
> >  	      const char *worktree)
> >  {
> >  	struct repository_format format = REPOSITORY_FORMAT_INIT;
> > +	struct strbuf err = STRBUF_INIT;
> > +
> >  	memset(repo, 0, sizeof(*repo));
> >
> >  	initialize_repository(repo);
> > @@ -297,21 +294,13 @@ int repo_init(struct repository *repo,
> >  	if (repo_init_gitdir(repo, gitdir))
> >  		goto error;
> >
> > -	if (read_and_verify_repository_format(&format, repo->commondir))
> > +	if (read_repository_format_from_commondir(&format, repo->commondir))
> >  		goto error;
> >
> > -	repo_set_hash_algo(repo, format.hash_algo);
> > -	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
> > -	repo_set_ref_storage_format(repo, format.ref_storage_format,
> > -				    format.ref_storage_payload);
> > -	repo->repository_format_worktree_config = format.worktree_config;
> > -	repo->repository_format_relative_worktrees = format.relative_worktrees;
> > -	repo->repository_format_precious_objects = format.precious_objects;
> > -	repo->repository_format_submodule_path_cfg = format.submodule_path_cfg;
> > -
> > -	/* take ownership of format.partial_clone */
> 
> I see that we now do an xstrdup for format.partial_clone, meaning we
> have our own memory segment to care about. Do we have to worry about
> format.partial_clone not being free'd?

No, `clear_repository_format()` already releases the memory for us. It
also did beforehand, but there we did the dance of just moving ownership
over. So we already had to free the string before.

> > diff --git a/setup.h b/setup.h
> > index 9409326fe4..5ed92f53fa 100644
> > --- a/setup.h
> > +++ b/setup.h
> > @@ -221,6 +221,15 @@ void clear_repository_format(struct repository_format *format);
> >  int verify_repository_format(const struct repository_format *format,
> >  			     struct strbuf *err);
> >
> > +/*
> > + * Apply the given repository format to the repo. This initializes extensions
> > + * and basic data structures required for normal operation. Returns 0 on
> > + * success, a negative error code otherwise.
> > + */
> 
> Nit: perhaps we should also mention that we verify the format?

Will do.

Patrick

^ permalink raw reply

* Re: [PATCH v2 2/3] Documentation/MyFirstContribution: recommend the use of b4
From: Toon Claes @ 2026-06-04  5:25 UTC (permalink / raw)
  To: Patrick Steinhardt, git
  Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <20260603-pks-b4-v2-2-a8aea0aa2c23@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The b4 tool originates from the Linux kernel community and is intended
> to help mailing-list based workflows. It automates a lot of the annoying
> bookkeeping tasks that contributors typically need to do: tracking the
> list of recipients, Message-IDs, range-diffs and the like. In addition
> to that, b4 also has many other subcommands that help the maintainer and
> reviewers.
>
> The Git project uses the same infrastructure as the kernel, so this tool
> is also a very good fit for us. Adapt "MyFirstContribution" to
> explicitly recommend its use.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/MyFirstContribution.adoc | 92 ++++++++++++++++++++++++++++++++--
>  Documentation/SubmittingPatches        |  6 ++-
>  2 files changed, 93 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
> index 069020196c..fc0b06ae67 100644
> --- a/Documentation/MyFirstContribution.adoc
> +++ b/Documentation/MyFirstContribution.adoc
> @@ -833,7 +833,7 @@ This patchset is part of the MyFirstContribution tutorial and should not
>  be merged.
>  ----
>  
> -At this point the tutorial diverges, in order to demonstrate two
> +At this point the tutorial diverges, in order to demonstrate three
>  different methods of formatting your patchset and getting it reviewed.
>  
>  The first method to be covered is GitGitGadget, which is useful for those
> @@ -845,9 +845,14 @@ more fine-grained control over the emails to be sent. This method requires some
>  setup which can change depending on your system and will not be covered in this
>  tutorial.
>  
> +The third method to be covered is `b4`, which builds on top of `git
> +format-patch` and `git send-email`. This method is the recommended way to
> +submit patches via mail as it automates a lot of the bookkeeping required by
> +`git send-email`.

The GitGitGadget method includes Running CI, maybe that's worth
mentioning the user is responsible themselves to run the whole test
suite? Or is this outside the scope of this series, since `git
send-email` doesn't include that too.

-- 
Cheers,
Toon

^ permalink raw reply

* [GSoC] [Blog] week 1: Improving the new git repo command
From: K Jayatheerth @ 2026-06-04  5:23 UTC (permalink / raw)
  To: GIT Mailing-list, Justin Tobler, Lucas Seiki Oshiro

Hey everyone,

My Week 1 GSoC blog is live!
https://jayatheerth.com/blogs/gsoc/week-1-path-foundation

Feel free to give it a read and share any feedback ; )

Regards,
- K Jayatheerth

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox