From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "D. Ben Knoble" <ben.knoble@gmail.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Michael Montalbo <mmontalbo@gmail.com>,
Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 3/6] t: fix Lexer line count for $() inside double-quoted strings
Date: Thu, 04 Jun 2026 07:45:55 +0000 [thread overview]
Message-ID: <93c2b29683ff158920013af37cd28e1c2f4e2617.1780559158.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2135.git.1780559158.gitgitgadget@gmail.com>
From: Michael Montalbo <mmontalbo@gmail.com>
scan_dqstring's post-loop newline counter re-counts newlines that
were already counted during recursive parsing of $() bodies. This
happens because scan_dollar's returned text can contain newlines
(from token text of multi-line strings and from \n command separator
tokens), and the catch-all counter at the end of scan_dqstring
counts all of them again.
Fix this by counting newlines inline as non-special characters are
consumed, and removing the post-loop catch-all. Each newline is
now counted exactly once: literal newlines at the inline match,
line splices at the \<newline> handler, and $() newlines by
scan_token during the recursive parse.
This does not affect chainlint's output because chainlint annotates
the original body text using byte offsets, not token line numbers.
It does matter for tools like lint-style.pl (introduced in a
subsequent commit) that use token line numbers to locate and fix
specific lines in the original file.
Add check-shell-parser.pl to verify that the Lexer reports correct
line numbers after multi-line $() in double-quoted strings.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
t/Makefile | 7 +++--
t/check-shell-parser.pl | 58 +++++++++++++++++++++++++++++++++++++++++
t/lib-shell-parser.pl | 11 +++++---
3 files changed, 71 insertions(+), 5 deletions(-)
create mode 100644 t/check-shell-parser.pl
diff --git a/t/Makefile b/t/Makefile
index ab8a5b54aa..25f923fed9 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -139,7 +139,7 @@ check-meson:
test-lint: test-lint-duplicates test-lint-executable \
test-lint-filenames
ifneq ($(PERL_PATH),)
-test-lint: test-lint-shell-syntax
+test-lint: test-lint-shell-syntax check-shell-parser
else
GIT_TEST_CHAIN_LINT = 0
endif
@@ -160,6 +160,8 @@ test-lint-executable:
test-lint-shell-syntax:
@'$(PERL_PATH_SQ)' check-non-portable-shell.pl $(T) $(THELPERS) $(TPERF)
+check-shell-parser:
+ @'$(PERL_PATH_SQ)' check-shell-parser.pl
test-lint-filenames:
@# We do *not* pass a glob to ls-files but use grep instead, to catch
@# non-ASCII characters (which are quoted within double-quotes)
@@ -185,7 +187,8 @@ perf:
$(MAKE) -C perf/ all
.PHONY: pre-clean $(T) aggregate-results clean valgrind perf \
- check-chainlint clean-chainlint test-chainlint $(UNIT_TESTS)
+ check-chainlint clean-chainlint test-chainlint \
+ check-shell-parser $(UNIT_TESTS)
.PHONY: libgit-sys-test libgit-rs-test
libgit-sys-test:
diff --git a/t/check-shell-parser.pl b/t/check-shell-parser.pl
new file mode 100644
index 0000000000..7d4ba6da7f
--- /dev/null
+++ b/t/check-shell-parser.pl
@@ -0,0 +1,58 @@
+#!/usr/bin/perl
+
+# Tests for the shared shell parser (lib-shell-parser.pl).
+
+use strict;
+use warnings;
+use File::Basename;
+
+my $_lib = dirname($0) . "/lib-shell-parser.pl";
+$_lib = "./$_lib" unless $_lib =~ m{^/};
+do $_lib or die "$0: failed to load $_lib: $@$!\n";
+
+my $rc = 0;
+
+sub check {
+ my ($desc, $body, $want_token, $want_line) = @_;
+ my $parser = ShellParser->new(\$body);
+ my @tokens = $parser->parse();
+ for my $t (reverse @tokens) {
+ next unless $t->[0] eq $want_token && defined $t->[3];
+ if ($t->[3] != $want_line) {
+ print STDERR "FAIL: $desc: " .
+ "'$want_token' at line $t->[3], " .
+ "expected line $want_line\n";
+ $rc = 1;
+ }
+ return;
+ }
+ print STDERR "FAIL: $desc: token '$want_token' not found\n";
+ $rc = 1;
+}
+
+# Multi-line $() inside a dq-string: MARKER should be at line 3.
+check('dq-string with multi-line $()', <<'BODY', 'MARKER', 3);
+ x="$(echo one
+ echo two)" &&
+ MARKER here
+BODY
+
+# Two multi-line $() substitutions: verifies drift does not accumulate.
+# MARKER should be at line 5.
+check('two dq-string $()', <<'BODY', 'MARKER', 5);
+ x="$(echo a
+ b)" &&
+ y="$(echo c
+ d)" &&
+ MARKER here
+BODY
+
+# $() outside a dq-string: no double-counting either way.
+# MARKER should be at line 3.
+check('bare $() spanning lines', <<'BODY', 'MARKER', 3);
+ x=$(echo one
+ echo two) &&
+ MARKER here
+BODY
+
+exit $rc;
diff --git a/t/lib-shell-parser.pl b/t/lib-shell-parser.pl
index 1e521a94f8..fa9b44d6ec 100644
--- a/t/lib-shell-parser.pl
+++ b/t/lib-shell-parser.pl
@@ -89,8 +89,14 @@ sub scan_dqstring {
my $b = $self->{buff};
my $s = '"';
while (1) {
- # slurp up non-special characters
- $s .= $1 if $$b =~ /\G([^"\$\\]+)/gc;
+ # slurp up non-special characters; count newlines
+ # inline so we don't need a catch-all counter that
+ # would miscount newlines from recursive $() parsing
+ if ($$b =~ /\G([^"\$\\]+)/gc) {
+ my $chunk = $1;
+ $self->{lineno} += () = $chunk =~ /\n/sg;
+ $s .= $chunk;
+ }
# handle special characters
last unless $$b =~ /\G(.)/sgc;
my $c = $1;
@@ -107,7 +113,6 @@ sub scan_dqstring {
}
die("internal error scanning dq-string '$c'\n");
}
- $self->{lineno} += () = $s =~ /\n/sg;
return $s;
}
--
gitgitgadget
next prev parent reply other threads:[~2026-06-04 7:46 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-04 7:45 [PATCH 0/6] t: add lint-style.pl and convert grep to test_grep Michael Montalbo via GitGitGadget
2026-06-04 7:45 ` [PATCH 1/6] t/README: document test_grep helper Michael Montalbo via GitGitGadget
2026-06-04 7:45 ` [PATCH 2/6] t: extract chainlint's parser into shared module Michael Montalbo via GitGitGadget
2026-06-04 7:45 ` Michael Montalbo via GitGitGadget [this message]
2026-06-04 7:45 ` [PATCH 4/6] t: add lint-style.pl with test_grep negation rule Michael Montalbo via GitGitGadget
2026-06-04 7:45 ` [PATCH 5/6] t: fix grep assertions missing file arguments Michael Montalbo via GitGitGadget
2026-06-04 7:45 ` [PATCH 6/6] t: lint and convert grep assertions to test_grep Michael Montalbo via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=93c2b29683ff158920013af37cd28e1c2f4e2617.1780559158.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=ben.knoble@gmail.com \
--cc=git@vger.kernel.org \
--cc=mmontalbo@gmail.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox