From: Moumita <dhar61595@gmail.com>
To: git@vger.kernel.org
Cc: Moumita <dhar61595@gmail.com>, "Johannes Sixt" <j6t@kdbg.org>,
"Eric Sunshine" <sunshine@sunshineco.com>,
"Junio C Hamano" <gitster@pobox.com>,
"René Scharfe" <l.s.r@web.de>,
"Atharva Raykar" <raykar.ath@gmail.com>,
"D. Ben Knoble" <ben.knoble@gmail.com>
Subject: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns
Date: Sat, 29 Mar 2025 01:35:24 +0530
Message-ID: <20250328200525.4437-1-dhar61595@gmail.com>
In-Reply-To: <20250218153537.16320-1-dhar61595@gmail.com>

This patch improves function detection in the userdiff driver for Bash
scripts. The old funcname regex tried to match the start of the function
body explicitly, which caused problems with line continuations (`\`) and
with bodies that consist of a simple command. I have replaced the
body-matching part with `.*$`, which makes the driver more consistent with
other userdiff drivers and ensures that the full function definition line
is captured.

I also refined the word regex to handle more Bash syntax: parameter
expansions (such as `${a:-b}` and `${a%%.*}`), arithmetic operators (such
as `+=`, `<<` and `>>`), and command-line options (such as `-a` and `--a`).

I have added tests for these changes: three new t4018 cases cover
multi-line and single-command function definitions, and a new Bash
word-diff test in t4034 exercises the word regex.

Moumita Dhar (1):
userdiff: extend Bash pattern to cover more shell function forms
t/t4018/bash-bashism-style-multiline-function | 4 +++
t/t4018/bash-posix-style-multiline-function | 4 +++
.../bash-posix-style-single-command-function | 3 ++
t/t4034-diff-words.sh | 1 +
t/t4034/bash/expect | 30 +++++++++++++++++++
t/t4034/bash/post | 25 ++++++++++++++++
t/t4034/bash/pre | 25 ++++++++++++++++
userdiff.c | 24 +++++++++++----
8 files changed, 110 insertions(+), 6 deletions(-)
create mode 100644 t/t4018/bash-bashism-style-multiline-function
create mode 100644 t/t4018/bash-posix-style-multiline-function
create mode 100644 t/t4018/bash-posix-style-single-command-function
create mode 100644 t/t4034/bash/expect
create mode 100644 t/t4034/bash/post
create mode 100644 t/t4034/bash/pre
Range-diff against v2:
1: de2e8f9792 ! 1: 3d077fadc4 userdiff: extend Bash pattern to cover more shell function forms
@@ Metadata
## Commit message ##
userdiff: extend Bash pattern to cover more shell function forms
- The existing Bash userdiff pattern misses some shell function forms, such as
- `function foo()`, multi-line definitions, and extra whitespace.
+ The previous function regex required explicit matching of function
+ bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- Extend the pattern to:
- - Support `function foo()` syntax.
- - Allow spaces in `foo ( )` definitions.
- - Recognize multi-line definitions with backslashes.
- - Broaden function body detection.
+ - It failed to capture valid functions where `{` was on the next line
+ due to line continuation (`\`).
+ - It did not recognize functions with a single-command body, such as
+ `x () echo hello`.
+
+ Replacing the function body matching logic with `.*$` ensures
+ that everything on the function definition line is captured,
+ aligning with other userdiff drivers and improving hunk headers in
+ `git diff`.
+
+ Additionally, the word regex is refined to better recognize shell
+ syntax, including additional parameter expansion operators and
+ command-line options, improving syntax-aware diffs.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
+ ## t/t4018/bash-bashism-style-multiline-function (new) ##
+@@
++function RIGHT \
++{
++ echo 'ChangeMe'
++}
+ \ No newline at end of file
+
+ ## t/t4018/bash-posix-style-multiline-function (new) ##
+@@
++RIGHT() \
++{
++ ChangeMe
++}
+ \ No newline at end of file
+
+ ## t/t4018/bash-posix-style-single-command-function (new) ##
+@@
++RIGHT() echo "hello"
++
++ ChangeMe
+
+ ## t/t4034-diff-words.sh ##
+@@ t/t4034-diff-words.sh: test_expect_success 'unset default driver' '
+
+ test_language_driver ada
+ test_language_driver bibtex
++test_language_driver bash
+ test_language_driver cpp
+ test_language_driver csharp
+ test_language_driver css
+
+ ## t/t4034/bash/expect (new) ##
+@@
++<BOLD>diff --git a/pre b/post<RESET>
++<BOLD>index 09ac008..60ba6a2 100644<RESET>
++<BOLD>--- a/pre<RESET>
++<BOLD>+++ b/post<RESET>
++<CYAN>@@ -1,25 +1,25 @@<RESET>
++<RED>my_var<RESET><GREEN>new_var<RESET>=10
++x=<RED>123<RESET><GREEN>456<RESET>
++y=<RED>3.14<RESET><GREEN>2.71<RESET>
++z=<RED>.5<RESET><GREEN>.75<RESET>
++echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
++${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
++if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
++((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
++((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
++$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
++$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
++${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
++${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
++${<RED>a<RESET><GREEN>x<RESET>##*/}
++${<RED>a<RESET><GREEN>x<RESET>%.*}
++${<RED>a<RESET><GREEN>x<RESET>%%.*}
++${<RED>a<RESET><GREEN>x<RESET>^^}
++${<RED>a<RESET><GREEN>x<RESET>,}
++${<RED>a<RESET><GREEN>x<RESET>,,}
++${!<RED>a<RESET><GREEN>x<RESET>}
++${<RED>a<RESET><GREEN>x<RESET>[@]}
++${<RED>a<RESET><GREEN>x<RESET>:?error message}
++${<RED>a<RESET><GREEN>x<RESET>:2:3}
++ls <RED>-a<RESET><GREEN>-x<RESET>
++ls <RED>--a<RESET><GREEN>--x<RESET>
+
+ ## t/t4034/bash/post (new) ##
+@@
++new_var=10
++x=456
++y=2.71
++z=.75
++echo $USERNAME
++${HOMEDIR}
++if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
++((x+=y))
++((x-=y))
++$((x<<y))
++$((x>>y))
++${x:-y}
++${x:=y}
++${x##*/}
++${x%.*}
++${x%%.*}
++${x^^}
++${x,}
++${x,,}
++${!x}
++${x[@]}
++${x:?error message}
++${x:2:3}
++ls -x
++ls --x
+
+ ## t/t4034/bash/pre (new) ##
+@@
++my_var=10
++x=123
++y=3.14
++z=.5
++echo $USER
++${HOME}
++if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
++((a+=b))
++((a-=b))
++$((a << b))
++$((a >> b))
++${a:-b}
++${a:=b}
++${a##*/}
++${a%.*}
++${a%%.*}
++${a^^}
++${a,}
++${a,,}
++${!a}
++${a[@]}
++${a:?error message}
++${a:2:3}
++ls -a
++ls --a
+
## userdiff.c ##
-@@ userdiff.c: IPATTERN("ada",
- "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
- "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
- PATTERNS("bash",
-- /* Optional leading indentation */
-+ /* Optional leading indentation */
- "^[ \t]*"
-- /* Start of captured text */
-+ /* Start of captured function name */
- "("
- "("
-- /* POSIX identifier with mandatory parentheses */
-- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
-+ /* POSIX identifier with mandatory parentheses (allow spaces inside) */
-+ "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
- "|"
-- /* Bashism identifier with optional parentheses */
-- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
-+ /* Bash-style function definitions, allowing optional `function` keyword */
-+ "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
+@@ userdiff.c: PATTERNS("bash",
+ /* Bashism identifier with optional parentheses */
+ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
")"
- /* Optional whitespace */
- "[ \t]*"
+- /* Optional whitespace */
+- "[ \t]*"
- /* Compound command starting with `{`, `(`, `((` or `[[` */
- "(\\{|\\(\\(?|\\[\\[)"
-- /* End of captured text */
-+ /* Allow function body to start with `{`, `(` (subshell), `[[` */
-+ "(\\{|\\(|\\[\\[)"
-+ /* End of captured function name */
++ /* Everything after the function header is captured */
++ ".*$"
+ /* End of captured text */
")",
/* -- */
- /* Characters not in the default $IFS value */
- "[^ \t]+"),
+ /* Identifiers: variable and function names */
-+ "[a-zA-Z_][a-zA-Z0-9_]*"
++ "[a-zA-Z_][a-zA-Z0-9_]*"
+ /* Numeric constants: integers and decimals */
-+ "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
-+ /* Shell variables: `$VAR`, `${VAR}` */
-+ "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
-+ /* Logical and comparison operators */
++ "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
++ /* Shell variables: $VAR, ${VAR} */
++ "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
++ /* Logical and comparison operators */
+ "|\\|\\||&&|<<|>>|==|!=|<=|>="
+ /* Assignment and arithmetic operators */
+ "|[-+*/%&|^!=<>]=?"
-+ /* Command-line options (to avoid splitting `-option`) */
++ /* Additional parameter expansion operators */
++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
++ /* Command-line options (to avoid splitting -option) */
+ "|--?[a-zA-Z0-9_-]+"
+ /* Brackets and grouping symbols */
+ "|\\(|\\)|\\{|\\}|\\[|\\]"),
--
2.48.0
Thread overview: 37+ messages
2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
2025-02-15 14:37 ` Johannes Sixt
2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-02-18 19:30 ` Junio C Hamano
2025-02-22 18:15 ` Johannes Sixt
2025-02-24 16:28 ` Junio C Hamano
2025-02-18 23:38 ` Junio C Hamano
2025-02-22 18:14 ` Johannes Sixt
2025-02-18 17:30 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine
2025-03-28 20:05 ` Moumita [this message]
2025-03-28 20:05 ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-03-29 19:26 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
2025-03-30 12:28 ` MOUMITA DHAR
2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-02 21:27 ` Junio C Hamano
2025-05-06 16:30 ` Johannes Sixt
2025-05-10 11:37 ` MOUMITA DHAR
2025-05-10 12:40 ` Johannes Sixt
2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
2025-05-11 12:58 ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-11 13:28 ` Moumita
2025-05-11 13:28 ` Moumita
2025-05-11 13:37 ` Moumita
2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-13 18:50 ` Junio C Hamano
2025-05-14 6:33 ` MOUMITA DHAR
2025-05-16 7:25 ` Johannes Sixt
2025-05-17 13:09 ` Junio C Hamano
2025-05-18 7:41 ` Johannes Sixt
2025-05-16 14:45 ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
2025-05-16 14:45 ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-16 17:45 ` Johannes Sixt
2025-05-16 21:56 ` Junio C Hamano