From: Moumita <dhar61595@gmail.com>
To: git@vger.kernel.org
Cc: Moumita <dhar61595@gmail.com>, "Johannes Sixt" <j6t@kdbg.org>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Atharva Raykar" <raykar.ath@gmail.com>,
	"D. Ben Knoble" <ben.knoble@gmail.com>
Subject: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns
Date: Sat, 29 Mar 2025 01:35:24 +0530
Message-ID: <20250328200525.4437-1-dhar61595@gmail.com>
In-Reply-To: <20250218153537.16320-1-dhar61595@gmail.com>

This patch improves function detection in userdiff for Bash scripts.
The old regex tried to match the start of the function body explicitly,
which failed when a line continuation (`\`) pushed the body onto the
next line and when the body was a simple command. I have replaced the
body match with `.*$`, which is consistent with other userdiff drivers
and captures the full function definition line.
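
As a rough illustration (not part of the patch), the difference can be
reproduced with `grep -E`. The two patterns below are simplified
stand-ins, not the exact strings in userdiff.c: `[[:blank:]]` replaces
the C string's `[ \t]`, and the helper names `is_funcname_line` and
`report` are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: simplified stand-ins for the old and new funcname patterns.
ident='[a-zA-Z_][a-zA-Z0-9_]*'
ws='[[:blank:]]'
# Function header: `name ()` or `function name` with optional `()`.
header="^${ws}*((${ident}${ws}*\(${ws}*\))|(function${ws}+${ident}((${ws}*\(${ws}*\))|(${ws}+))))"
# Old idea: the header must be followed by `{`, `(`, `((` or `[[`.
old_re="${header}${ws}*(\{|\(\(?|\[\[)"
# New idea: accept anything up to the end of the line.
new_re="${header}.*\$"

# Return 0 if LINE ($2) matches the extended regex REGEX ($1).
is_funcname_line() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eq -e "$1"
}

# Print whether LINE ($3) matches REGEX ($2), tagged with LABEL ($1).
report() {
    if is_funcname_line "$2" "$3"; then result='match'; else result='no match'; fi
    printf '%-4s %-22s %s\n' "$1" "$3" "$result"
}

for line in 'RIGHT() {' 'RIGHT() \' 'function RIGHT \' 'RIGHT() echo "hello"'; do
    report old "$old_re" "$line"
    report new "$new_re" "$line"
done
```

Under these assumptions, only `RIGHT() {` should match the old-style
pattern, while all four lines should match the new-style one.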

I also refined the word regex to better handle Bash syntax, including
parameter expansions, arithmetic expressions, and command-line options.
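
To show why a finer word regex matters, here is a small sketch that
splits a line into tokens with `grep -oE`. It is only an approximation
of how `git diff --word-diff` applies a word regex, and `new_word_re`
is a reduced, hypothetical subset (identifiers, `$VAR`, numbers,
options, single-character fallback), not the full pattern from the
patch. `old_word_re` mirrors the previous "anything but space or tab"
rule.

```shell
#!/bin/sh
# Sketch only: a reduced subset of a Bash word regex, not the patch's pattern.
old_word_re='[^[:blank:]]+'
new_word_re='\$?[a-zA-Z_][a-zA-Z0-9_]*|[0-9]+(\.[0-9]*)?|--?[a-zA-Z0-9_-]+|[^[:space:]]'

# Print one token per line for TEXT ($2) under the extended regex REGEX ($1).
tokenize() {
    printf '%s\n' "$2" | LC_ALL=C grep -oE -e "$1"
}

for text in '${HOME}' 'echo $USER' 'x=123' 'ls --a'; do
    printf '%-12s old: %s\n' "$text" "$(tokenize "$old_word_re" "$text" | paste -sd ' ' -)"
    printf '%-12s new: %s\n' "$text" "$(tokenize "$new_word_re" "$text" | paste -sd ' ' -)"
done
```

With the old rule, `${HOME}` and `x=123` are each a single word, so any
change marks the whole word as changed. With the finer split, `HOME`
and `123` become words of their own, which is what lets the expected
output in t/t4034/bash/expect highlight only the changed name or value.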

I have added test cases to cover these changes: three function-name
cases in t/t4018 and a word-diff case in t/t4034/bash.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 +++
 t/t4018/bash-posix-style-multiline-function   |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 30 +++++++++++++++++++
 t/t4034/bash/post                             | 25 ++++++++++++++++
 t/t4034/bash/pre                              | 25 ++++++++++++++++
 userdiff.c                                    | 24 +++++++++++----
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v2:
1:  de2e8f9792 ! 1:  3d077fadc4 userdiff: extend Bash pattern to cover more shell function forms
    @@ Metadata
      ## Commit message ##
         userdiff: extend Bash pattern to cover more shell function forms
     
    -    The existing Bash userdiff pattern misses some shell function forms, such as
    -    `function foo()`, multi-line definitions, and extra whitespace.
    +    The previous function regex required explicit matching of function
    +    bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
     
    -    Extend the pattern to:
    -    - Support `function foo()` syntax.
    -    - Allow spaces in `foo ( )` definitions.
    -    - Recognize multi-line definitions with backslashes.
    -    - Broaden function body detection.
    +    - It failed to capture valid functions where `{` was on the next line
    +      due to line continuation (`\`).
    +    - It did not recognize functions with single  command body, such as
    +      `x () echo hello`.
    +
    +    Replacing the function body matching logic with `.*$`, ensures
    +    that everything on the function definition line is captured,
    +    aligning with other userdiff drivers and improving hunk headers in
    +    `git diff`.
    +
    +    Additionally, the word regex is refined to better recognize shell
    +    syntax, including additional parameter expansion operators and
    +    command-line options, improving syntax-aware diffs.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    + ## t/t4018/bash-bashism-style-multiline-function (new) ##
    +@@
    ++function RIGHT \
    ++{    
    ++    echo 'ChangeMe'
    ++}
    + \ No newline at end of file
    +
    + ## t/t4018/bash-posix-style-multiline-function (new) ##
    +@@
    ++RIGHT() \
    ++{
    ++    ChangeMe
    ++}
    + \ No newline at end of file
    +
    + ## t/t4018/bash-posix-style-single-command-function (new) ##
    +@@
    ++RIGHT() echo "hello"
    ++
    ++    ChangeMe
    +
    + ## t/t4034-diff-words.sh ##
    +@@ t/t4034-diff-words.sh: test_expect_success 'unset default driver' '
    + 
    + test_language_driver ada
    + test_language_driver bibtex
    ++test_language_driver bash
    + test_language_driver cpp
    + test_language_driver csharp
    + test_language_driver css
    +
    + ## t/t4034/bash/expect (new) ##
    +@@
    ++<BOLD>diff --git a/pre b/post<RESET>
    ++<BOLD>index 09ac008..60ba6a2 100644<RESET>
    ++<BOLD>--- a/pre<RESET>
    ++<BOLD>+++ b/post<RESET>
    ++<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<RED>my_var<RESET><GREEN>new_var<RESET>=10
    ++x=<RED>123<RESET><GREEN>456<RESET>
    ++y=<RED>3.14<RESET><GREEN>2.71<RESET>
    ++z=<RED>.5<RESET><GREEN>.75<RESET>
    ++echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
    ++${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    ++if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    ++((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    ++((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    ++$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    ++$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    ++${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>##*/}
    ++${<RED>a<RESET><GREEN>x<RESET>%.*}
    ++${<RED>a<RESET><GREEN>x<RESET>%%.*}
    ++${<RED>a<RESET><GREEN>x<RESET>^^}
    ++${<RED>a<RESET><GREEN>x<RESET>,}
    ++${<RED>a<RESET><GREEN>x<RESET>,,}
    ++${!<RED>a<RESET><GREEN>x<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>[@]}
    ++${<RED>a<RESET><GREEN>x<RESET>:?error message}
    ++${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++ls <RED>-a<RESET><GREEN>-x<RESET>
    ++ls <RED>--a<RESET><GREEN>--x<RESET>
    +
    + ## t/t4034/bash/post (new) ##
    +@@
    ++new_var=10
    ++x=456
    ++y=2.71
    ++z=.75
    ++echo $USERNAME
    ++${HOMEDIR}
    ++if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    ++((x+=y))
    ++((x-=y))
    ++$((x<<y))
    ++$((x>>y))
    ++${x:-y}
    ++${x:=y}
    ++${x##*/}
    ++${x%.*}
    ++${x%%.*}
    ++${x^^}
    ++${x,}
    ++${x,,}
    ++${!x}
    ++${x[@]}
    ++${x:?error message}
    ++${x:2:3}
    ++ls -x
    ++ls --x
    +
    + ## t/t4034/bash/pre (new) ##
    +@@
    ++my_var=10
    ++x=123
    ++y=3.14
    ++z=.5
    ++echo $USER
    ++${HOME}
    ++if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    ++((a+=b))
    ++((a-=b))
    ++$((a << b))
    ++$((a >> b))
    ++${a:-b}
    ++${a:=b}
    ++${a##*/}
    ++${a%.*}
    ++${a%%.*}
    ++${a^^}
    ++${a,}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:?error message}
    ++${a:2:3}
    ++ls -a
    ++ls --a
    +
      ## userdiff.c ##
    -@@ userdiff.c: IPATTERN("ada",
    - 	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
    - 	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
    - PATTERNS("bash",
    --	 /* Optional leading indentation */
    -+     /* Optional leading indentation */
    - 	 "^[ \t]*"
    --	 /* Start of captured text */
    -+	 /* Start of captured function name */
    - 	 "("
    - 	 "("
    --	     /* POSIX identifier with mandatory parentheses */
    --	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    -+		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
    -+		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
    - 	 "|"
    --	     /* Bashism identifier with optional parentheses */
    --	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    -+		 /* Bash-style function definitions, allowing optional `function` keyword */
    -+		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
    +@@ userdiff.c: PATTERNS("bash",
    + 	     /* Bashism identifier with optional parentheses */
    + 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
      	 ")"
    - 	 /* Optional whitespace */
    - 	 "[ \t]*"
    +-	 /* Optional whitespace */
    +-	 "[ \t]*"
     -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
     -	 "(\\{|\\(\\(?|\\[\\[)"
    --	 /* End of captured text */
    -+	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
    -+	 "(\\{|\\(|\\[\\[)"
    -+	 /* End of captured function name */
    ++	 /* Everything after the function header is captured  */
    ++	 ".*$"
    + 	 /* End of captured text */
      	 ")",
      	 /* -- */
     -	 /* Characters not in the default $IFS value */
     -	 "[^ \t]+"),
     +	 /* Identifiers: variable and function names */
    -+	 "[a-zA-Z_][a-zA-Z0-9_]*"
    ++	  "[a-zA-Z_][a-zA-Z0-9_]*"
     +	 /* Numeric constants: integers and decimals */
    -+	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
    -+	 /* Shell variables: `$VAR`, `${VAR}` */
    -+	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
    -+	 /* Logical and comparison operators */
    ++	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
    ++	 /* Shell variables: $VAR, ${VAR} */
    ++	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
    ++	  /* Logical and comparison operators */
     +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
    -+	 /* Command-line options (to avoid splitting `-option`) */
    ++	 /* Additional parameter expansion operators */
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
     +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
-- 
2.48.0


Thread overview: 37+ messages
2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
2025-02-15 14:37   ` Johannes Sixt
2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-02-18 19:30     ` Junio C Hamano
2025-02-22 18:15       ` Johannes Sixt
2025-02-24 16:28         ` Junio C Hamano
2025-02-18 23:38     ` Junio C Hamano
2025-02-22 18:14     ` Johannes Sixt
2025-02-18 17:30   ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine
2025-03-28 20:05   ` Moumita [this message]
2025-03-28 20:05     ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-03-29 19:26     ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
2025-03-30 12:28       ` MOUMITA DHAR
2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-02 21:27         ` Junio C Hamano
2025-05-06 16:30         ` Johannes Sixt
2025-05-10 11:37           ` MOUMITA DHAR
2025-05-10 12:40             ` Johannes Sixt
2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
2025-05-11 12:58         ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-11 13:28         ` Moumita
2025-05-11 13:28           ` Moumita
2025-05-11 13:37         ` Moumita
2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-13 18:50             ` Junio C Hamano
2025-05-14  6:33               ` MOUMITA DHAR
2025-05-16  7:25             ` Johannes Sixt
2025-05-17 13:09               ` Junio C Hamano
2025-05-18  7:41                 ` Johannes Sixt
2025-05-16 14:45           ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
2025-05-16 14:45             ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-16 17:45               ` Johannes Sixt
2025-05-16 21:56                 ` Junio C Hamano
