git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts
@ 2025-02-11 11:46 Moumita
  2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
  2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
  0 siblings, 2 replies; 37+ messages in thread
From: Moumita @ 2025-02-11 11:46 UTC (permalink / raw)
  To: git; +Cc: Moumita

This patch adds a built-in userdiff pattern for shell scripts, allowing Git to recognize function names in shell scripts when displaying diffs.
This change defines a regular expression for detecting shell functions.

Moumita (1):
  Added built in function recognition for shell

 userdiff.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/1] Added built in function recognition for shell
  2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
@ 2025-02-11 11:46 ` Moumita
  2025-02-15 14:37   ` Johannes Sixt
  2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
  1 sibling, 1 reply; 37+ messages in thread
From: Moumita @ 2025-02-11 11:46 UTC (permalink / raw)
  To: git
  Cc: Moumita, Eric Sunshine, Junio C Hamano, René Scharfe,
	Atharva Raykar, D. Ben Knoble

Introduced a built-in userdiff driver for shell scripts, enabling
accurate function name recognition in `git diff` hunk headers.

Enhancements include:
- Function name detection for both POSIX and Bash/Ksh-style functions:
  - `function_name() { ... }`
  - `function function_name { ... }`
- Exclusion of shell keywords that can resemble function names,
  preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.).
- Improved tokenization support for:
  - Identifiers (variable and function names)
  - Numeric constants (integers and decimals)
  - Shell variables (`$VAR`, `${VAR}`)
  - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators
  - Assignment and redirection operators
  - Brackets and grouping symbols

This update improves Git’s diff readability for shell scripts,
bringing it in line with existing built-in userdiff drivers.

Signed-off-by: Moumita <dhar61595@gmail.com>
---
 userdiff.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..a8c14807c6 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -334,6 +334,26 @@ PATTERNS("scheme",
 	 "\\|([^\\\\]*)\\|"
 	 /* All other words should be delimited by spaces or parentheses */
 	 "|([^][)(}{[ \t])+"),
+PATTERNS("shell",
+	 /* Negate shell keywords that can look like functions */
+	 "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n"
+	 /* POSIX-style shell functions: function_name() { ... } */
+	 "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n"
+	 /* Bash/Ksh-style functions: function function_name { ... } */
+	 "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n",
+	 /* -- */
+	 /* Identifiers: variable and function names */
+	 "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	 "|[-+]?[0-9]+(\\.[0-9]*)?"
+	 /* Shell variables: $VAR and ${VAR} */
+	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
+	 /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
 	 "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"),
 { .name = "default", .binary = -1 },
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/1] Added built in function recognition for shell
  2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
@ 2025-02-15 14:37   ` Johannes Sixt
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2025-02-15 14:37 UTC (permalink / raw)
  To: Moumita
  Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar,
	D. Ben Knoble, git

Am 11.02.25 um 12:46 schrieb Moumita:
> Introduced a built-in userdiff driver for shell scripts, enabling
> accurate function name recognition in `git diff` hunk headers.
> 
> Enhancements include:
> - Function name detection for both POSIX and Bash/Ksh-style functions:
>   - `function_name() { ... }`
>   - `function function_name { ... }`
> - Exclusion of shell keywords that can resemble function names,
>   preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.).
> - Improved tokenization support for:
>   - Identifiers (variable and function names)
>   - Numeric constants (integers and decimals)
>   - Shell variables (`$VAR`, `${VAR}`)
>   - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators
>   - Assignment and redirection operators
>   - Brackets and grouping symbols
> 
> This update improves Git’s diff readability for shell scripts,
> bringing it in line with existing built-in userdiff drivers.

Please remove the marketing tone from the commit message.

 - "accurate function name recognition": Is not possible because the
tools (regular expressions) don't have sufficient context to allow it.

 - "Enhancements include": Either list *all* enhancements, or focus on
noteworthy ones, but don't make a long list that's still incomplete.

 - The word "improve" is never needed in the context that we see it
here, because you certainly don't want to worsen the code base.

Please study section "Describe your changes well" in
Documentation/SubmittingPatches on how to write the commit message.
(Describe the state before the change in present tense, and the changes
in imperative mood.)

An important point is *why* we want this change. In the case of a new
userdiff driver, however, the answer is usally simply "because we can",
and everybody knows it. Please don't write a long paragraph that dances
around this fact. Just don't write it; but if there are other reason,
please do say so.

> 
> Signed-off-by: Moumita <dhar61595@gmail.com>

The name given in Signed-off-by has legal meaning. If you have a given
name and a surname, please use both names here and then by extension
also as patch author.

> ---
>  userdiff.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..a8c14807c6 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -334,6 +334,26 @@ PATTERNS("scheme",
>  	 "\\|([^\\\\]*)\\|"
>  	 /* All other words should be delimited by spaces or parentheses */
>  	 "|([^][)(}{[ \t])+"),
> +PATTERNS("shell",

Correctly sorted into the list. Good!

HOWEVER! We already have a diffdriver for bash. It would be better to
extend that one than to introduce a new driver.

> +	 /* Negate shell keywords that can look like functions */
> +	 "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n"

This list looks unnecessary. The userdiff driver can assume that it
operates on syntactically valid text.

   if () {
   }

isn't correct, and that's the case for all listed keywords.

> +	 /* POSIX-style shell functions: function_name() { ... } */
> +	 "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n"

Two nitpicks:

- There can be whitespace between the parentheses.
- The function body can also be in parentheses.

   foo ( ) (
      echo
   )

would be a correct shell function definition. Its body always runs in a
sub-shell.

> +	 /* Bash/Ksh-style functions: function function_name { ... } */
> +	 "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n",

Both here and above, just let the regular expression end before the
opening bracket. Then it would also recognize this oddity:

function x \
  { echo x; }

> +	 /* -- */
> +	 /* Identifiers: variable and function names */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	 "|[-+]?[0-9]+(\\.[0-9]*)?"

Typically, a - or + isn't part of the number, but a separate operator. I
suggest to drop the leading '[-+]?'. But see below.

> +	 /* Shell variables: $VAR and ${VAR} */
> +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"

The uses of ${name} are rather exceptional. The cases where ${ is used
are the complex cases, for example:

    base=${fname##*/}
    base=${base%."$ext"}

In such cases, I would prefer that the constituents of the expression
are their own tokens and not lumped into a single "variable name" token.

> +	 /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"

Which makes me think: Text in shell commands has only very little
restrictions. A typical case are command options starting with one or
two dashes. Do we want to separate the dash from the option name? This
regular expression would do so, and I would consider it a deficiency.

> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
>  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
>  	 "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"),
>  { .name = "default", .binary = -1 },

It would be nice to have some tests. Have a look at t/t4018/bash-* and
t/t4034/* (no bash there, yet).

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts
  2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
  2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
@ 2025-02-18 15:35 ` Moumita
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
                     ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Moumita @ 2025-02-18 15:35 UTC (permalink / raw)
  To: git
  Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano,
	René Scharfe, Atharva Raykar, D. Ben Knoble

The modifications that I made were -
"^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*"  - so that is allows foo() and foo ( ).
"^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)" - so that it recognises {, (, or [[ as function bodies
"^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)|\\\\\n"-  so that it matches "function foo \
  { echo "hello"; }"
"(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" - so that it matches function foo() { echo "hello"; }
and also function bar { echo "no parens"; }
"|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"- It seperates ${} expansions
"|--?[a-zA-Z0-9_-]+" - It ensures -opt and --long-opt are treated as whole tokens.
"|[0-9]+(\\.[0-9]*)?" - It matches numbers without +/-


Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 userdiff.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

Range-diff against v1:
1:  683ea08819 ! 1:  de2e8f9792 userdiff : Added built in function recognition for shell
    @@
      ## Metadata ##
    -Author: Moumita <dhar61595@gmail.com>
    +Author: Moumita Dhar <dhar61595@gmail.com>
     
      ## Commit message ##
    -    userdiff : Added built in function recognition for shell
    +    userdiff: extend Bash pattern to cover more shell function forms
     
    -    Introduced a built-in userdiff driver for shell scripts, enabling
    -    accurate function name recognition in `git diff` hunk headers.
    +    The existing Bash userdiff pattern misses some shell function forms, such as
    +    `function foo()`, multi-line definitions, and extra whitespace.
     
    -    Enhancements include:
    -    - Function name detection for both POSIX and Bash/Ksh-style functions:
    -      - `function_name() { ... }`
    -      - `function function_name { ... }`
    -    - Exclusion of shell keywords that can resemble function names,
    -      preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.).
    -    - Improved tokenization support for:
    -      - Identifiers (variable and function names)
    -      - Numeric constants (integers and decimals)
    -      - Shell variables (`$VAR`, `${VAR}`)
    -      - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators
    -      - Assignment and redirection operators
    -      - Brackets and grouping symbols
    +    Extend the pattern to:
    +    - Support `function foo()` syntax.
    +    - Allow spaces in `foo ( )` definitions.
    +    - Recognize multi-line definitions with backslashes.
    +    - Broaden function body detection.
     
    -    This update improves Git’s diff readability for shell scripts,
    -    bringing it in line with existing built-in userdiff drivers.
    -
    -    Signed-off-by: Moumita <dhar61595@gmail.com>
    +    Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
      ## userdiff.c ##
    -@@ userdiff.c: PATTERNS("scheme",
    - 	 "\\|([^\\\\]*)\\|"
    - 	 /* All other words should be delimited by spaces or parentheses */
    - 	 "|([^][)(}{[ \t])+"),
    -+PATTERNS("shell",
    -+	 /* Negate shell keywords that can look like functions */
    -+	  "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n"
    -+	 /* POSIX-style shell functions: function_name() { ... } */
    -+	 "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n"
    -+	 /* Bash/Ksh-style functions: function function_name { ... } */
    -+	 "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n",
    -+	 /* -- */
    +@@ userdiff.c: IPATTERN("ada",
    + 	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
    + 	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
    + PATTERNS("bash",
    +-	 /* Optional leading indentation */
    ++     /* Optional leading indentation */
    + 	 "^[ \t]*"
    +-	 /* Start of captured text */
    ++	 /* Start of captured function name */
    + 	 "("
    + 	 "("
    +-	     /* POSIX identifier with mandatory parentheses */
    +-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    ++		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
    ++		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
    + 	 "|"
    +-	     /* Bashism identifier with optional parentheses */
    +-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    ++		 /* Bash-style function definitions, allowing optional `function` keyword */
    ++		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
    + 	 ")"
    + 	 /* Optional whitespace */
    + 	 "[ \t]*"
    +-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
    +-	 "(\\{|\\(\\(?|\\[\\[)"
    +-	 /* End of captured text */
    ++	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
    ++	 "(\\{|\\(|\\[\\[)"
    ++	 /* End of captured function name */
    + 	 ")",
    + 	 /* -- */
    +-	 /* Characters not in the default $IFS value */
    +-	 "[^ \t]+"),
     +	 /* Identifiers: variable and function names */
     +	 "[a-zA-Z_][a-zA-Z0-9_]*"
     +	 /* Numeric constants: integers and decimals */
    -+	 "|[-+]?[0-9]+(\\.[0-9]*)?"
    -+	 /* Shell variables: $VAR and ${VAR} */
    ++	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
    ++	 /* Shell variables: `$VAR`, `${VAR}` */
     +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
     +	 /* Logical and comparison operators */
     +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
    ++	 /* Command-line options (to avoid splitting `-option`) */
    ++	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
     +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
    - PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
    - 	 "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"),
    - { .name = "default", .binary = -1 },
    + PATTERNS("bibtex",
    + 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
    + 	 /* -- */
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
@ 2025-02-18 15:35   ` Moumita
  2025-02-18 19:30     ` Junio C Hamano
                       ` (2 more replies)
  2025-02-18 17:30   ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine
  2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
  2 siblings, 3 replies; 37+ messages in thread
From: Moumita @ 2025-02-18 15:35 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar

From: Moumita Dhar <dhar61595@gmail.com>

The existing Bash userdiff pattern misses some shell function forms, such as
`function foo()`, multi-line definitions, and extra whitespace.

Extend the pattern to:
- Support `function foo()` syntax.
- Allow spaces in `foo ( )` definitions.
- Recognize multi-line definitions with backslashes.
- Broaden function body detection.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 userdiff.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..194e28883d 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -53,26 +53,38 @@ IPATTERN("ada",
 	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
 	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
 PATTERNS("bash",
-	 /* Optional leading indentation */
+     /* Optional leading indentation */
 	 "^[ \t]*"
-	 /* Start of captured text */
+	 /* Start of captured function name */
 	 "("
 	 "("
-	     /* POSIX identifier with mandatory parentheses */
-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
+		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
+		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
 	 "|"
-	     /* Bashism identifier with optional parentheses */
-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
+		 /* Bash-style function definitions, allowing optional `function` keyword */
+		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
 	 ")"
 	 /* Optional whitespace */
 	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
-	 /* End of captured text */
+	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
+	 "(\\{|\\(|\\[\\[)"
+	 /* End of captured function name */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	 "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: `$VAR`, `${VAR}` */
+	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
+	 /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Command-line options (to avoid splitting `-option`) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts
  2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-02-18 17:30   ` Eric Sunshine
  2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
  2 siblings, 0 replies; 37+ messages in thread
From: Eric Sunshine @ 2025-02-18 17:30 UTC (permalink / raw)
  To: Moumita
  Cc: git, Johannes Sixt, Junio C Hamano, René Scharfe,
	Atharva Raykar, D. Ben Knoble

On Tue, Feb 18, 2025 at 10:36 AM Moumita <dhar61595@gmail.com> wrote:
> The modifications that I made were -
> "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*"  - so that is allows foo() and foo ( ).
> "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)" - so that it recognises {, (, or [[ as function bodies

Regarding this last point, for completeness, I had meant to respond to
j6t's review of your patch by saying that, according to POSIX, a
function body can be any compound command, which means the body does
not need to be encased in braces. For instance, this works:

    $ foo() echo nothing
    $ foo
    nothing
    $

That said, I doubt that this usage is common, thus is probably not
worth worrying about.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-02-18 19:30     ` Junio C Hamano
  2025-02-22 18:15       ` Johannes Sixt
  2025-02-18 23:38     ` Junio C Hamano
  2025-02-22 18:14     ` Johannes Sixt
  2 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2025-02-18 19:30 UTC (permalink / raw)
  To: Moumita; +Cc: git

Moumita <dhar61595@gmail.com> writes:

>  PATTERNS("bash",
> -	 /* Optional leading indentation */
> +     /* Optional leading indentation */

What is this change about?

>  	 "^[ \t]*"
> -	 /* Start of captured text */
> +	 /* Start of captured function name */
>  	 "("
>  	 "("
> -	     /* POSIX identifier with mandatory parentheses */
> -	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> +		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
> +		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"

Is indentation-change intended and required for this patch to work correctly?

>  	 "|"
> -	     /* Bashism identifier with optional parentheses */
> -	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> +		 /* Bash-style function definitions, allowing optional `function` keyword */
> +		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"

Ditto.

Regular expressions are write-only language; please make sure that
you do not add any unnecessary changes to distract eyes of
reviewers from spotting the _real_ changes that improves the current
codebase.

>  	 ")"
>  	 /* Optional whitespace */
>  	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> -	 /* End of captured text */
> +	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
> +	 "(\\{|\\(|\\[\\[)"
> +	 /* End of captured function name */
>  	 ")",


>  	 /* -- */
> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),

We used to pretty-much use "a run of non-whitespace characters is a
token".  Now we are a bit more picky.

Which may or may not be good, but it is hard to tell if it is an
improvement.

> +	 /* Identifiers: variable and function names */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> +	 /* Shell variables: `$VAR`, `${VAR}` */
> +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
> +	 /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"
> +	 /* Command-line options (to avoid splitting `-option`) */
> +	 "|--?[a-zA-Z0-9_-]+"
> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),

The fact that this patch does not have any changes to "t/" hierarchy
suggests me that we do not have existing tests to see how sample
text files in the supported languages are tokenized (otherwise the
above changes would require adjusting such existing tests), so I
think it should be left outside of this topic, but I wonder if
adding such tests gives us a good way to demonstrate the effect of
these changes to userdiff patterns.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-02-18 19:30     ` Junio C Hamano
@ 2025-02-18 23:38     ` Junio C Hamano
  2025-02-22 18:14     ` Johannes Sixt
  2 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2025-02-18 23:38 UTC (permalink / raw)
  To: Moumita; +Cc: git

Moumita <dhar61595@gmail.com> writes:

> From: Moumita Dhar <dhar61595@gmail.com>
>
> The existing Bash userdiff pattern misses some shell function forms, such as
> `function foo()`, multi-line definitions, and extra whitespace.
>
> Extend the pattern to:
> - Support `function foo()` syntax.
> - Allow spaces in `foo ( )` definitions.
> - Recognize multi-line definitions with backslashes.
> - Broaden function body detection.
>
> Signed-off-by: Moumita Dhar <dhar61595@gmail.com>

Applied to any one of the recent tips of 'master', this seemed break
tests and the reproduction seems to be quite easy.

    $ make 
    $ cd t && sh t4018-*.sh -i -v
    ...
    test_expect_code: command exited with 128, we wanted 1 ...
    not ok 6 - builtin bash pattern compiles
    #...
    #			test_grep ! fatal msg &&
    #			test_grep ! error msg
    #		
    $ cat t/trash*.t4018*/msg
    fatal: Invalid regexp to look for hunk header: ^[ 	]*(([a-z...

Please make it a habit to run tests after you modified the code
before sending out patches with the modifications.

Thanks.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-02-18 19:30     ` Junio C Hamano
  2025-02-18 23:38     ` Junio C Hamano
@ 2025-02-22 18:14     ` Johannes Sixt
  2 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2025-02-22 18:14 UTC (permalink / raw)
  To: Moumita; +Cc: git

Thank you for this follow-up.

After reviewing my text below I find that the words may sound harsh.
Please don't let you turn down by the lack of friendlyness. In technical
reviews I (and very likely many other reviewers on this list) prefer to
get to the point right away and not waste our time by talking in circles.

I also do not repeat what others have said in their review.

Am 18.02.25 um 16:35 schrieb Moumita:
> From: Moumita Dhar <dhar61595@gmail.com>
> 
> The existing Bash userdiff pattern misses some shell function forms, such as
> `function foo()`, multi-line definitions, and extra whitespace.
> 
> Extend the pattern to:
> - Support `function foo()` syntax.
> - Allow spaces in `foo ( )` definitions.
> - Recognize multi-line definitions with backslashes.
> - Broaden function body detection.

Please be accurate what you mention here. The first two are already
present and not new, the third one I cannot find, and the last one is
not immediately visible if not wrong (you are removing the '((' introducer).

> 
> Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
> ---
>  userdiff.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..194e28883d 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -53,26 +53,38 @@ IPATTERN("ada",
>  	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
>  	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
>  PATTERNS("bash",
> -	 /* Optional leading indentation */
> +     /* Optional leading indentation */
>  	 "^[ \t]*"
> -	 /* Start of captured text */
> +	 /* Start of captured function name */
>  	 "("
>  	 "("
> -	     /* POSIX identifier with mandatory parentheses */
> -	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> +		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
> +		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
>  	 "|"
> -	     /* Bashism identifier with optional parentheses */
> -	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> +		 /* Bash-style function definitions, allowing optional `function` keyword */
> +		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
>  	 ")"

You inserted (?=[a-zA-Z_]) after the function. What does this do? Then
you made 'function identifier' optional. Why?

You also added a ? at the end. Is that to make the parenthesis optional?
But look closely: they are already optional, because we already have

  ( space_opt "(" space_opt ")" | space_opt )

It would be a minor improvement to turn this into

  space_opt ( "(" space_opt ")" )?

Make a mental bookmark here.

>  	 /* Optional whitespace */
>  	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> -	 /* End of captured text */
> +	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
> +	 "(\\{|\\(|\\[\\[)"
> +	 /* End of captured function name */

What is the justification that you removed "((" as introduction of the
function body?

>  	 ")",

So, we capture only what we expect to be a function header in typical cases.

Let me make a suggestion. What if we replaced everything from the mental
bookmark above up to the closing parenthesis by .*$ like we do in all
other builtin drivers? Then everything on the line that contains the
function header would be used as hunk header, and we do not care what
the function body looks like. It would also cover the case mentioned by
Eric elsewhere where the body is just a simple command.

If that leads to new cases that are detected, it would be really nice to
add corresponding test cases in t/t4018/.

>  	 /* -- */
> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),
> +	 /* Identifiers: variable and function names */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> +	 /* Shell variables: `$VAR`, `${VAR}` */
> +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
> +	 /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"
> +	 /* Command-line options (to avoid splitting `-option`) */
> +	 "|--?[a-zA-Z0-9_-]+"
> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),

As far as I can see, you introduced a pattern for options, but otherwise
left the patterns the same as in the earlier round. Of course, you are
not obliged to integrate every suggestion that is made by a reviewer,
but it is good tone that you leave a comment explaining why you
dismissed a suggestion.

To expand on my suggestion how to treat ${var} substitution:

- Replace \\$\\{[^}]+\\} by just \\$\\{ to match only the introducer.
(The closing brace is already matched by a later pattern.)
- Add operators := :- :+ :? # ## %% that are used frequently in these
complex expansions.
- Have a look at section "Shell Parameter Expansion" in the bash
manual[1] about the multitude of operators that can occur after ${.
Maybe you want to add more of them.

[1]
https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-18 19:30     ` Junio C Hamano
@ 2025-02-22 18:15       ` Johannes Sixt
  2025-02-24 16:28         ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Johannes Sixt @ 2025-02-22 18:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Moumita

Am 18.02.25 um 20:30 schrieb Junio C Hamano:
> Moumita <dhar61595@gmail.com> writes:
>>  	 /* -- */
>> -	 /* Characters not in the default $IFS value */
>> -	 "[^ \t]+"),
> 
> We used to pretty-much use "a run of non-whitespace characters is a
> token".  Now we are a bit more picky.
> 
> Which may or may not be good, but it is hard to tell if it is an
> improvement.

It is only a stand-in, because every built-in userdiff driver must have
a word pattern. See the old thread here:
https://lore.kernel.org/git/373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name/

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-02-22 18:15       ` Johannes Sixt
@ 2025-02-24 16:28         ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2025-02-24 16:28 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Moumita

Johannes Sixt <j6t@kdbg.org> writes:

> Am 18.02.25 um 20:30 schrieb Junio C Hamano:
>> Moumita <dhar61595@gmail.com> writes:
>>>  	 /* -- */
>>> -	 /* Characters not in the default $IFS value */
>>> -	 "[^ \t]+"),
>> 
>> We used to pretty-much use "a run of non-whitespace characters is a
>> token".  Now we are a bit more picky.
>> 
>> Which may or may not be good, but it is hard to tell if it is an
>> improvement.
>
> It is only a stand-in, because every built-in userdiff driver must have
> a word pattern.

Yeah, I know.  I was merely saying that it was not obvious that the
new pattern, which is way more elaborate, is improvement over that
stand-in pattern.  As these patterns are meant to be applied to only
syntactically valid text, by going more specific pattern from simple
and lenient pattern, we stop recognising some word that we used to
take as a word (i.e. specific patterns need to worry about false
negatives, while simpler patterns only have to avoid egregious false
positives).

> See the old thread here:
> https://lore.kernel.org/git/373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name/

Yup, 2ff6c346 (userdiff: support Bash, 2020-10-22) is where the
stand-in pattern came from, which is the v3 iteration of that patch.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns
  2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
  2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-02-18 17:30   ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine
@ 2025-03-28 20:05   ` Moumita
  2025-03-28 20:05     ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
                       ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: Moumita @ 2025-03-28 20:05 UTC (permalink / raw)
  To: git
  Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano,
	René Scharfe, Atharva Raykar, D. Ben Knoble

This patch improves function detection in userdiff for Bash scripts.  
The old regex tried to match function bodies explicitly, which caused  
issues with line continuations (`\`) and simple command bodies. Instead,  
I have replaced it with `.*$`, making it more consistent with other userdiff  
drivers and ensuring we capture the full function definition line.  

I also refined the word regex to better handle Bash syntax, including  
parameter expansions, arithmetic expressions, and command-line options.  

I have  added test cases to cover these changes, making sure everything  
works as expected.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 +++
 t/t4018/bash-posix-style-multiline-function   |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 30 +++++++++++++++++++
 t/t4034/bash/post                             | 25 ++++++++++++++++
 t/t4034/bash/pre                              | 25 ++++++++++++++++
 userdiff.c                                    | 24 +++++++++++----
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v2:
1:  de2e8f9792 ! 1:  3d077fadc4 userdiff: extend Bash pattern to cover more shell function forms
    @@ Metadata
      ## Commit message ##
         userdiff: extend Bash pattern to cover more shell function forms
     
    -    The existing Bash userdiff pattern misses some shell function forms, such as
    -    `function foo()`, multi-line definitions, and extra whitespace.
    +    The previous function regex required explicit matching of function
    +    bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
     
    -    Extend the pattern to:
    -    - Support `function foo()` syntax.
    -    - Allow spaces in `foo ( )` definitions.
    -    - Recognize multi-line definitions with backslashes.
    -    - Broaden function body detection.
    +    - It failed to capture valid functions where `{` was on the next line
    +      due to line continuation (`\`).
    +    - It did not recognize functions with single  command body, such as
    +      `x () echo hello`.
    +
    +    Replacing the function body matching logic with `.*$`, ensures
    +    that everything on the function definition line is captured,
    +    aligning with other userdiff drivers and improving hunk headers in
    +    `git diff`.
    +
    +    Additionally, the word regex is refined to better recognize shell
    +    syntax, including additional parameter expansion operators and
    +    command-line options, improving syntax-aware diffs.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    + ## t/t4018/bash-bashism-style-multiline-function (new) ##
    +@@
    ++function RIGHT \
    ++{    
    ++    echo 'ChangeMe'
    ++}
    + \ No newline at end of file
    +
    + ## t/t4018/bash-posix-style-multiline-function (new) ##
    +@@
    ++RIGHT() \
    ++{
    ++    ChangeMe
    ++}
    + \ No newline at end of file
    +
    + ## t/t4018/bash-posix-style-single-command-function (new) ##
    +@@
    ++RIGHT() echo "hello"
    ++
    ++    ChangeMe
    +
    + ## t/t4034-diff-words.sh ##
    +@@ t/t4034-diff-words.sh: test_expect_success 'unset default driver' '
    + 
    + test_language_driver ada
    + test_language_driver bibtex
    ++test_language_driver bash
    + test_language_driver cpp
    + test_language_driver csharp
    + test_language_driver css
    +
    + ## t/t4034/bash/expect (new) ##
    +@@
    ++<BOLD>diff --git a/pre b/post<RESET>
    ++<BOLD>index 09ac008..60ba6a2 100644<RESET>
    ++<BOLD>--- a/pre<RESET>
    ++<BOLD>+++ b/post<RESET>
    ++<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<RED>my_var<RESET><GREEN>new_var<RESET>=10
    ++x=<RED>123<RESET><GREEN>456<RESET>
    ++y=<RED>3.14<RESET><GREEN>2.71<RESET>
    ++z=<RED>.5<RESET><GREEN>.75<RESET>
    ++echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
    ++${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    ++if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    ++((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    ++((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    ++$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    ++$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    ++${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>##*/}
    ++${<RED>a<RESET><GREEN>x<RESET>%.*}
    ++${<RED>a<RESET><GREEN>x<RESET>%%.*}
    ++${<RED>a<RESET><GREEN>x<RESET>^^}
    ++${<RED>a<RESET><GREEN>x<RESET>,}
    ++${<RED>a<RESET><GREEN>x<RESET>,,}
    ++${!<RED>a<RESET><GREEN>x<RESET>}
    ++${<RED>a<RESET><GREEN>x<RESET>[@]}
    ++${<RED>a<RESET><GREEN>x<RESET>:?error message}
    ++${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++ls <RED>-a<RESET><GREEN>-x<RESET>
    ++ls <RED>--a<RESET><GREEN>--x<RESET>
    +
    + ## t/t4034/bash/post (new) ##
    +@@
    ++new_var=10
    ++x=456
    ++y=2.71
    ++z=.75
    ++echo $USERNAME
    ++${HOMEDIR}
    ++if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    ++((x+=y))
    ++((x-=y))
    ++$((x<<y))
    ++$((x>>y))
    ++${x:-y}
    ++${x:=y}
    ++${x##*/}
    ++${x%.*}
    ++${x%%.*}
    ++${x^^}
    ++${x,}
    ++${x,,}
    ++${!x}
    ++${x[@]}
    ++${x:?error message}
    ++${x:2:3}
    ++ls -x
    ++ls --x
    +
    + ## t/t4034/bash/pre (new) ##
    +@@
    ++my_var=10
    ++x=123
    ++y=3.14
    ++z=.5
    ++echo $USER
    ++${HOME}
    ++if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    ++((a+=b))
    ++((a-=b))
    ++$((a << b))
    ++$((a >> b))
    ++${a:-b}
    ++${a:=b}
    ++${a##*/}
    ++${a%.*}
    ++${a%%.*}
    ++${a^^}
    ++${a,}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:?error message}
    ++${a:2:3}
    ++ls -a
    ++ls --a
    +
      ## userdiff.c ##
    -@@ userdiff.c: IPATTERN("ada",
    - 	 "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"
    - 	 "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"),
    - PATTERNS("bash",
    --	 /* Optional leading indentation */
    -+     /* Optional leading indentation */
    - 	 "^[ \t]*"
    --	 /* Start of captured text */
    -+	 /* Start of captured function name */
    - 	 "("
    - 	 "("
    --	     /* POSIX identifier with mandatory parentheses */
    --	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    -+		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
    -+		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"
    - 	 "|"
    --	     /* Bashism identifier with optional parentheses */
    --	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    -+		 /* Bash-style function definitions, allowing optional `function` keyword */
    -+		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"
    +@@ userdiff.c: PATTERNS("bash",
    + 	     /* Bashism identifier with optional parentheses */
    + 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
      	 ")"
    - 	 /* Optional whitespace */
    - 	 "[ \t]*"
    +-	 /* Optional whitespace */
    +-	 "[ \t]*"
     -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
     -	 "(\\{|\\(\\(?|\\[\\[)"
    --	 /* End of captured text */
    -+	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
    -+	 "(\\{|\\(|\\[\\[)"
    -+	 /* End of captured function name */
    ++	 /* Everything after the function header is captured  */
    ++	 ".*$"
    + 	 /* End of captured text */
      	 ")",
      	 /* -- */
     -	 /* Characters not in the default $IFS value */
     -	 "[^ \t]+"),
     +	 /* Identifiers: variable and function names */
    -+	 "[a-zA-Z_][a-zA-Z0-9_]*"
    ++	  "[a-zA-Z_][a-zA-Z0-9_]*"
     +	 /* Numeric constants: integers and decimals */
    -+	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
    -+	 /* Shell variables: `$VAR`, `${VAR}` */
    -+	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
    -+	 /* Logical and comparison operators */
    ++	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
    ++	 /* Shell variables: $VAR, ${VAR} */
    ++	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
    ++	  /* Logical and comparison operators */
     +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
    -+	 /* Command-line options (to avoid splitting `-option`) */
    ++	 /* Additional parameter expansion operators */
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
     +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
@ 2025-03-28 20:05     ` Moumita
  2025-03-29 19:26     ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
  2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
  2 siblings, 0 replies; 37+ messages in thread
From: Moumita @ 2025-03-28 20:05 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar

From: Moumita Dhar <dhar61595@gmail.com>

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured,
aligning with other userdiff drivers and improving hunk headers in
`git diff`.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options, improving syntax-aware diffs.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 t/t4018/bash-bashism-style-multiline-function |  4 +++
 t/t4018/bash-posix-style-multiline-function   |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 30 +++++++++++++++++++
 t/t4034/bash/post                             | 25 ++++++++++++++++
 t/t4034/bash/pre                              | 25 ++++++++++++++++
 userdiff.c                                    | 24 +++++++++++----
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function
new file mode 100644
index 0000000000..0800daa156
--- /dev/null
+++ b/t/t4018/bash-bashism-style-multiline-function
@@ -0,0 +1,4 @@
+function RIGHT \
+{    
+    echo 'ChangeMe'
+}
\ No newline at end of file
diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
new file mode 100644
index 0000000000..756f21524b
--- /dev/null
+++ b/t/t4018/bash-posix-style-multiline-function
@@ -0,0 +1,4 @@
+RIGHT() \
+{
+    ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..a0f7cbd5a3
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,30 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,25 +1,25 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+y=<RED>3.14<RESET><GREEN>2.71<RESET>
+z=<RED>.5<RESET><GREEN>.75<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>##*/}
+${<RED>a<RESET><GREEN>x<RESET>%.*}
+${<RED>a<RESET><GREEN>x<RESET>%%.*}
+${<RED>a<RESET><GREEN>x<RESET>^^}
+${<RED>a<RESET><GREEN>x<RESET>,}
+${<RED>a<RESET><GREEN>x<RESET>,,}
+${!<RED>a<RESET><GREEN>x<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>[@]}
+${<RED>a<RESET><GREEN>x<RESET>:?error message}
+${<RED>a<RESET><GREEN>x<RESET>:2:3}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--a<RESET><GREEN>--x<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..60ba6a2e75
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,25 @@
+new_var=10
+x=456
+y=2.71
+z=.75
+echo $USERNAME
+${HOMEDIR}
+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
+((x+=y))
+((x-=y))
+$((x<<y))
+$((x>>y))
+${x:-y}
+${x:=y}
+${x##*/}
+${x%.*}
+${x%%.*}
+${x^^}
+${x,}
+${x,,}
+${!x}
+${x[@]}
+${x:?error message}
+${x:2:3}
+ls -x
+ls --x
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..09ac008a83
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,25 @@
+my_var=10
+x=123
+y=3.14
+z=.5
+echo $USER
+${HOME}
+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
+((a+=b))
+((a-=b))
+$((a << b))
+$((a >> b))
+${a:-b}
+${a:=b}
+${a##*/}
+${a%.*}
+${a%%.*}
+${a^^}
+${a,}
+${a,,}
+${!a}
+${a[@]}
+${a:?error message}
+${a:2:3}
+ls -a
+ls --a
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..4c77c7e0f6 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -64,15 +64,27 @@ PATTERNS("bash",
 	     /* Bashism identifier with optional parentheses */
 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
+	  /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Additional parameter expansion operators */
+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns
  2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
  2025-03-28 20:05     ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-03-29 19:26     ` Junio C Hamano
  2025-03-30 12:28       ` MOUMITA DHAR
  2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
  2 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2025-03-29 19:26 UTC (permalink / raw)
  To: Moumita
  Cc: git, Johannes Sixt, Eric Sunshine, René Scharfe,
	Atharva Raykar, D. Ben Knoble

Moumita <dhar61595@gmail.com> writes:

>     + ## t/t4018/bash-bashism-style-multiline-function (new) ##
>     +@@
>     ++function RIGHT \
>     ++{    
>     ++    echo 'ChangeMe'
>     ++}
>     + \ No newline at end of file
>     +
>     + ## t/t4018/bash-posix-style-multiline-function (new) ##
>     +@@
>     ++RIGHT() \
>     ++{
>     ++    ChangeMe
>     ++}
>     + \ No newline at end of file

For these new test, is it essential that these sample files end in
incomplete lines?  In other words, are these tests trying to make
sure that the function line is correctly found even if the function
body is at the end of the file that lack the final terminating LF?

If that is what they are testing, please add comments near the
beginning of the file to tell future developers that it is essential
that they keep these files end in incomplete lines and why.

If that is not what these tests are checking, then make these lines
complete lines instead, as they waste future developers' time making
them wonder if there are valid reasons why these files must end in
incomplete lines.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns
  2025-03-29 19:26     ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
@ 2025-03-30 12:28       ` MOUMITA DHAR
  0 siblings, 0 replies; 37+ messages in thread
From: MOUMITA DHAR @ 2025-03-30 12:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Sixt, Eric Sunshine, René Scharfe,
	Atharva Raykar, D. Ben Knoble

On Sun, 30 Mar 2025 at 00:56, Junio C Hamano <gitster@pobox.com> wrote:
>
> Moumita <dhar61595@gmail.com> writes:
>
> >     + ## t/t4018/bash-bashism-style-multiline-function (new) ##
> >     +@@
> >     ++function RIGHT \
> >     ++{
> >     ++    echo 'ChangeMe'
> >     ++}
> >     + \ No newline at end of file
> >     +
> >     + ## t/t4018/bash-posix-style-multiline-function (new) ##
> >     +@@
> >     ++RIGHT() \
> >     ++{
> >     ++    ChangeMe
> >     ++}
> >     + \ No newline at end of file
>
> For these new test, is it essential that these sample files end in
> incomplete lines?  In other words, are these tests trying to make
> sure that the function line is correctly found even if the function
> body is at the end of the file that lack the final terminating LF?

>No , the skipping of newline at the end of the test files is not intended.The tests are only meant to check multiline function detection, ensuring that a function body starting on the next line after a line continuation \ is correctly recognized.

> If that is what they are testing, please add comments near the
> beginning of the file to tell future developers that it is essential
> that they keep these files end in incomplete lines and why.
>
> If that is not what these tests are checking, then make these lines
> complete lines instead, as they waste future developers' time making
> them wonder if there are valid reasons why these files must end in
> incomplete lines.
> Yes I will do that and send the patch again . Thank you for the feedback.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases
  2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
  2025-03-28 20:05     ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-03-29 19:26     ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
@ 2025-03-30 13:39     ` Moumita
  2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
  2 siblings, 2 replies; 37+ messages in thread
From: Moumita @ 2025-03-30 13:39 UTC (permalink / raw)
  To: git
  Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano,
	René Scharfe, Atharva Raykar, D. Ben Knoble

While making this change, I  considered whether shell scripts can lack a final newline and whether it would be useful to test this case. Technically, a shell script can be missing a final newline, 
and shells like Bash can still process it. However, this is uncommon and generally discouraged due to POSIX and Unix conventions and
also although Bash can process such files, missing a final newline can cause unexpected behavior in edge cases—such as when concatenating scripts, where the last 
line of one script might merge with the first line of another.

Given these considerations, I have added the missing newline to the test files to align with standard conventions and avoid
 unnecessary confusion for future developers.

If testing function detection in a file without a final newline is considered useful
, I can add a separate test specifically for that edge case. Let me know if that would be helpful.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 +++
 t/t4018/bash-posix-style-multiline-function   |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 30 +++++++++++++++++++
 t/t4034/bash/post                             | 25 ++++++++++++++++
 t/t4034/bash/pre                              | 25 ++++++++++++++++
 userdiff.c                                    | 24 +++++++++++----
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v3:
1:  3d077fadc4 ! 1:  40cffd3b4a userdiff: extend Bash pattern to cover more shell function forms
    @@ t/t4018/bash-bashism-style-multiline-function (new)
     +{    
     +    echo 'ChangeMe'
     +}
    - \ No newline at end of file
     
      ## t/t4018/bash-posix-style-multiline-function (new) ##
     @@
    @@ t/t4018/bash-posix-style-multiline-function (new)
     +{
     +    ChangeMe
     +}
    - \ No newline at end of file
     
      ## t/t4018/bash-posix-style-single-command-function (new) ##
     @@
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms
  2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
@ 2025-03-30 13:39       ` Moumita
  2025-05-02 21:27         ` Junio C Hamano
  2025-05-06 16:30         ` Johannes Sixt
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
  1 sibling, 2 replies; 37+ messages in thread
From: Moumita @ 2025-03-30 13:39 UTC (permalink / raw)
  To: git
  Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano,
	René Scharfe, Atharva Raykar, D. Ben Knoble

From: Moumita Dhar <dhar61595@gmail.com>

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured,
aligning with other userdiff drivers and improving hunk headers in
`git diff`.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options, improving syntax-aware diffs.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 t/t4018/bash-bashism-style-multiline-function |  4 +++
 t/t4018/bash-posix-style-multiline-function   |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 30 +++++++++++++++++++
 t/t4034/bash/post                             | 25 ++++++++++++++++
 t/t4034/bash/pre                              | 25 ++++++++++++++++
 userdiff.c                                    | 24 +++++++++++----
 8 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function
new file mode 100644
index 0000000000..284d50dd99
--- /dev/null
+++ b/t/t4018/bash-bashism-style-multiline-function
@@ -0,0 +1,4 @@
+function RIGHT \
+{    
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
new file mode 100644
index 0000000000..cc8727cbcd
--- /dev/null
+++ b/t/t4018/bash-posix-style-multiline-function
@@ -0,0 +1,4 @@
+RIGHT() \
+{
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..a0f7cbd5a3
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,30 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,25 +1,25 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+y=<RED>3.14<RESET><GREEN>2.71<RESET>
+z=<RED>.5<RESET><GREEN>.75<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>##*/}
+${<RED>a<RESET><GREEN>x<RESET>%.*}
+${<RED>a<RESET><GREEN>x<RESET>%%.*}
+${<RED>a<RESET><GREEN>x<RESET>^^}
+${<RED>a<RESET><GREEN>x<RESET>,}
+${<RED>a<RESET><GREEN>x<RESET>,,}
+${!<RED>a<RESET><GREEN>x<RESET>}
+${<RED>a<RESET><GREEN>x<RESET>[@]}
+${<RED>a<RESET><GREEN>x<RESET>:?error message}
+${<RED>a<RESET><GREEN>x<RESET>:2:3}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--a<RESET><GREEN>--x<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..60ba6a2e75
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,25 @@
+new_var=10
+x=456
+y=2.71
+z=.75
+echo $USERNAME
+${HOMEDIR}
+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
+((x+=y))
+((x-=y))
+$((x<<y))
+$((x>>y))
+${x:-y}
+${x:=y}
+${x##*/}
+${x%.*}
+${x%%.*}
+${x^^}
+${x,}
+${x,,}
+${!x}
+${x[@]}
+${x:?error message}
+${x:2:3}
+ls -x
+ls --x
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..09ac008a83
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,25 @@
+my_var=10
+x=123
+y=3.14
+z=.5
+echo $USER
+${HOME}
+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
+((a+=b))
+((a-=b))
+$((a << b))
+$((a >> b))
+${a:-b}
+${a:=b}
+${a##*/}
+${a%.*}
+${a%%.*}
+${a^^}
+${a,}
+${a,,}
+${!a}
+${a[@]}
+${a:?error message}
+${a:2:3}
+ls -a
+ls --a
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..4c77c7e0f6 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -64,15 +64,27 @@ PATTERNS("bash",
 	     /* Bashism identifier with optional parentheses */
 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
+	  /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Additional parameter expansion operators */
+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms
  2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-02 21:27         ` Junio C Hamano
  2025-05-06 16:30         ` Johannes Sixt
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2025-05-02 21:27 UTC (permalink / raw)
  To: git
  Cc: Moumita, Johannes Sixt, Eric Sunshine, René Scharfe,
	Atharva Raykar, D. Ben Knoble

Moumita <dhar61595@gmail.com> writes:

> From: Moumita Dhar <dhar61595@gmail.com>
>
> The previous function regex required explicit matching of function
> bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
>
> - It failed to capture valid functions where `{` was on the next line
>   due to line continuation (`\`).
> - It did not recognize functions with single  command body, such as
>   `x () echo hello`.
>
> Replacing the function body matching logic with `.*$`, ensures
> that everything on the function definition line is captured,
> aligning with other userdiff drivers and improving hunk headers in
> `git diff`.
>
> Additionally, the word regex is refined to better recognize shell
> syntax, including additional parameter expansion operators and
> command-line options, improving syntax-aware diffs.
>
> Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
> ---

This iteration hasn't seen any reviews or responses.  Is everybody
who participated in reviews in the previous rounds happy with it?

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms
  2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-05-02 21:27         ` Junio C Hamano
@ 2025-05-06 16:30         ` Johannes Sixt
  2025-05-10 11:37           ` MOUMITA DHAR
  1 sibling, 1 reply; 37+ messages in thread
From: Johannes Sixt @ 2025-05-06 16:30 UTC (permalink / raw)
  To: Moumita
  Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar,
	D. Ben Knoble, git

Am 30.03.25 um 15:39 schrieb Moumita:
> From: Moumita Dhar <dhar61595@gmail.com>
> 
> The previous function regex required explicit matching of function
> bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
> 
> - It failed to capture valid functions where `{` was on the next line
>   due to line continuation (`\`).
> - It did not recognize functions with single  command body, such as
>   `x () echo hello`.

Good.

> Replacing the function body matching logic with `.*$`, ensures
> that everything on the function definition line is captured,
> aligning with other userdiff drivers and improving hunk headers in
> `git diff`.

Personally, I am a bit allergic against marketing speak. *Of course* do
we intend improve the code. No need to mention it.

> Additionally, the word regex is refined to better recognize shell
> syntax, including additional parameter expansion operators and
> command-line options, improving syntax-aware diffs.

Good. (But see above about "improving".)

> 
> Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
> ---

> diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> index f51d3557f1..0be647c2fb 100755
> --- a/t/t4034-diff-words.sh
> +++ b/t/t4034-diff-words.sh
> @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
>  
>  test_language_driver ada
>  test_language_driver bibtex
> +test_language_driver bash
>  test_language_driver cpp
>  test_language_driver csharp
>  test_language_driver css

Including a test for word-diff is very much appreciated!

> diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
> new file mode 100644
> index 0000000000..a0f7cbd5a3
> --- /dev/null
> +++ b/t/t4034/bash/expect
> @@ -0,0 +1,30 @@
> +<BOLD>diff --git a/pre b/post<RESET>
> +<BOLD>index 09ac008..60ba6a2 100644<RESET>
> +<BOLD>--- a/pre<RESET>
> +<BOLD>+++ b/post<RESET>
> +<CYAN>@@ -1,25 +1,25 @@<RESET>
> +<RED>my_var<RESET><GREEN>new_var<RESET>=10
> +x=<RED>123<RESET><GREEN>456<RESET>
> +y=<RED>3.14<RESET><GREEN>2.71<RESET>
> +z=<RED>.5<RESET><GREEN>.75<RESET>

OK.

> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>

This tests a typical variable expansion. Good.

> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}

A more elaborate variable expansion does not include the surrounding ${
}. Good.

> +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi

This line also only tests variable expansions and is quite redundant. It
could test the operators that we see, but it doesn't. See below for
ideas how to do that.

And all from here...

> +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
> +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
> +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
> +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
> +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
> +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
> +${<RED>a<RESET><GREEN>x<RESET>##*/}
> +${<RED>a<RESET><GREEN>x<RESET>%.*}
> +${<RED>a<RESET><GREEN>x<RESET>%%.*}
> +${<RED>a<RESET><GREEN>x<RESET>^^}
> +${<RED>a<RESET><GREEN>x<RESET>,}
> +${<RED>a<RESET><GREEN>x<RESET>,,}

... to here also only test variable expansions, but not the operators.
As written, they are totally redundant and should be dropped or improved.

To test, for example, that '##' is kept together as an operator, you
have to have

   ${x#*/}

in the pre-image and

   ${x##*/}

in the post-image, so that we see

   ${x<RED>#<RESET><GREEN>##<RESET>*/}

in the result. If '##' were two tokens (due to a bug in the pattern), we
would see

   ${x#<GREEN>#<RESET>*/}

in the result.

You should go through all not-single-character operators and have a
pre-image and a post-image where one characters is added or removed. The
following is not a correct test:

pre:    ((x+=b))
post:   ((x-=b))
result: ((x<RED>+=<RESET><GREEN>-=<RESET>))

because we would see this result even if the pattern has a bug that
recognizes only one of += or -=, but would split the other one. A
correct test would be:

pre:    ((x+b))
post:   ((x+=b))
result: ((x<RED>+<RESET><GREEN>+=<RESET>))

> +${!<RED>a<RESET><GREEN>x<RESET>}

Here, you should have no '!' in the pre-image and the '!' in the post
image and not change the name. Then the test demonstrates that '!' is
its own token and not merged with '${' (if that is the intent of the
test; otherwise it is a redunant test).

> +${<RED>a<RESET><GREEN>x<RESET>[@]}

If you want to test that '@' is not merged with the brackets, then you
can have, for example '*' in the pre-image and '@' in the post image.

> +${<RED>a<RESET><GREEN>x<RESET>:?error message}

Either redundant or another two-character operator to test.

> +${<RED>a<RESET><GREEN>x<RESET>:2:3}

Redundant.

> +ls <RED>-a<RESET><GREEN>-x<RESET>
> +ls <RED>--a<RESET><GREEN>--x<RESET>

Both are good tests.

> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..4c77c7e0f6 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -64,15 +64,27 @@ PATTERNS("bash",
>  	     /* Bashism identifier with optional parentheses */
>  	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
>  	 ")"
> -	 /* Optional whitespace */
> -	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> +	 /* Everything after the function header is captured  */
> +	 ".*$"

I remember suggesting to capture everything after the function header.
However, If I am not mistaken, this does not do what I intended (and as
written it means that a pointless matching operation happens). The hunk
header shows everything that is in the outermost parentheses (group).
This catch-all, however, is even outside the outermost group and not
captured. It should be above the closing parenthesis that we see in the
context.

To test for this, you can have one test where "RIGHT" is not the
function name, but in the comment on the function header line.

>  	 /* End of captured text */
>  	 ")",
>  	 /* -- */
> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),
> +	 /* Identifiers: variable and function names */
> +	  "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> +	 /* Shell variables: $VAR, ${VAR} */
> +	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
> +	  /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"
> +	 /* Additional parameter expansion operators */
> +	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"

What is the purpose of this "/[a-zA-Z0-9_-]+"? It would mean, for
example that changes in alphanumeric path names would include the slash
in the changed part. I don't think this should be here.

> +	 /* Command-line options (to avoid splitting -option) */
> +	 "|--?[a-zA-Z0-9_-]+"
> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
>  PATTERNS("bibtex",
>  	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
>  	 /* -- */

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-06 16:30         ` Johannes Sixt
@ 2025-05-10 11:37           ` MOUMITA DHAR
  2025-05-10 12:40             ` Johannes Sixt
  0 siblings, 1 reply; 37+ messages in thread
From: MOUMITA DHAR @ 2025-05-10 11:37 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar,
	D. Ben Knoble, git

On Tue, 6 May 2025 at 22:00, Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 30.03.25 um 15:39 schrieb Moumita:
> > From: Moumita Dhar <dhar61595@gmail.com>
> >
> > The previous function regex required explicit matching of function
> > bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
> >
> > - It failed to capture valid functions where `{` was on the next line
> >   due to line continuation (`\`).
> > - It did not recognize functions with single  command body, such as
> >   `x () echo hello`.
>
> Good.
>
> > Replacing the function body matching logic with `.*$`, ensures
> > that everything on the function definition line is captured,
> > aligning with other userdiff drivers and improving hunk headers in
> > `git diff`.
>
> Personally, I am a bit allergic against marketing speak. *Of course* do
> we intend improve the code. No need to mention it.
>
> > Additionally, the word regex is refined to better recognize shell
> > syntax, including additional parameter expansion operators and
> > command-line options, improving syntax-aware diffs.
>
> Good. (But see above about "improving".)
>
> > Yes I will be careful about the commit messages.

> > Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
> > ---
>
> > diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
> > index f51d3557f1..0be647c2fb 100755
> > --- a/t/t4034-diff-words.sh
> > +++ b/t/t4034-diff-words.sh
> > @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
> >
> >  test_language_driver ada
> >  test_language_driver bibtex
> > +test_language_driver bash
> >  test_language_driver cpp
> >  test_language_driver csharp
> >  test_language_driver css
>
> Including a test for word-diff is very much appreciated!
>
> > diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
> > new file mode 100644
> > index 0000000000..a0f7cbd5a3
> > --- /dev/null
> > +++ b/t/t4034/bash/expect
> > @@ -0,0 +1,30 @@
> > +<BOLD>diff --git a/pre b/post<RESET>
> > +<BOLD>index 09ac008..60ba6a2 100644<RESET>
> > +<BOLD>--- a/pre<RESET>
> > +<BOLD>+++ b/post<RESET>
> > +<CYAN>@@ -1,25 +1,25 @@<RESET>
> > +<RED>my_var<RESET><GREEN>new_var<RESET>=10
> > +x=<RED>123<RESET><GREEN>456<RESET>
> > +y=<RED>3.14<RESET><GREEN>2.71<RESET>
> > +z=<RED>.5<RESET><GREEN>.75<RESET>
>
> OK.
>
> > +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
>
> This tests a typical variable expansion. Good.
>
> > +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
>
> A more elaborate variable expansion does not include the surrounding ${
> }. Good.
>
> > +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
>
> This line also only tests variable expansions and is quite redundant. It
> could test the operators that we see, but it doesn't. See below for
> ideas how to do that.
>
> And all from here...
>
> > +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
> > +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
> > +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
> > +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
> > +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
> > +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
> > +${<RED>a<RESET><GREEN>x<RESET>##*/}
> > +${<RED>a<RESET><GREEN>x<RESET>%.*}
> > +${<RED>a<RESET><GREEN>x<RESET>%%.*}
> > +${<RED>a<RESET><GREEN>x<RESET>^^}
> > +${<RED>a<RESET><GREEN>x<RESET>,}
> > +${<RED>a<RESET><GREEN>x<RESET>,,}
>
> ... to here also only test variable expansions, but not the operators.
> As written, they are totally redundant and should be dropped or improved.
>
> To test, for example, that '##' is kept together as an operator, you
> have to have
>
>    ${x#*/}
>
> in the pre-image and
>
>    ${x##*/}
>
> in the post-image, so that we see
>
>    ${x<RED>#<RESET><GREEN>##<RESET>*/}
>
> in the result. If '##' were two tokens (due to a bug in the pattern), we
> would see
>
>    ${x#<GREEN>#<RESET>*/}
>
> in the result.
>
> You should go through all not-single-character operators and have a
> pre-image and a post-image where one characters is added or removed. The
> following is not a correct test:
>
> pre:    ((x+=b))
> post:   ((x-=b))
> result: ((x<RED>+=<RESET><GREEN>-=<RESET>))
>
> because we would see this result even if the pattern has a bug that
> recognizes only one of += or -=, but would split the other one. A
> correct test would be:
>
> pre:    ((x+b))
> post:   ((x+=b))
> result: ((x<RED>+<RESET><GREEN>+=<RESET>))
>
> > +${!<RED>a<RESET><GREEN>x<RESET>}
>
> Here, you should have no '!' in the pre-image and the '!' in the post
> image and not change the name. Then the test demonstrates that '!' is
> its own token and not merged with '${' (if that is the intent of the
> test; otherwise it is a redunant test).
>
> > +${<RED>a<RESET><GREEN>x<RESET>[@]}
>
> If you want to test that '@' is not merged with the brackets, then you
> can have, for example '*' in the pre-image and '@' in the post image.

>
> > +${<RED>a<RESET><GREEN>x<RESET>:?error message}
>
> Either redundant or another two-character operator to test.

Ok I got it.
>
> > +${<RED>a<RESET><GREEN>x<RESET>:2:3}
>
> Redundant.
>
I understand I will remove it.

> > +ls <RED>-a<RESET><GREEN>-x<RESET>
> > +ls <RED>--a<RESET><GREEN>--x<RESET>
>
> Both are good tests.

Thank you. I understand the suggestions about the word diff tests and
I will remove the redundant ones and write the correct tests .
>
> > diff --git a/userdiff.c b/userdiff.c
> > index 340c4eb4f7..4c77c7e0f6 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -64,15 +64,27 @@ PATTERNS("bash",
> >            /* Bashism identifier with optional parentheses */
> >            "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> >        ")"
> > -      /* Optional whitespace */
> > -      "[ \t]*"
> > -      /* Compound command starting with `{`, `(`, `((` or `[[` */
> > -      "(\\{|\\(\\(?|\\[\\[)"
> > +      /* Everything after the function header is captured  */
> > +      ".*$"
>
> I remember suggesting to capture everything after the function header.
> However, If I am not mistaken, this does not do what I intended (and as
> written it means that a pointless matching operation happens). The hunk
> header shows everything that is in the outermost parentheses (group).
> This catch-all, however, is even outside the outermost group and not
> captured. It should be above the closing parenthesis that we see in the
> context.

> I am sorry I want to understand a thing  , we want everything from the function name to the end of the line to be the hunk header right ? So in the pattern "^[ \t]*"
/* Start of captured text */
"("
"("
     /* POSIX identifier with mandatory parentheses */
     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
"|"
     /* Bashism identifier with optional parentheses */
     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
")"
/* Optional whitespace */
"[ \t]*"
/* Compound command starting with `{`, `(`, `((` or `[[` */
"(\\{|\\(\\(?|\\[\\[)"
/* End of captured text */
")"
if I replace
"[ \t]*"
"(\\{|\\(\\(?|\\[\\[)"  part with .*$ then will it not capture the
entire line ? I mean the outermost group ends here-

 /* End of captured text */
")"
right ? What am I getting wrong ?

> To test for this, you can have one test where "RIGHT" is not the
> function name, but in the comment on the function header line.
>
> >        /* End of captured text */
> >        ")",
> >        /* -- */
> > -      /* Characters not in the default $IFS value */
> > -      "[^ \t]+"),
> > +      /* Identifiers: variable and function names */
> > +       "[a-zA-Z_][a-zA-Z0-9_]*"
> > +      /* Numeric constants: integers and decimals */
> > +       "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> > +      /* Shell variables: $VAR, ${VAR} */
> > +       "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
> > +       /* Logical and comparison operators */
> > +      "|\\|\\||&&|<<|>>|==|!=|<=|>="
> > +      /* Assignment and arithmetic operators */
> > +      "|[-+*/%&|^!=<>]=?"
> > +      /* Additional parameter expansion operators */
> > +      "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
>
> What is the purpose of this "/[a-zA-Z0-9_-]+"? It would mean, for
> example that changes in alphanumeric path names would include the slash
> in the changed part. I don't think this should be here.

> Ok I got it.

> > +      /* Command-line options (to avoid splitting -option) */
> > +      "|--?[a-zA-Z0-9_-]+"
> > +      /* Brackets and grouping symbols */
> > +      "|\\(|\\)|\\{|\\}|\\[|\\]"),
> >  PATTERNS("bibtex",
> >        "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
> >        /* -- */
>
> -- Hannes
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-10 11:37           ` MOUMITA DHAR
@ 2025-05-10 12:40             ` Johannes Sixt
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2025-05-10 12:40 UTC (permalink / raw)
  To: MOUMITA DHAR
  Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar,
	D. Ben Knoble, git

The way how you cited the message makes it very difficult to see what is
citation, code, or your message text. Please insert sufficient blank
lines in your replies. Also, indentation may help to declare what is the
code that you talk about.

Am 10.05.25 um 13:37 schrieb MOUMITA DHAR:
> On Tue, 6 May 2025 at 22:00, Johannes Sixt <j6t@kdbg.org> wrote:
>>> diff --git a/userdiff.c b/userdiff.c
>>> index 340c4eb4f7..4c77c7e0f6 100644
>>> --- a/userdiff.c
>>> +++ b/userdiff.c
>>> @@ -64,15 +64,27 @@ PATTERNS("bash",
>>>            /* Bashism identifier with optional parentheses */
>>>            "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
>>>        ")"
>>> -      /* Optional whitespace */
>>> -      "[ \t]*"
>>> -      /* Compound command starting with `{`, `(`, `((` or `[[` */
>>> -      "(\\{|\\(\\(?|\\[\\[)"
>>> +      /* Everything after the function header is captured  */
>>> +      ".*$"
>>
>> I remember suggesting to capture everything after the function header.
>> However, If I am not mistaken, this does not do what I intended (and as
>> written it means that a pointless matching operation happens). The hunk
>> header shows everything that is in the outermost parentheses (group).
>> This catch-all, however, is even outside the outermost group and not
>> captured. It should be above the closing parenthesis that we see in the
>> context.
> > I am sorry I want to understand a thing  , we want everything from
> the function name to the end of the line to be the hunk header
> right ? So in the pattern

> "^[ \t]*"> /* Start of captured text */
> "("
> "("
>      /* POSIX identifier with mandatory parentheses */
>      "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> "|"
>      /* Bashism identifier with optional parentheses */
>      "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> ")"
> /* Optional whitespace */
> "[ \t]*"
> /* Compound command starting with `{`, `(`, `((` or `[[` */
> "(\\{|\\(\\(?|\\[\\[)"
> /* End of captured text */
> ")"

> if I replace

> "[ \t]*"
> "(\\{|\\(\\(?|\\[\\[)"

> part with .*$ then will it not capture the
> entire line ? I mean the outermost group ends here-
> 
>  /* End of captured text */
> ")"
> right ? What am I getting wrong ?

It is my error, sorry. I see now that your intended change was correct.

However, in my tests, a comment after the function name is not captured.
The reason is that the parentheses are not balanced. My editor tells me
that the parenthesis in "(function" is matched with the lone ")" in the
next line, but the latter is intended to match up with the "(" above the
'/* POSIX' comment. There is something fishy going on and needs a fix.

To test, apply this:

diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function
index a4d144856e..673c51b89e 100644
--- a/t/t4018/bash-posix-style-function
+++ b/t/t4018/bash-posix-style-function
@@ -1,4 +1,4 @@
-RIGHT() {
+afunc () { # RIGHT
 
     ChangeMe
 }

-- Hannes


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff
  2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
  2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-11 12:58       ` Moumita
  2025-05-11 12:58         ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
                           ` (3 more replies)
  1 sibling, 4 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 12:58 UTC (permalink / raw)
  To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano

I wrapped the POSIX function pattern in an extra set of parentheses and added 
a closing parenthesis in the Bashism function case to balance the opening group ,this makes the grouping explicit and consistent
with the outer structure of the regex. 

Also I added the test case so that everything on the function definition line is captured correctly.
And corrected the word diff test so that the operators like '+=' are treated as a single token.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v4:
1:  40cffd3b4a ! 1:  478b77e0c8 userdiff: extend Bash pattern to cover more shell function forms
    @@ Commit message
           `x () echo hello`.
     
         Replacing the function body matching logic with `.*$`, ensures
    -    that everything on the function definition line is captured,
    -    aligning with other userdiff drivers and improving hunk headers in
    -    `git diff`.
    +    that everything on the function definition line is captured.
     
         Additionally, the word regex is refined to better recognize shell
         syntax, including additional parameter expansion operators and
    -    command-line options, improving syntax-aware diffs.
    +    command-line options.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    @@ t/t4018/bash-bashism-style-multiline-function (new)
     +    echo 'ChangeMe'
     +}
     
    + ## t/t4018/bash-hunk-header-complete-line-capture (new) ##
    +@@
    ++func() { # RIGHT
    ++
    ++    ChangeMe
    ++}
    + \ No newline at end of file
    +
      ## t/t4018/bash-posix-style-multiline-function (new) ##
     @@
     +RIGHT() \
    @@ t/t4034/bash/expect (new)
     +<BOLD>index 09ac008..60ba6a2 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<CYAN>@@ -1,33 +1,33 @@<RESET>
     +<RED>my_var<RESET><GREEN>new_var<RESET>=10
     +x=<RED>123<RESET><GREEN>456<RESET>
     +y=<RED>3.14<RESET><GREEN>2.71<RESET>
     +z=<RED>.5<RESET><GREEN>.75<RESET>
     +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
     +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>##*/}
    -+${<RED>a<RESET><GREEN>x<RESET>%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>%%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>^^}
    -+${<RED>a<RESET><GREEN>x<RESET>,}
    -+${<RED>a<RESET><GREEN>x<RESET>,,}
    -+${!<RED>a<RESET><GREEN>x<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>[@]}
    -+${<RED>a<RESET><GREEN>x<RESET>:?error message}
    -+${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++((a<RED>+<RESET><GREEN>+=<RESET>b))
    ++((a<RED>*<RESET><GREEN>*=<RESET>b))
    ++((a<RED>/<RESET><GREEN>/=<RESET>b))
    ++((a<RED>%<RESET><GREEN>%=<RESET>b))
    ++((a<RED>|<RESET><GREEN>|=<RESET>b))
    ++((a<RED>^<RESET><GREEN>^=<RESET>b))
    ++((a<RED>=<RESET><GREEN>==<RESET>b))
    ++((a<RED>!<RESET><GREEN>!=<RESET>b))
    ++((a<RED><<RESET><GREEN><=<RESET>b))
    ++((a<RED>><RESET><GREEN>>=<RESET>b))
    ++$((a<RED><<RESET><GREEN><<<RESET>b))
    ++$((a<RED>><RESET><GREEN>>><RESET>b))
    ++$((a<RED>&<RESET><GREEN>&&<RESET>b))
    ++$((a<RED>|<RESET><GREEN>||<RESET>b))
    ++${a<RED>:<RESET><GREEN>:-<RESET>b}
    ++${a<RED>:<RESET><GREEN>:=<RESET>b}
    ++${a<RED>:<RESET><GREEN>:+<RESET>b}
    ++${a<RED>:<RESET><GREEN>:?<RESET>b}
    ++${a<RED>#<RESET><GREEN>##<RESET>*/}
    ++${a<RED>%<RESET><GREEN>%%<RESET>.*}
    ++${a<RED>^<RESET><GREEN>^^<RESET>}
    ++${a<RED>,<RESET><GREEN>,,<RESET>}
    ++${<GREEN>!<RESET>a}
    ++${a[<RED>*<RESET><GREEN>@<RESET>]}
    ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
     +ls <RED>-a<RESET><GREEN>-x<RESET>
     +ls <RED>--a<RESET><GREEN>--x<RESET>
     
    @@ t/t4034/bash/post (new)
     +z=.75
     +echo $USERNAME
     +${HOMEDIR}
    -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    -+((x+=y))
    -+((x-=y))
    -+$((x<<y))
    -+$((x>>y))
    -+${x:-y}
    -+${x:=y}
    -+${x##*/}
    -+${x%.*}
    -+${x%%.*}
    -+${x^^}
    -+${x,}
    -+${x,,}
    -+${!x}
    -+${x[@]}
    -+${x:?error message}
    -+${x:2:3}
    ++((a+=b))
    ++((a*=b))
    ++((a/=b))
    ++((a%=b))
    ++((a|=b))
    ++((a^=b))
    ++((a==b))
    ++((a!=b))
    ++((a<=b))
    ++((a>=b))
    ++$((a<<b))
    ++$((a>>b))
    ++$((a&&b))
    ++$((a||b))
    ++${a:-b}
    ++${a:=b}
    ++${a:+b}
    ++${a:?b}
    ++${a##*/}
    ++${a%%.*}
    ++${a^^}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:4:6}
     +ls -x
     +ls --x
     
    @@ t/t4034/bash/pre (new)
     +z=.5
     +echo $USER
     +${HOME}
    -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    -+((a+=b))
    -+((a-=b))
    -+$((a << b))
    -+$((a >> b))
    -+${a:-b}
    -+${a:=b}
    -+${a##*/}
    ++((a+b))
    ++((a*b))
    ++((a/b))
    ++((a%b))
    ++((a|b))
    ++((a^b))
    ++((a=b))
    ++((a!b))
    ++((a<b))
    ++((a>b))
    ++$((a<b))
    ++$((a>b))
    ++$((a&b))
    ++$((a|b))
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a#*/}
     +${a%.*}
    -+${a%%.*}
    -+${a^^}
    ++${a^}
     +${a,}
    -+${a,,}
    -+${!a}
    -+${a[@]}
    -+${a:?error message}
    ++${a}
    ++${a[*]}
     +${a:2:3}
     +ls -a
     +ls --a
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("bash",
    + 	 "("
    + 	 "("
    + 	     /* POSIX identifier with mandatory parentheses */
    +-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    ++	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    + 	 "|"
      	     /* Bashism identifier with optional parentheses */
    - 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    +-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    ++	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
      	 ")"
     -	 /* Optional whitespace */
     -	 "[ \t]*"
    @@ userdiff.c: PATTERNS("bash",
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
     +	 /* Additional parameter expansion operators */
    -+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
     +	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
@ 2025-05-11 12:58         ` Moumita
  2025-05-11 13:28         ` Moumita
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 12:58 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano

From: Moumita Dhar <dhar61595@gmail.com>

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function
new file mode 100644
index 0000000000..284d50dd99
--- /dev/null
+++ b/t/t4018/bash-bashism-style-multiline-function
@@ -0,0 +1,4 @@
+function RIGHT \
+{    
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture
new file mode 100644
index 0000000000..818c8c5a5f
--- /dev/null
+++ b/t/t4018/bash-hunk-header-complete-line-capture
@@ -0,0 +1,4 @@
+func() { # RIGHT
+
+    ChangeMe
+}
\ No newline at end of file
diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
new file mode 100644
index 0000000000..cc8727cbcd
--- /dev/null
+++ b/t/t4018/bash-posix-style-multiline-function
@@ -0,0 +1,4 @@
+RIGHT() \
+{
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..17755e455f
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,38 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,33 +1,33 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+y=<RED>3.14<RESET><GREEN>2.71<RESET>
+z=<RED>.5<RESET><GREEN>.75<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+((a<RED>+<RESET><GREEN>+=<RESET>b))
+((a<RED>*<RESET><GREEN>*=<RESET>b))
+((a<RED>/<RESET><GREEN>/=<RESET>b))
+((a<RED>%<RESET><GREEN>%=<RESET>b))
+((a<RED>|<RESET><GREEN>|=<RESET>b))
+((a<RED>^<RESET><GREEN>^=<RESET>b))
+((a<RED>=<RESET><GREEN>==<RESET>b))
+((a<RED>!<RESET><GREEN>!=<RESET>b))
+((a<RED><<RESET><GREEN><=<RESET>b))
+((a<RED>><RESET><GREEN>>=<RESET>b))
+$((a<RED><<RESET><GREEN><<<RESET>b))
+$((a<RED>><RESET><GREEN>>><RESET>b))
+$((a<RED>&<RESET><GREEN>&&<RESET>b))
+$((a<RED>|<RESET><GREEN>||<RESET>b))
+${a<RED>:<RESET><GREEN>:-<RESET>b}
+${a<RED>:<RESET><GREEN>:=<RESET>b}
+${a<RED>:<RESET><GREEN>:+<RESET>b}
+${a<RED>:<RESET><GREEN>:?<RESET>b}
+${a<RED>#<RESET><GREEN>##<RESET>*/}
+${a<RED>%<RESET><GREEN>%%<RESET>.*}
+${a<RED>^<RESET><GREEN>^^<RESET>}
+${a<RED>,<RESET><GREEN>,,<RESET>}
+${<GREEN>!<RESET>a}
+${a[<RED>*<RESET><GREEN>@<RESET>]}
+${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--a<RESET><GREEN>--x<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..669e218c30
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,33 @@
+new_var=10
+x=456
+y=2.71
+z=.75
+echo $USERNAME
+${HOMEDIR}
+((a+=b))
+((a*=b))
+((a/=b))
+((a%=b))
+((a|=b))
+((a^=b))
+((a==b))
+((a!=b))
+((a<=b))
+((a>=b))
+$((a<<b))
+$((a>>b))
+$((a&&b))
+$((a||b))
+${a:-b}
+${a:=b}
+${a:+b}
+${a:?b}
+${a##*/}
+${a%%.*}
+${a^^}
+${a,,}
+${!a}
+${a[@]}
+${a:4:6}
+ls -x
+ls --x
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..ada8470bac
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,33 @@
+my_var=10
+x=123
+y=3.14
+z=.5
+echo $USER
+${HOME}
+((a+b))
+((a*b))
+((a/b))
+((a%b))
+((a|b))
+((a^b))
+((a=b))
+((a!b))
+((a<b))
+((a>b))
+$((a<b))
+$((a>b))
+$((a&b))
+$((a|b))
+${a:b}
+${a:b}
+${a:b}
+${a:b}
+${a#*/}
+${a%.*}
+${a^}
+${a,}
+${a}
+${a[*]}
+${a:2:3}
+ls -a
+ls --a
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..655c8fe0b1 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -59,20 +59,32 @@ PATTERNS("bash",
 	 "("
 	 "("
 	     /* POSIX identifier with mandatory parentheses */
-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
+	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
 	 "|"
 	     /* Bashism identifier with optional parentheses */
-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
+	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
+	  /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Additional parameter expansion operators */
+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* (no subject)
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
  2025-05-11 12:58         ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-11 13:28         ` Moumita
  2025-05-11 13:28           ` Moumita
  2025-05-11 13:37         ` Moumita
  2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
  3 siblings, 1 reply; 37+ messages in thread
From: Moumita @ 2025-05-11 13:28 UTC (permalink / raw)
  To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano

Subject: [PATCH v6 0/1] Added the newline after the test in t/4018

Sorry , I forgot to add the newline after the test in t/4018 again so I had send the patch agaain.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v5:
1:  40cffd3b4a ! 1:  464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms
    @@ Commit message
           `x () echo hello`.
     
         Replacing the function body matching logic with `.*$`, ensures
    -    that everything on the function definition line is captured,
    -    aligning with other userdiff drivers and improving hunk headers in
    -    `git diff`.
    +    that everything on the function definition line is captured.
     
         Additionally, the word regex is refined to better recognize shell
         syntax, including additional parameter expansion operators and
    -    command-line options, improving syntax-aware diffs.
    +    command-line options.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    @@ t/t4018/bash-bashism-style-multiline-function (new)
     +function RIGHT \
     +{    
     +    echo 'ChangeMe'
    ++}
    +
    + ## t/t4018/bash-hunk-header-complete-line-capture (new) ##
    +@@
    ++func() { # RIGHT
    ++
    ++    ChangeMe
     +}
     
      ## t/t4018/bash-posix-style-multiline-function (new) ##
    @@ t/t4034/bash/expect (new)
     +<BOLD>index 09ac008..60ba6a2 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<CYAN>@@ -1,33 +1,33 @@<RESET>
     +<RED>my_var<RESET><GREEN>new_var<RESET>=10
     +x=<RED>123<RESET><GREEN>456<RESET>
     +y=<RED>3.14<RESET><GREEN>2.71<RESET>
     +z=<RED>.5<RESET><GREEN>.75<RESET>
     +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
     +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>##*/}
    -+${<RED>a<RESET><GREEN>x<RESET>%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>%%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>^^}
    -+${<RED>a<RESET><GREEN>x<RESET>,}
    -+${<RED>a<RESET><GREEN>x<RESET>,,}
    -+${!<RED>a<RESET><GREEN>x<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>[@]}
    -+${<RED>a<RESET><GREEN>x<RESET>:?error message}
    -+${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++((a<RED>+<RESET><GREEN>+=<RESET>b))
    ++((a<RED>*<RESET><GREEN>*=<RESET>b))
    ++((a<RED>/<RESET><GREEN>/=<RESET>b))
    ++((a<RED>%<RESET><GREEN>%=<RESET>b))
    ++((a<RED>|<RESET><GREEN>|=<RESET>b))
    ++((a<RED>^<RESET><GREEN>^=<RESET>b))
    ++((a<RED>=<RESET><GREEN>==<RESET>b))
    ++((a<RED>!<RESET><GREEN>!=<RESET>b))
    ++((a<RED><<RESET><GREEN><=<RESET>b))
    ++((a<RED>><RESET><GREEN>>=<RESET>b))
    ++$((a<RED><<RESET><GREEN><<<RESET>b))
    ++$((a<RED>><RESET><GREEN>>><RESET>b))
    ++$((a<RED>&<RESET><GREEN>&&<RESET>b))
    ++$((a<RED>|<RESET><GREEN>||<RESET>b))
    ++${a<RED>:<RESET><GREEN>:-<RESET>b}
    ++${a<RED>:<RESET><GREEN>:=<RESET>b}
    ++${a<RED>:<RESET><GREEN>:+<RESET>b}
    ++${a<RED>:<RESET><GREEN>:?<RESET>b}
    ++${a<RED>#<RESET><GREEN>##<RESET>*/}
    ++${a<RED>%<RESET><GREEN>%%<RESET>.*}
    ++${a<RED>^<RESET><GREEN>^^<RESET>}
    ++${a<RED>,<RESET><GREEN>,,<RESET>}
    ++${<GREEN>!<RESET>a}
    ++${a[<RED>*<RESET><GREEN>@<RESET>]}
    ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
     +ls <RED>-a<RESET><GREEN>-x<RESET>
     +ls <RED>--a<RESET><GREEN>--x<RESET>
     
    @@ t/t4034/bash/post (new)
     +z=.75
     +echo $USERNAME
     +${HOMEDIR}
    -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    -+((x+=y))
    -+((x-=y))
    -+$((x<<y))
    -+$((x>>y))
    -+${x:-y}
    -+${x:=y}
    -+${x##*/}
    -+${x%.*}
    -+${x%%.*}
    -+${x^^}
    -+${x,}
    -+${x,,}
    -+${!x}
    -+${x[@]}
    -+${x:?error message}
    -+${x:2:3}
    ++((a+=b))
    ++((a*=b))
    ++((a/=b))
    ++((a%=b))
    ++((a|=b))
    ++((a^=b))
    ++((a==b))
    ++((a!=b))
    ++((a<=b))
    ++((a>=b))
    ++$((a<<b))
    ++$((a>>b))
    ++$((a&&b))
    ++$((a||b))
    ++${a:-b}
    ++${a:=b}
    ++${a:+b}
    ++${a:?b}
    ++${a##*/}
    ++${a%%.*}
    ++${a^^}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:4:6}
     +ls -x
     +ls --x
     
    @@ t/t4034/bash/pre (new)
     +z=.5
     +echo $USER
     +${HOME}
    -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    -+((a+=b))
    -+((a-=b))
    -+$((a << b))
    -+$((a >> b))
    -+${a:-b}
    -+${a:=b}
    -+${a##*/}
    ++((a+b))
    ++((a*b))
    ++((a/b))
    ++((a%b))
    ++((a|b))
    ++((a^b))
    ++((a=b))
    ++((a!b))
    ++((a<b))
    ++((a>b))
    ++$((a<b))
    ++$((a>b))
    ++$((a&b))
    ++$((a|b))
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a#*/}
     +${a%.*}
    -+${a%%.*}
    -+${a^^}
    ++${a^}
     +${a,}
    -+${a,,}
    -+${!a}
    -+${a[@]}
    -+${a:?error message}
    ++${a}
    ++${a[*]}
     +${a:2:3}
     +ls -a
     +ls --a
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("bash",
    + 	 "("
    + 	 "("
    + 	     /* POSIX identifier with mandatory parentheses */
    +-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    ++	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    + 	 "|"
      	     /* Bashism identifier with optional parentheses */
    - 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    +-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    ++	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
      	 ")"
     -	 /* Optional whitespace */
     -	 "[ \t]*"
    @@ userdiff.c: PATTERNS("bash",
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
     +	 /* Additional parameter expansion operators */
    -+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
     +	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* (no subject)
  2025-05-11 13:28         ` Moumita
@ 2025-05-11 13:28           ` Moumita
  0 siblings, 0 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 13:28 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano

From: Moumita Dhar <dhar61595@gmail.com>

Subject: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell
 function forms

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function
new file mode 100644
index 0000000000..284d50dd99
--- /dev/null
+++ b/t/t4018/bash-bashism-style-multiline-function
@@ -0,0 +1,4 @@
+function RIGHT \
+{    
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture
new file mode 100644
index 0000000000..b56942f322
--- /dev/null
+++ b/t/t4018/bash-hunk-header-complete-line-capture
@@ -0,0 +1,4 @@
+func() { # RIGHT
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
new file mode 100644
index 0000000000..cc8727cbcd
--- /dev/null
+++ b/t/t4018/bash-posix-style-multiline-function
@@ -0,0 +1,4 @@
+RIGHT() \
+{
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..17755e455f
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,38 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,33 +1,33 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+y=<RED>3.14<RESET><GREEN>2.71<RESET>
+z=<RED>.5<RESET><GREEN>.75<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+((a<RED>+<RESET><GREEN>+=<RESET>b))
+((a<RED>*<RESET><GREEN>*=<RESET>b))
+((a<RED>/<RESET><GREEN>/=<RESET>b))
+((a<RED>%<RESET><GREEN>%=<RESET>b))
+((a<RED>|<RESET><GREEN>|=<RESET>b))
+((a<RED>^<RESET><GREEN>^=<RESET>b))
+((a<RED>=<RESET><GREEN>==<RESET>b))
+((a<RED>!<RESET><GREEN>!=<RESET>b))
+((a<RED><<RESET><GREEN><=<RESET>b))
+((a<RED>><RESET><GREEN>>=<RESET>b))
+$((a<RED><<RESET><GREEN><<<RESET>b))
+$((a<RED>><RESET><GREEN>>><RESET>b))
+$((a<RED>&<RESET><GREEN>&&<RESET>b))
+$((a<RED>|<RESET><GREEN>||<RESET>b))
+${a<RED>:<RESET><GREEN>:-<RESET>b}
+${a<RED>:<RESET><GREEN>:=<RESET>b}
+${a<RED>:<RESET><GREEN>:+<RESET>b}
+${a<RED>:<RESET><GREEN>:?<RESET>b}
+${a<RED>#<RESET><GREEN>##<RESET>*/}
+${a<RED>%<RESET><GREEN>%%<RESET>.*}
+${a<RED>^<RESET><GREEN>^^<RESET>}
+${a<RED>,<RESET><GREEN>,,<RESET>}
+${<GREEN>!<RESET>a}
+${a[<RED>*<RESET><GREEN>@<RESET>]}
+${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--a<RESET><GREEN>--x<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..669e218c30
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,33 @@
+new_var=10
+x=456
+y=2.71
+z=.75
+echo $USERNAME
+${HOMEDIR}
+((a+=b))
+((a*=b))
+((a/=b))
+((a%=b))
+((a|=b))
+((a^=b))
+((a==b))
+((a!=b))
+((a<=b))
+((a>=b))
+$((a<<b))
+$((a>>b))
+$((a&&b))
+$((a||b))
+${a:-b}
+${a:=b}
+${a:+b}
+${a:?b}
+${a##*/}
+${a%%.*}
+${a^^}
+${a,,}
+${!a}
+${a[@]}
+${a:4:6}
+ls -x
+ls --x
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..ada8470bac
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,33 @@
+my_var=10
+x=123
+y=3.14
+z=.5
+echo $USER
+${HOME}
+((a+b))
+((a*b))
+((a/b))
+((a%b))
+((a|b))
+((a^b))
+((a=b))
+((a!b))
+((a<b))
+((a>b))
+$((a<b))
+$((a>b))
+$((a&b))
+$((a|b))
+${a:b}
+${a:b}
+${a:b}
+${a:b}
+${a#*/}
+${a%.*}
+${a^}
+${a,}
+${a}
+${a[*]}
+${a:2:3}
+ls -a
+ls --a
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..655c8fe0b1 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -59,20 +59,32 @@ PATTERNS("bash",
 	 "("
 	 "("
 	     /* POSIX identifier with mandatory parentheses */
-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
+	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
 	 "|"
 	     /* Bashism identifier with optional parentheses */
-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
+	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
+	  /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Additional parameter expansion operators */
+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* (no subject)
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
  2025-05-11 12:58         ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-05-11 13:28         ` Moumita
@ 2025-05-11 13:37         ` Moumita
  2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
  3 siblings, 0 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 13:37 UTC (permalink / raw)
  To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano

Subject: [PATCH v6 0/1] Added the newline after the test in t/4018

Sorry , I forgot to add the newline after the test in t/4018 again so I had send the patch agaain.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v5:
1:  40cffd3b4a ! 1:  464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms
    @@ Commit message
           `x () echo hello`.
     
         Replacing the function body matching logic with `.*$`, ensures
    -    that everything on the function definition line is captured,
    -    aligning with other userdiff drivers and improving hunk headers in
    -    `git diff`.
    +    that everything on the function definition line is captured.
     
         Additionally, the word regex is refined to better recognize shell
         syntax, including additional parameter expansion operators and
    -    command-line options, improving syntax-aware diffs.
    +    command-line options.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    @@ t/t4018/bash-bashism-style-multiline-function (new)
     +function RIGHT \
     +{    
     +    echo 'ChangeMe'
    ++}
    +
    + ## t/t4018/bash-hunk-header-complete-line-capture (new) ##
    +@@
    ++func() { # RIGHT
    ++
    ++    ChangeMe
     +}
     
      ## t/t4018/bash-posix-style-multiline-function (new) ##
    @@ t/t4034/bash/expect (new)
     +<BOLD>index 09ac008..60ba6a2 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<CYAN>@@ -1,33 +1,33 @@<RESET>
     +<RED>my_var<RESET><GREEN>new_var<RESET>=10
     +x=<RED>123<RESET><GREEN>456<RESET>
     +y=<RED>3.14<RESET><GREEN>2.71<RESET>
     +z=<RED>.5<RESET><GREEN>.75<RESET>
     +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
     +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>##*/}
    -+${<RED>a<RESET><GREEN>x<RESET>%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>%%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>^^}
    -+${<RED>a<RESET><GREEN>x<RESET>,}
    -+${<RED>a<RESET><GREEN>x<RESET>,,}
    -+${!<RED>a<RESET><GREEN>x<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>[@]}
    -+${<RED>a<RESET><GREEN>x<RESET>:?error message}
    -+${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++((a<RED>+<RESET><GREEN>+=<RESET>b))
    ++((a<RED>*<RESET><GREEN>*=<RESET>b))
    ++((a<RED>/<RESET><GREEN>/=<RESET>b))
    ++((a<RED>%<RESET><GREEN>%=<RESET>b))
    ++((a<RED>|<RESET><GREEN>|=<RESET>b))
    ++((a<RED>^<RESET><GREEN>^=<RESET>b))
    ++((a<RED>=<RESET><GREEN>==<RESET>b))
    ++((a<RED>!<RESET><GREEN>!=<RESET>b))
    ++((a<RED><<RESET><GREEN><=<RESET>b))
    ++((a<RED>><RESET><GREEN>>=<RESET>b))
    ++$((a<RED><<RESET><GREEN><<<RESET>b))
    ++$((a<RED>><RESET><GREEN>>><RESET>b))
    ++$((a<RED>&<RESET><GREEN>&&<RESET>b))
    ++$((a<RED>|<RESET><GREEN>||<RESET>b))
    ++${a<RED>:<RESET><GREEN>:-<RESET>b}
    ++${a<RED>:<RESET><GREEN>:=<RESET>b}
    ++${a<RED>:<RESET><GREEN>:+<RESET>b}
    ++${a<RED>:<RESET><GREEN>:?<RESET>b}
    ++${a<RED>#<RESET><GREEN>##<RESET>*/}
    ++${a<RED>%<RESET><GREEN>%%<RESET>.*}
    ++${a<RED>^<RESET><GREEN>^^<RESET>}
    ++${a<RED>,<RESET><GREEN>,,<RESET>}
    ++${<GREEN>!<RESET>a}
    ++${a[<RED>*<RESET><GREEN>@<RESET>]}
    ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
     +ls <RED>-a<RESET><GREEN>-x<RESET>
     +ls <RED>--a<RESET><GREEN>--x<RESET>
     
    @@ t/t4034/bash/post (new)
     +z=.75
     +echo $USERNAME
     +${HOMEDIR}
    -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    -+((x+=y))
    -+((x-=y))
    -+$((x<<y))
    -+$((x>>y))
    -+${x:-y}
    -+${x:=y}
    -+${x##*/}
    -+${x%.*}
    -+${x%%.*}
    -+${x^^}
    -+${x,}
    -+${x,,}
    -+${!x}
    -+${x[@]}
    -+${x:?error message}
    -+${x:2:3}
    ++((a+=b))
    ++((a*=b))
    ++((a/=b))
    ++((a%=b))
    ++((a|=b))
    ++((a^=b))
    ++((a==b))
    ++((a!=b))
    ++((a<=b))
    ++((a>=b))
    ++$((a<<b))
    ++$((a>>b))
    ++$((a&&b))
    ++$((a||b))
    ++${a:-b}
    ++${a:=b}
    ++${a:+b}
    ++${a:?b}
    ++${a##*/}
    ++${a%%.*}
    ++${a^^}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:4:6}
     +ls -x
     +ls --x
     
    @@ t/t4034/bash/pre (new)
     +z=.5
     +echo $USER
     +${HOME}
    -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    -+((a+=b))
    -+((a-=b))
    -+$((a << b))
    -+$((a >> b))
    -+${a:-b}
    -+${a:=b}
    -+${a##*/}
    ++((a+b))
    ++((a*b))
    ++((a/b))
    ++((a%b))
    ++((a|b))
    ++((a^b))
    ++((a=b))
    ++((a!b))
    ++((a<b))
    ++((a>b))
    ++$((a<b))
    ++$((a>b))
    ++$((a&b))
    ++$((a|b))
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a#*/}
     +${a%.*}
    -+${a%%.*}
    -+${a^^}
    ++${a^}
     +${a,}
    -+${a,,}
    -+${!a}
    -+${a[@]}
    -+${a:?error message}
    ++${a}
    ++${a[*]}
     +${a:2:3}
     +ls -a
     +ls --a
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("bash",
    + 	 "("
    + 	 "("
    + 	     /* POSIX identifier with mandatory parentheses */
    +-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    ++	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    + 	 "|"
      	     /* Bashism identifier with optional parentheses */
    - 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    +-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    ++	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
      	 ")"
     -	 /* Optional whitespace */
     -	 "[ \t]*"
    @@ userdiff.c: PATTERNS("bash",
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
     +	 /* Additional parameter expansion operators */
    -+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
     +	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v6 0/1] Added the newline after the test in t/4018
  2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
                           ` (2 preceding siblings ...)
  2025-05-11 13:37         ` Moumita
@ 2025-05-11 14:11         ` Moumita
  2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-05-16 14:45           ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
  3 siblings, 2 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 14:11 UTC (permalink / raw)
  To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano

Sorry I forgot to add the newline after the test in t/4018 again so I had to send the patch again.

Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v5:
1:  40cffd3b4a ! 1:  464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms
    @@ Commit message
           `x () echo hello`.
     
         Replacing the function body matching logic with `.*$`, ensures
    -    that everything on the function definition line is captured,
    -    aligning with other userdiff drivers and improving hunk headers in
    -    `git diff`.
    +    that everything on the function definition line is captured.
     
         Additionally, the word regex is refined to better recognize shell
         syntax, including additional parameter expansion operators and
    -    command-line options, improving syntax-aware diffs.
    +    command-line options.
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    @@ t/t4018/bash-bashism-style-multiline-function (new)
     +function RIGHT \
     +{    
     +    echo 'ChangeMe'
    ++}
    +
    + ## t/t4018/bash-hunk-header-complete-line-capture (new) ##
    +@@
    ++func() { # RIGHT
    ++
    ++    ChangeMe
     +}
     
      ## t/t4018/bash-posix-style-multiline-function (new) ##
    @@ t/t4034/bash/expect (new)
     +<BOLD>index 09ac008..60ba6a2 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,25 +1,25 @@<RESET>
    ++<CYAN>@@ -1,33 +1,33 @@<RESET>
     +<RED>my_var<RESET><GREEN>new_var<RESET>=10
     +x=<RED>123<RESET><GREEN>456<RESET>
     +y=<RED>3.14<RESET><GREEN>2.71<RESET>
     +z=<RED>.5<RESET><GREEN>.75<RESET>
     +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
     +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
    -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi
    -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>))
    -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>))
    -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>))
    -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>##*/}
    -+${<RED>a<RESET><GREEN>x<RESET>%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>%%.*}
    -+${<RED>a<RESET><GREEN>x<RESET>^^}
    -+${<RED>a<RESET><GREEN>x<RESET>,}
    -+${<RED>a<RESET><GREEN>x<RESET>,,}
    -+${!<RED>a<RESET><GREEN>x<RESET>}
    -+${<RED>a<RESET><GREEN>x<RESET>[@]}
    -+${<RED>a<RESET><GREEN>x<RESET>:?error message}
    -+${<RED>a<RESET><GREEN>x<RESET>:2:3}
    ++((a<RED>+<RESET><GREEN>+=<RESET>b))
    ++((a<RED>*<RESET><GREEN>*=<RESET>b))
    ++((a<RED>/<RESET><GREEN>/=<RESET>b))
    ++((a<RED>%<RESET><GREEN>%=<RESET>b))
    ++((a<RED>|<RESET><GREEN>|=<RESET>b))
    ++((a<RED>^<RESET><GREEN>^=<RESET>b))
    ++((a<RED>=<RESET><GREEN>==<RESET>b))
    ++((a<RED>!<RESET><GREEN>!=<RESET>b))
    ++((a<RED><<RESET><GREEN><=<RESET>b))
    ++((a<RED>><RESET><GREEN>>=<RESET>b))
    ++$((a<RED><<RESET><GREEN><<<RESET>b))
    ++$((a<RED>><RESET><GREEN>>><RESET>b))
    ++$((a<RED>&<RESET><GREEN>&&<RESET>b))
    ++$((a<RED>|<RESET><GREEN>||<RESET>b))
    ++${a<RED>:<RESET><GREEN>:-<RESET>b}
    ++${a<RED>:<RESET><GREEN>:=<RESET>b}
    ++${a<RED>:<RESET><GREEN>:+<RESET>b}
    ++${a<RED>:<RESET><GREEN>:?<RESET>b}
    ++${a<RED>#<RESET><GREEN>##<RESET>*/}
    ++${a<RED>%<RESET><GREEN>%%<RESET>.*}
    ++${a<RED>^<RESET><GREEN>^^<RESET>}
    ++${a<RED>,<RESET><GREEN>,,<RESET>}
    ++${<GREEN>!<RESET>a}
    ++${a[<RED>*<RESET><GREEN>@<RESET>]}
    ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
     +ls <RED>-a<RESET><GREEN>-x<RESET>
     +ls <RED>--a<RESET><GREEN>--x<RESET>
     
    @@ t/t4034/bash/post (new)
     +z=.75
     +echo $USERNAME
     +${HOMEDIR}
    -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi
    -+((x+=y))
    -+((x-=y))
    -+$((x<<y))
    -+$((x>>y))
    -+${x:-y}
    -+${x:=y}
    -+${x##*/}
    -+${x%.*}
    -+${x%%.*}
    -+${x^^}
    -+${x,}
    -+${x,,}
    -+${!x}
    -+${x[@]}
    -+${x:?error message}
    -+${x:2:3}
    ++((a+=b))
    ++((a*=b))
    ++((a/=b))
    ++((a%=b))
    ++((a|=b))
    ++((a^=b))
    ++((a==b))
    ++((a!=b))
    ++((a<=b))
    ++((a>=b))
    ++$((a<<b))
    ++$((a>>b))
    ++$((a&&b))
    ++$((a||b))
    ++${a:-b}
    ++${a:=b}
    ++${a:+b}
    ++${a:?b}
    ++${a##*/}
    ++${a%%.*}
    ++${a^^}
    ++${a,,}
    ++${!a}
    ++${a[@]}
    ++${a:4:6}
     +ls -x
     +ls --x
     
    @@ t/t4034/bash/pre (new)
     +z=.5
     +echo $USER
     +${HOME}
    -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi
    -+((a+=b))
    -+((a-=b))
    -+$((a << b))
    -+$((a >> b))
    -+${a:-b}
    -+${a:=b}
    -+${a##*/}
    ++((a+b))
    ++((a*b))
    ++((a/b))
    ++((a%b))
    ++((a|b))
    ++((a^b))
    ++((a=b))
    ++((a!b))
    ++((a<b))
    ++((a>b))
    ++$((a<b))
    ++$((a>b))
    ++$((a&b))
    ++$((a|b))
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a:b}
    ++${a#*/}
     +${a%.*}
    -+${a%%.*}
    -+${a^^}
    ++${a^}
     +${a,}
    -+${a,,}
    -+${!a}
    -+${a[@]}
    -+${a:?error message}
    ++${a}
    ++${a[*]}
     +${a:2:3}
     +ls -a
     +ls --a
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("bash",
    + 	 "("
    + 	 "("
    + 	     /* POSIX identifier with mandatory parentheses */
    +-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    ++	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
    + 	 "|"
      	     /* Bashism identifier with optional parentheses */
    - 	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    +-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
    ++	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
      	 ")"
     -	 /* Optional whitespace */
     -	 "[ \t]*"
    @@ userdiff.c: PATTERNS("bash",
     +	 /* Assignment and arithmetic operators */
     +	 "|[-+*/%&|^!=<>]=?"
     +	 /* Additional parameter expansion operators */
    -+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
     +	 /* Command-line options (to avoid splitting -option) */
     +	 "|--?[a-zA-Z0-9_-]+"
     +	 /* Brackets and grouping symbols */
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
@ 2025-05-11 14:11           ` Moumita
  2025-05-13 18:50             ` Junio C Hamano
  2025-05-16  7:25             ` Johannes Sixt
  2025-05-16 14:45           ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
  1 sibling, 2 replies; 37+ messages in thread
From: Moumita @ 2025-05-11 14:11 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano

From: Moumita Dhar <dhar61595@gmail.com>

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 t/t4018/bash-bashism-style-multiline-function |  4 ++
 .../bash-hunk-header-complete-line-capture    |  4 ++
 t/t4018/bash-posix-style-multiline-function   |  4 ++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 38 +++++++++++++++++++
 t/t4034/bash/post                             | 33 ++++++++++++++++
 t/t4034/bash/pre                              | 33 ++++++++++++++++
 userdiff.c                                    | 28 ++++++++++----
 9 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-multiline-function
 create mode 100644 t/t4018/bash-hunk-header-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-multiline-function
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function
new file mode 100644
index 0000000000..284d50dd99
--- /dev/null
+++ b/t/t4018/bash-bashism-style-multiline-function
@@ -0,0 +1,4 @@
+function RIGHT \
+{    
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture
new file mode 100644
index 0000000000..b56942f322
--- /dev/null
+++ b/t/t4018/bash-hunk-header-complete-line-capture
@@ -0,0 +1,4 @@
+func() { # RIGHT
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
new file mode 100644
index 0000000000..cc8727cbcd
--- /dev/null
+++ b/t/t4018/bash-posix-style-multiline-function
@@ -0,0 +1,4 @@
+RIGHT() \
+{
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..17755e455f
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,38 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,33 +1,33 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+y=<RED>3.14<RESET><GREEN>2.71<RESET>
+z=<RED>.5<RESET><GREEN>.75<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+((a<RED>+<RESET><GREEN>+=<RESET>b))
+((a<RED>*<RESET><GREEN>*=<RESET>b))
+((a<RED>/<RESET><GREEN>/=<RESET>b))
+((a<RED>%<RESET><GREEN>%=<RESET>b))
+((a<RED>|<RESET><GREEN>|=<RESET>b))
+((a<RED>^<RESET><GREEN>^=<RESET>b))
+((a<RED>=<RESET><GREEN>==<RESET>b))
+((a<RED>!<RESET><GREEN>!=<RESET>b))
+((a<RED><<RESET><GREEN><=<RESET>b))
+((a<RED>><RESET><GREEN>>=<RESET>b))
+$((a<RED><<RESET><GREEN><<<RESET>b))
+$((a<RED>><RESET><GREEN>>><RESET>b))
+$((a<RED>&<RESET><GREEN>&&<RESET>b))
+$((a<RED>|<RESET><GREEN>||<RESET>b))
+${a<RED>:<RESET><GREEN>:-<RESET>b}
+${a<RED>:<RESET><GREEN>:=<RESET>b}
+${a<RED>:<RESET><GREEN>:+<RESET>b}
+${a<RED>:<RESET><GREEN>:?<RESET>b}
+${a<RED>#<RESET><GREEN>##<RESET>*/}
+${a<RED>%<RESET><GREEN>%%<RESET>.*}
+${a<RED>^<RESET><GREEN>^^<RESET>}
+${a<RED>,<RESET><GREEN>,,<RESET>}
+${<GREEN>!<RESET>a}
+${a[<RED>*<RESET><GREEN>@<RESET>]}
+${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--a<RESET><GREEN>--x<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..669e218c30
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,33 @@
+new_var=10
+x=456
+y=2.71
+z=.75
+echo $USERNAME
+${HOMEDIR}
+((a+=b))
+((a*=b))
+((a/=b))
+((a%=b))
+((a|=b))
+((a^=b))
+((a==b))
+((a!=b))
+((a<=b))
+((a>=b))
+$((a<<b))
+$((a>>b))
+$((a&&b))
+$((a||b))
+${a:-b}
+${a:=b}
+${a:+b}
+${a:?b}
+${a##*/}
+${a%%.*}
+${a^^}
+${a,,}
+${!a}
+${a[@]}
+${a:4:6}
+ls -x
+ls --x
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..ada8470bac
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,33 @@
+my_var=10
+x=123
+y=3.14
+z=.5
+echo $USER
+${HOME}
+((a+b))
+((a*b))
+((a/b))
+((a%b))
+((a|b))
+((a^b))
+((a=b))
+((a!b))
+((a<b))
+((a>b))
+$((a<b))
+$((a>b))
+$((a&b))
+$((a|b))
+${a:b}
+${a:b}
+${a:b}
+${a:b}
+${a#*/}
+${a%.*}
+${a^}
+${a,}
+${a}
+${a[*]}
+${a:2:3}
+ls -a
+ls --a
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..655c8fe0b1 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -59,20 +59,32 @@ PATTERNS("bash",
 	 "("
 	 "("
 	     /* POSIX identifier with mandatory parentheses */
-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
+	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
 	 "|"
 	     /* Bashism identifier with optional parentheses */
-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
+	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Numeric constants: integers and decimals */
+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
+	  /* Logical and comparison operators */
+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
+	 /* Assignment and arithmetic operators */
+	 "|[-+*/%&|^!=<>]=?"
+	 /* Additional parameter expansion operators */
+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|--?[a-zA-Z0-9_-]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-13 18:50             ` Junio C Hamano
  2025-05-14  6:33               ` MOUMITA DHAR
  2025-05-16  7:25             ` Johannes Sixt
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2025-05-13 18:50 UTC (permalink / raw)
  To: Moumita; +Cc: git, Johannes Sixt, Eric Sunshine

Moumita <dhar61595@gmail.com> writes:

> diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
> new file mode 100644
> index 0000000000..cc8727cbcd
> --- /dev/null
> +++ b/t/t4018/bash-posix-style-multiline-function
> @@ -0,0 +1,4 @@
> +RIGHT() \
> +{
> +    ChangeMe
> +}

Not a review, but I am curious what this test is about.  Is it to
ensure that the pattern does not get confused with the backslash
that does not have to be (but it would not hurt to have one) there?

IOW, does

	RIGHT()
	{
		ChangeMe
	}

get processed just fine, and the above is to check the corner case
where an unusual "\" on the same line as RIGHT does not break the
funcline identification?

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-13 18:50             ` Junio C Hamano
@ 2025-05-14  6:33               ` MOUMITA DHAR
  0 siblings, 0 replies; 37+ messages in thread
From: MOUMITA DHAR @ 2025-05-14  6:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Sixt, Eric Sunshine

On Wed, 14 May 2025 at 00:20, Junio C Hamano <gitster@pobox.com> wrote:
>
> Moumita <dhar61595@gmail.com> writes:
>
> > diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function
> > new file mode 100644
> > index 0000000000..cc8727cbcd
> > --- /dev/null
> > +++ b/t/t4018/bash-posix-style-multiline-function
> > @@ -0,0 +1,4 @@
> > +RIGHT() \
> > +{
> > +    ChangeMe
> > +}
>
> Not a review, but I am curious what this test is about.  Is it to
> ensure that the pattern does not get confused with the backslash
> that does not have to be (but it would not hurt to have one) there?
>
> IOW, does
>
>         RIGHT()
>         {
>                 ChangeMe
>         }
>
> get processed just fine, and the above is to check the corner case
> where an unusual "\" on the same line as RIGHT does not break the
> funcline identification?
>
> Thanks.

I realise this test is redundant , The goal of my test was to ensure
that the entire function header line is correctly captured, even when
the opening brace is placed on the next line using a backslash.
However, I now realize that the test case Johannes mentioned already
covers this behavior. So rather than duplicating that, I think it
would be more useful to add a similar test using the alternative,
Bashism-style syntax:-

function myfunc # RIGHT
{
    echo 'ChangeMe'
}

Thank You
Moumita

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  2025-05-13 18:50             ` Junio C Hamano
@ 2025-05-16  7:25             ` Johannes Sixt
  2025-05-17 13:09               ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Johannes Sixt @ 2025-05-16  7:25 UTC (permalink / raw)
  To: Moumita; +Cc: Eric Sunshine, Junio C Hamano, git

Am 11.05.25 um 16:11 schrieb Moumita:
> From: Moumita Dhar <dhar61595@gmail.com>
> 
> The previous function regex required explicit matching of function
> bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
> 
> - It failed to capture valid functions where `{` was on the next line
>   due to line continuation (`\`).
> - It did not recognize functions with single  command body, such as
>   `x () echo hello`.
> 
> Replacing the function body matching logic with `.*$`, ensures
> that everything on the function definition line is captured.
> 
> Additionally, the word regex is refined to better recognize shell
> syntax, including additional parameter expansion operators and
> command-line options.
> 
> Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
> ---

Function patterns are looking good now. Let's focus on the worddiff
pattern now.

> diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
> new file mode 100644
> index 0000000000..17755e455f
> --- /dev/null
> +++ b/t/t4034/bash/expect
> @@ -0,0 +1,38 @@
> +<BOLD>diff --git a/pre b/post<RESET>
> +<BOLD>index 09ac008..60ba6a2 100644<RESET>
> +<BOLD>--- a/pre<RESET>
> +<BOLD>+++ b/post<RESET>
> +<CYAN>@@ -1,33 +1,33 @@<RESET>
> +<RED>my_var<RESET><GREEN>new_var<RESET>=10
> +x=<RED>123<RESET><GREEN>456<RESET>
> +y=<RED>3.14<RESET><GREEN>2.71<RESET>
> +z=<RED>.5<RESET><GREEN>.75<RESET>

When do decimal numbers occur in shell scripts? Wouldn't it be more
often the case that a fullstop is part of a regular expression or a file
name or version number that happens to be surrounded by numbers? In that
case, we would prefer to capture the digit sequences as separate words.

> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
> +((a<RED>+<RESET><GREEN>+=<RESET>b))
> +((a<RED>*<RESET><GREEN>*=<RESET>b))
> +((a<RED>/<RESET><GREEN>/=<RESET>b))
> +((a<RED>%<RESET><GREEN>%=<RESET>b))
> +((a<RED>|<RESET><GREEN>|=<RESET>b))
> +((a<RED>^<RESET><GREEN>^=<RESET>b))
> +((a<RED>=<RESET><GREEN>==<RESET>b))
> +((a<RED>!<RESET><GREEN>!=<RESET>b))
> +((a<RED><<RESET><GREEN><=<RESET>b))
> +((a<RED>><RESET><GREEN>>=<RESET>b))
> +$((a<RED><<RESET><GREEN><<<RESET>b))
> +$((a<RED>><RESET><GREEN>>><RESET>b))
> +$((a<RED>&<RESET><GREEN>&&<RESET>b))
> +$((a<RED>|<RESET><GREEN>||<RESET>b))
> +${a<RED>:<RESET><GREEN>:-<RESET>b}
> +${a<RED>:<RESET><GREEN>:=<RESET>b}
> +${a<RED>:<RESET><GREEN>:+<RESET>b}
> +${a<RED>:<RESET><GREEN>:?<RESET>b}
> +${a<RED>#<RESET><GREEN>##<RESET>*/}
> +${a<RED>%<RESET><GREEN>%%<RESET>.*}
> +${a<RED>^<RESET><GREEN>^^<RESET>}
> +${a<RED>,<RESET><GREEN>,,<RESET>}
> +${<GREEN>!<RESET>a}
> +${a[<RED>*<RESET><GREEN>@<RESET>]}

All good tests!

All of | || & && would more often than not occur in commands rather than
expressions. But the way these operators are tested here is fine, too.

> +${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}

I am surprised to see :2:3 as a single token. It's bash's substring
expansion. It would not be matched anyway in practice because these
numbers are often variables or more complicated expressions, I would
think, not integer constants.

> +ls <RED>-a<RESET><GREEN>-x<RESET>
> +ls <RED>--a<RESET><GREEN>--x<RESET>

Can we extend the second of these two to have a multi-letter option?

> diff --git a/userdiff.c b/userdiff.c
> index 340c4eb4f7..655c8fe0b1 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -59,20 +59,32 @@ PATTERNS("bash",
>  	 "("
>  	 "("
>  	     /* POSIX identifier with mandatory parentheses */
> -	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> +	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
>  	 "|"
>  	     /* Bashism identifier with optional parentheses */
> -	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> +	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
>  	 ")"
> -	 /* Optional whitespace */
> -	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> +	 /* Everything after the function header is captured  */
> +	 ".*$"
>  	 /* End of captured text */
>  	 ")",
>  	 /* -- */

In the operator patterns that follow, it is not necessary to match
operators that are just a single punctuation character. They are matched
automatically, IIRC (but please correct me if I am wrong). Only
multi-character operators should be matched. That means that in all
patterns where one character is optional due to the '?' in a
two-character-operator, you should drop the '?'.

> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),
> +	 /* Identifiers: variable and function names */
> +	  "[a-zA-Z_][a-zA-Z0-9_]*"

OK. But see below.

> +	 /* Numeric constants: integers and decimals */
> +	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"

I would recommend not to match decimal numbers and instead have them as
three words: integer-fullstop-integer.

Generally, I advise to not include a sign in the word pattern of a
number, because the sign is usually its own operator. But we do have
numeric options, i.e., integers with a leading dash. But see below.

> +	 /* Shell variables: $VAR, ${VAR} */
> +	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"

Shell variables do not have to begin with a letter. We have $0, $1 etc.
Just do not require the first character after the $ to be a letter.

> +	  /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="

I'd call || and && "Command list separators" in the context of shell
language.

> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"

These would then be only compund assignment operators. These patterns
also match <= >= == != again. How about having all those (from this line
and from the line above) that end in '=' into a single case?

> +	 /* Additional parameter expansion operators */
> +	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"

Remove all single-character operators.

Also, the :digits:digits word is strange. What is your rationale that
you included it?

> +	 /* Command-line options (to avoid splitting -option) */
> +	 "|--?[a-zA-Z0-9_-]+"

This pattern is a beast for the matcher.Would it match the text "-----"?
Yes, in more than one way! Would it match "--"? Yes! Would it match
"-1000"? Yes!

None of these are a problem, but actually desired. But it would not need
the optional '-' in the second position. The '-' included in the range
would match "--foo" just as well.

This patter also matches negative numbers. So, here's the idea: Just
turn this into

	"|[-a-zA-Z0-9_]+"

And we have a pattern that matches identifiers, function names,
integers, short options, long options, numeric options a.k.a. negative
integers.

It might match more and I hope that I do not miss a case that we do
*not* want to match. If you do, let me know. Until then I'd leave it at
that and fix it up when we encounter an important case that is
mishandled by this pattern.

> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),

OK.

>  PATTERNS("bibtex",
>  	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
>  	 /* -- */

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v7 0/1] Updated the word diff regex for Bash scripts
  2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
  2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-16 14:45           ` Moumita
  2025-05-16 14:45             ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
  1 sibling, 1 reply; 37+ messages in thread
From: Moumita @ 2025-05-16 14:45 UTC (permalink / raw)
  To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano

I cleaned up the parameter expansion regexes to avoid matching single-character operators like `:` or `#` that are already tokenized. 
I also removed the `:[0-9]+(:[0-9]+)?` pattern, which didn’t have a clear use case.
Variable matching now supports `$1`, `$2`, etc., not just `$foo`, by relaxing 
the first character requirement after the `$`. Also I removed the '?' after the second 
character of double character operators.
There are two test files in t/4018 which test the capturing of the complete line in the hunk header in the cases of both
posix style functions and bash style functions.



Moumita Dhar (1):
  userdiff: extend Bash pattern to cover more shell function forms

 .../bash-bashism-style-complete-line-capture  |  4 +++
 .../bash-posix-style-complete-line-capture    |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 36 +++++++++++++++++++
 t/t4034/bash/post                             | 31 ++++++++++++++++
 t/t4034/bash/pre                              | 31 ++++++++++++++++
 userdiff.c                                    | 26 +++++++++-----
 8 files changed, 128 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

Range-diff against v6:
1:  464cb8a1eb ! 1:  314ac45fa2 userdiff: extend Bash pattern to cover more shell function forms
    @@ Commit message
     
         Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
     
    - ## t/t4018/bash-bashism-style-multiline-function (new) ##
    + ## t/t4018/bash-bashism-style-complete-line-capture (new) ##
     @@
    -+function RIGHT \
    -+{    
    ++function myfunc # RIGHT
    ++{
     +    echo 'ChangeMe'
     +}
     
    - ## t/t4018/bash-hunk-header-complete-line-capture (new) ##
    + ## t/t4018/bash-posix-style-complete-line-capture (new) ##
     @@
     +func() { # RIGHT
     +
     +    ChangeMe
    -+}
    -
    - ## t/t4018/bash-posix-style-multiline-function (new) ##
    -@@
    -+RIGHT() \
    -+{
    -+    ChangeMe
     +}
     
      ## t/t4018/bash-posix-style-single-command-function (new) ##
    @@ t/t4034/bash/expect (new)
     +<BOLD>index 09ac008..60ba6a2 100644<RESET>
     +<BOLD>--- a/pre<RESET>
     +<BOLD>+++ b/post<RESET>
    -+<CYAN>@@ -1,33 +1,33 @@<RESET>
    ++<CYAN>@@ -1,31 +1,31 @@<RESET>
     +<RED>my_var<RESET><GREEN>new_var<RESET>=10
     +x=<RED>123<RESET><GREEN>456<RESET>
    -+y=<RED>3.14<RESET><GREEN>2.71<RESET>
    -+z=<RED>.5<RESET><GREEN>.75<RESET>
    ++echo <RED>$1<RESET><GREEN>$2<RESET>
     +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
     +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
     +((a<RED>+<RESET><GREEN>+=<RESET>b))
    @@ t/t4034/bash/expect (new)
     +${a<RED>,<RESET><GREEN>,,<RESET>}
     +${<GREEN>!<RESET>a}
     +${a[<RED>*<RESET><GREEN>@<RESET>]}
    -+${a<RED>:2:3<RESET><GREEN>:4:6<RESET>}
     +ls <RED>-a<RESET><GREEN>-x<RESET>
    -+ls <RED>--a<RESET><GREEN>--x<RESET>
    ++ls <RED>--all<RESET><GREEN>--color<RESET>
     
      ## t/t4034/bash/post (new) ##
     @@
     +new_var=10
     +x=456
    -+y=2.71
    -+z=.75
    ++echo $2
     +echo $USERNAME
     +${HOMEDIR}
     +((a+=b))
    @@ t/t4034/bash/post (new)
     +${a,,}
     +${!a}
     +${a[@]}
    -+${a:4:6}
     +ls -x
    -+ls --x
    ++ls --color
     
      ## t/t4034/bash/pre (new) ##
     @@
     +my_var=10
     +x=123
    -+y=3.14
    -+z=.5
    ++echo $1
     +echo $USER
     +${HOME}
     +((a+b))
    @@ t/t4034/bash/pre (new)
     +${a,}
     +${a}
     +${a[*]}
    -+${a:2:3}
     +ls -a
    -+ls --a
    ++ls --all
     
      ## userdiff.c ##
     @@ userdiff.c: PATTERNS("bash",
    @@ userdiff.c: PATTERNS("bash",
     -	 "[^ \t]+"),
     +	 /* Identifiers: variable and function names */
     +	  "[a-zA-Z_][a-zA-Z0-9_]*"
    -+	 /* Numeric constants: integers and decimals */
    -+	  "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
     +	 /* Shell variables: $VAR, ${VAR} */
    -+	  "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{"
    -+	  /* Logical and comparison operators */
    -+	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
    -+	 /* Assignment and arithmetic operators */
    -+	 "|[-+*/%&|^!=<>]=?"
    ++	  "|\\$[a-zA-Z0-9_]+|\\$\\{"
    ++	  /*Command list separators and redirection operators  */
    ++	 "|\\|\\||&&|<<|>>"
    ++	 /* Operators ending in '=' (comparison + compound assignment) */
    ++	 "|==|!=|<=|>=|[-+*/%&|^]="
     +	 /* Additional parameter expansion operators */
    -+	 "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?"
    ++	 "|:=|:-|:\\+|:\\?|##|%%|\\^\\^|,,"
     +	 /* Command-line options (to avoid splitting -option) */
    -+	 "|--?[a-zA-Z0-9_-]+"
    ++	 "|[-a-zA-Z0-9_]+"
     +	 /* Brackets and grouping symbols */
     +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
      PATTERNS("bibtex",
-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-16 14:45           ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
@ 2025-05-16 14:45             ` Moumita
  2025-05-16 17:45               ` Johannes Sixt
  0 siblings, 1 reply; 37+ messages in thread
From: Moumita @ 2025-05-16 14:45 UTC (permalink / raw)
  To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano

From: Moumita Dhar <dhar61595@gmail.com>

The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
---
 .../bash-bashism-style-complete-line-capture  |  4 +++
 .../bash-posix-style-complete-line-capture    |  4 +++
 .../bash-posix-style-single-command-function  |  3 ++
 t/t4034-diff-words.sh                         |  1 +
 t/t4034/bash/expect                           | 36 +++++++++++++++++++
 t/t4034/bash/post                             | 31 ++++++++++++++++
 t/t4034/bash/pre                              | 31 ++++++++++++++++
 userdiff.c                                    | 26 +++++++++-----
 8 files changed, 128 insertions(+), 8 deletions(-)
 create mode 100644 t/t4018/bash-bashism-style-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-complete-line-capture
 create mode 100644 t/t4018/bash-posix-style-single-command-function
 create mode 100644 t/t4034/bash/expect
 create mode 100644 t/t4034/bash/post
 create mode 100644 t/t4034/bash/pre

diff --git a/t/t4018/bash-bashism-style-complete-line-capture b/t/t4018/bash-bashism-style-complete-line-capture
new file mode 100644
index 0000000000..070b979fa6
--- /dev/null
+++ b/t/t4018/bash-bashism-style-complete-line-capture
@@ -0,0 +1,4 @@
+function myfunc # RIGHT
+{
+    echo 'ChangeMe'
+}
diff --git a/t/t4018/bash-posix-style-complete-line-capture b/t/t4018/bash-posix-style-complete-line-capture
new file mode 100644
index 0000000000..b56942f322
--- /dev/null
+++ b/t/t4018/bash-posix-style-complete-line-capture
@@ -0,0 +1,4 @@
+func() { # RIGHT
+
+    ChangeMe
+}
diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function
new file mode 100644
index 0000000000..398ae1c5d2
--- /dev/null
+++ b/t/t4018/bash-posix-style-single-command-function
@@ -0,0 +1,3 @@
+RIGHT() echo "hello"
+
+    ChangeMe
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index f51d3557f1..0be647c2fb 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -320,6 +320,7 @@ test_expect_success 'unset default driver' '
 
 test_language_driver ada
 test_language_driver bibtex
+test_language_driver bash
 test_language_driver cpp
 test_language_driver csharp
 test_language_driver css
diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect
new file mode 100644
index 0000000000..1864ab25dc
--- /dev/null
+++ b/t/t4034/bash/expect
@@ -0,0 +1,36 @@
+<BOLD>diff --git a/pre b/post<RESET>
+<BOLD>index 09ac008..60ba6a2 100644<RESET>
+<BOLD>--- a/pre<RESET>
+<BOLD>+++ b/post<RESET>
+<CYAN>@@ -1,31 +1,31 @@<RESET>
+<RED>my_var<RESET><GREEN>new_var<RESET>=10
+x=<RED>123<RESET><GREEN>456<RESET>
+echo <RED>$1<RESET><GREEN>$2<RESET>
+echo <RED>$USER<RESET><GREEN>$USERNAME<RESET>
+${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>}
+((a<RED>+<RESET><GREEN>+=<RESET>b))
+((a<RED>*<RESET><GREEN>*=<RESET>b))
+((a<RED>/<RESET><GREEN>/=<RESET>b))
+((a<RED>%<RESET><GREEN>%=<RESET>b))
+((a<RED>|<RESET><GREEN>|=<RESET>b))
+((a<RED>^<RESET><GREEN>^=<RESET>b))
+((a<RED>=<RESET><GREEN>==<RESET>b))
+((a<RED>!<RESET><GREEN>!=<RESET>b))
+((a<RED><<RESET><GREEN><=<RESET>b))
+((a<RED>><RESET><GREEN>>=<RESET>b))
+$((a<RED><<RESET><GREEN><<<RESET>b))
+$((a<RED>><RESET><GREEN>>><RESET>b))
+$((a<RED>&<RESET><GREEN>&&<RESET>b))
+$((a<RED>|<RESET><GREEN>||<RESET>b))
+${a<RED>:<RESET><GREEN>:-<RESET>b}
+${a<RED>:<RESET><GREEN>:=<RESET>b}
+${a<RED>:<RESET><GREEN>:+<RESET>b}
+${a<RED>:<RESET><GREEN>:?<RESET>b}
+${a<RED>#<RESET><GREEN>##<RESET>*/}
+${a<RED>%<RESET><GREEN>%%<RESET>.*}
+${a<RED>^<RESET><GREEN>^^<RESET>}
+${a<RED>,<RESET><GREEN>,,<RESET>}
+${<GREEN>!<RESET>a}
+${a[<RED>*<RESET><GREEN>@<RESET>]}
+ls <RED>-a<RESET><GREEN>-x<RESET>
+ls <RED>--all<RESET><GREEN>--color<RESET>
diff --git a/t/t4034/bash/post b/t/t4034/bash/post
new file mode 100644
index 0000000000..2bbee8936d
--- /dev/null
+++ b/t/t4034/bash/post
@@ -0,0 +1,31 @@
+new_var=10
+x=456
+echo $2
+echo $USERNAME
+${HOMEDIR}
+((a+=b))
+((a*=b))
+((a/=b))
+((a%=b))
+((a|=b))
+((a^=b))
+((a==b))
+((a!=b))
+((a<=b))
+((a>=b))
+$((a<<b))
+$((a>>b))
+$((a&&b))
+$((a||b))
+${a:-b}
+${a:=b}
+${a:+b}
+${a:?b}
+${a##*/}
+${a%%.*}
+${a^^}
+${a,,}
+${!a}
+${a[@]}
+ls -x
+ls --color
diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre
new file mode 100644
index 0000000000..8d22039c40
--- /dev/null
+++ b/t/t4034/bash/pre
@@ -0,0 +1,31 @@
+my_var=10
+x=123
+echo $1
+echo $USER
+${HOME}
+((a+b))
+((a*b))
+((a/b))
+((a%b))
+((a|b))
+((a^b))
+((a=b))
+((a!b))
+((a<b))
+((a>b))
+$((a<b))
+$((a>b))
+$((a&b))
+$((a|b))
+${a:b}
+${a:b}
+${a:b}
+${a:b}
+${a#*/}
+${a%.*}
+${a^}
+${a,}
+${a}
+${a[*]}
+ls -a
+ls --all
diff --git a/userdiff.c b/userdiff.c
index 340c4eb4f7..7b18cbb669 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -59,20 +59,30 @@ PATTERNS("bash",
 	 "("
 	 "("
 	     /* POSIX identifier with mandatory parentheses */
-	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
+	     "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
 	 "|"
 	     /* Bashism identifier with optional parentheses */
-	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
+	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))"
 	 ")"
-	 /* Optional whitespace */
-	 "[ \t]*"
-	 /* Compound command starting with `{`, `(`, `((` or `[[` */
-	 "(\\{|\\(\\(?|\\[\\[)"
+	 /* Everything after the function header is captured  */
+	 ".*$"
 	 /* End of captured text */
 	 ")",
 	 /* -- */
-	 /* Characters not in the default $IFS value */
-	 "[^ \t]+"),
+	 /* Identifiers: variable and function names */
+	  "[a-zA-Z_][a-zA-Z0-9_]*"
+	 /* Shell variables: $VAR, ${VAR} */
+	  "|\\$[a-zA-Z0-9_]+|\\$\\{"
+	  /*Command list separators and redirection operators  */
+	 "|\\|\\||&&|<<|>>"
+	 /* Operators ending in '=' (comparison + compound assignment) */
+	 "|==|!=|<=|>=|[-+*/%&|^]="
+	 /* Additional parameter expansion operators */
+	 "|:=|:-|:\\+|:\\?|##|%%|\\^\\^|,,"
+	 /* Command-line options (to avoid splitting -option) */
+	 "|[-a-zA-Z0-9_]+"
+	 /* Brackets and grouping symbols */
+	 "|\\(|\\)|\\{|\\}|\\[|\\]"),
 PATTERNS("bibtex",
 	 "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
 	 /* -- */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-16 14:45             ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
@ 2025-05-16 17:45               ` Johannes Sixt
  2025-05-16 21:56                 ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Johannes Sixt @ 2025-05-16 17:45 UTC (permalink / raw)
  To: Moumita; +Cc: Eric Sunshine, Junio C Hamano, git

All my comments have been addressed and this round looks good. Thank you!

Acked-by: Johannes Sixt <j6t@kdbg.org>

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-16 17:45               ` Johannes Sixt
@ 2025-05-16 21:56                 ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2025-05-16 21:56 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Moumita, Eric Sunshine, git

Johannes Sixt <j6t@kdbg.org> writes:

> All my comments have been addressed and this round looks good. Thank you!
>
> Acked-by: Johannes Sixt <j6t@kdbg.org>
>
> -- Hannes

Thanks, both.  Queued.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-16  7:25             ` Johannes Sixt
@ 2025-05-17 13:09               ` Junio C Hamano
  2025-05-18  7:41                 ` Johannes Sixt
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2025-05-17 13:09 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Moumita, Eric Sunshine, git

Johannes Sixt <j6t@kdbg.org> writes:

>> +y=<RED>3.14<RESET><GREEN>2.71<RESET>
>> +z=<RED>.5<RESET><GREEN>.75<RESET>
>
> When do decimal numbers occur in shell scripts? Wouldn't it be more
> often the case that a fullstop is part of a regular expression or a file
> name or version number that happens to be surrounded by numbers? In that
> case, we would prefer to capture the digit sequences as separate words.

Sorry but I am confused.  

Do you want a filename "sample.3gp" treated as having separate
parts, "sample", ".3", and "gp", instead of a single word?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms
  2025-05-17 13:09               ` Junio C Hamano
@ 2025-05-18  7:41                 ` Johannes Sixt
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2025-05-18  7:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Moumita, Eric Sunshine, git

Am 17.05.25 um 15:09 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>>> +y=<RED>3.14<RESET><GREEN>2.71<RESET>
>>> +z=<RED>.5<RESET><GREEN>.75<RESET>
>>
>> When do decimal numbers occur in shell scripts? Wouldn't it be more
>> often the case that a fullstop is part of a regular expression or a file
>> name or version number that happens to be surrounded by numbers? In that
>> case, we would prefer to capture the digit sequences as separate words.
> 
> Sorry but I am confused.  
> 
> Do you want a filename "sample.3gp" treated as having separate
> parts, "sample", ".3", and "gp", instead of a single word?

I didn't consider such a case. The cited paragraph could be interpreted
to say that we want the words "sample", ".", "3", and "gp". But the
pattern that we have in round v7 by a lucky strike actually gives
"sample", ".", and "3gp" (I think), which is much better.

-- Hannes


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-05-18  7:41 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita
2025-02-15 14:37   ` Johannes Sixt
2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita
2025-02-18 15:35   ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-02-18 19:30     ` Junio C Hamano
2025-02-22 18:15       ` Johannes Sixt
2025-02-24 16:28         ` Junio C Hamano
2025-02-18 23:38     ` Junio C Hamano
2025-02-22 18:14     ` Johannes Sixt
2025-02-18 17:30   ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine
2025-03-28 20:05   ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita
2025-03-28 20:05     ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-03-29 19:26     ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano
2025-03-30 12:28       ` MOUMITA DHAR
2025-03-30 13:39     ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita
2025-03-30 13:39       ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-02 21:27         ` Junio C Hamano
2025-05-06 16:30         ` Johannes Sixt
2025-05-10 11:37           ` MOUMITA DHAR
2025-05-10 12:40             ` Johannes Sixt
2025-05-11 12:58       ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita
2025-05-11 12:58         ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-11 13:28         ` Moumita
2025-05-11 13:28           ` Moumita
2025-05-11 13:37         ` Moumita
2025-05-11 14:11         ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita
2025-05-11 14:11           ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-13 18:50             ` Junio C Hamano
2025-05-14  6:33               ` MOUMITA DHAR
2025-05-16  7:25             ` Johannes Sixt
2025-05-17 13:09               ` Junio C Hamano
2025-05-18  7:41                 ` Johannes Sixt
2025-05-16 14:45           ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita
2025-05-16 14:45             ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita
2025-05-16 17:45               ` Johannes Sixt
2025-05-16 21:56                 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).