* [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts @ 2025-02-11 11:46 Moumita 2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 0 siblings, 2 replies; 37+ messages in thread From: Moumita @ 2025-02-11 11:46 UTC (permalink / raw) To: git; +Cc: Moumita This patch adds a built-in userdiff pattern for shell scripts, allowing Git to recognize function names in shell scripts when displaying diffs. This change defines a regular expression for detecting shell functions. Moumita (1): Added built in function recognition for shell userdiff.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/1] Added built in function recognition for shell 2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita @ 2025-02-11 11:46 ` Moumita 2025-02-15 14:37 ` Johannes Sixt 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 1 sibling, 1 reply; 37+ messages in thread From: Moumita @ 2025-02-11 11:46 UTC (permalink / raw) To: git Cc: Moumita, Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble Introduced a built-in userdiff driver for shell scripts, enabling accurate function name recognition in `git diff` hunk headers. Enhancements include: - Function name detection for both POSIX and Bash/Ksh-style functions: - `function_name() { ... }` - `function function_name { ... }` - Exclusion of shell keywords that can resemble function names, preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.). - Improved tokenization support for: - Identifiers (variable and function names) - Numeric constants (integers and decimals) - Shell variables (`$VAR`, `${VAR}`) - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators - Assignment and redirection operators - Brackets and grouping symbols This update improves Git’s diff readability for shell scripts, bringing it in line with existing built-in userdiff drivers. Signed-off-by: Moumita <dhar61595@gmail.com> --- userdiff.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..a8c14807c6 100644 --- a/userdiff.c +++ b/userdiff.c @@ -334,6 +334,26 @@ PATTERNS("scheme", "\\|([^\\\\]*)\\|" /* All other words should be delimited by spaces or parentheses */ "|([^][)(}{[ \t])+"), +PATTERNS("shell", + /* Negate shell keywords that can look like functions */ + "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n" + /* POSIX-style shell functions: function_name() { ... } */ + "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n" + /* Bash/Ksh-style functions: function function_name { ... } */ + "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n", + /* -- */ + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[-+]?[0-9]+(\\.[0-9]*)?" + /* Shell variables: $VAR and ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"), { .name = "default", .binary = -1 }, -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/1] Added built in function recognition for shell 2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita @ 2025-02-15 14:37 ` Johannes Sixt 0 siblings, 0 replies; 37+ messages in thread From: Johannes Sixt @ 2025-02-15 14:37 UTC (permalink / raw) To: Moumita Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble, git Am 11.02.25 um 12:46 schrieb Moumita: > Introduced a built-in userdiff driver for shell scripts, enabling > accurate function name recognition in `git diff` hunk headers. > > Enhancements include: > - Function name detection for both POSIX and Bash/Ksh-style functions: > - `function_name() { ... }` > - `function function_name { ... }` > - Exclusion of shell keywords that can resemble function names, > preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.). > - Improved tokenization support for: > - Identifiers (variable and function names) > - Numeric constants (integers and decimals) > - Shell variables (`$VAR`, `${VAR}`) > - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators > - Assignment and redirection operators > - Brackets and grouping symbols > > This update improves Git’s diff readability for shell scripts, > bringing it in line with existing built-in userdiff drivers. Please remove the marketing tone from the commit message. - "accurate function name recognition": Is not possible because the tools (regular expressions) don't have sufficient context to allow it. - "Enhancements include": Either list *all* enhancements, or focus on noteworthy ones, but don't make a long list that's still incomplete. - The word "improve" is never needed in the context that we see it here, because you certainly don't want to worsen the code base. Please study section "Describe your changes well" in Documentation/SubmittingPatches on how to write the commit message. (Describe the state before the change in present tense, and the changes in imperative mood.) An important point is *why* we want this change. In the case of a new userdiff driver, however, the answer is usally simply "because we can", and everybody knows it. Please don't write a long paragraph that dances around this fact. Just don't write it; but if there are other reason, please do say so. > > Signed-off-by: Moumita <dhar61595@gmail.com> The name given in Signed-off-by has legal meaning. If you have a given name and a surname, please use both names here and then by extension also as patch author. > --- > userdiff.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/userdiff.c b/userdiff.c > index 340c4eb4f7..a8c14807c6 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -334,6 +334,26 @@ PATTERNS("scheme", > "\\|([^\\\\]*)\\|" > /* All other words should be delimited by spaces or parentheses */ > "|([^][)(}{[ \t])+"), > +PATTERNS("shell", Correctly sorted into the list. Good! HOWEVER! We already have a diffdriver for bash. It would be better to extend that one than to introduce a new driver. > + /* Negate shell keywords that can look like functions */ > + "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n" This list looks unnecessary. The userdiff driver can assume that it operates on syntactically valid text. if () { } isn't correct, and that's the case for all listed keywords. > + /* POSIX-style shell functions: function_name() { ... } */ > + "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n" Two nitpicks: - There can be whitespace between the parentheses. - The function body can also be in parentheses. foo ( ) ( echo ) would be a correct shell function definition. Its body always runs in a sub-shell. > + /* Bash/Ksh-style functions: function function_name { ... } */ > + "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n", Both here and above, just let the regular expression end before the opening bracket. Then it would also recognize this oddity: function x \ { echo x; } > + /* -- */ > + /* Identifiers: variable and function names */ > + "[a-zA-Z_][a-zA-Z0-9_]*" > + /* Numeric constants: integers and decimals */ > + "|[-+]?[0-9]+(\\.[0-9]*)?" Typically, a - or + isn't part of the number, but a separate operator. I suggest to drop the leading '[-+]?'. But see below. > + /* Shell variables: $VAR and ${VAR} */ > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" The uses of ${name} are rather exceptional. The cases where ${ is used are the complex cases, for example: base=${fname##*/} base=${base%."$ext"} In such cases, I would prefer that the constituents of the expression are their own tokens and not lumped into a single "variable name" token. > + /* Logical and comparison operators */ > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" > + /* Assignment and arithmetic operators */ > + "|[-+*/%&|^!=<>]=?" Which makes me think: Text in shell commands has only very little restrictions. A typical case are command options starting with one or two dashes. Do we want to separate the dash from the option name? This regular expression would do so, and I would consider it a deficiency. > + /* Brackets and grouping symbols */ > + "|\\(|\\)|\\{|\\}|\\[|\\]"), > PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", > "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"), > { .name = "default", .binary = -1 }, It would be nice to have some tests. Have a look at t/t4018/bash-* and t/t4034/* (no bash there, yet). -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts 2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita @ 2025-02-18 15:35 ` Moumita 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita ` (2 more replies) 1 sibling, 3 replies; 37+ messages in thread From: Moumita @ 2025-02-18 15:35 UTC (permalink / raw) To: git Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble The modifications that I made were - "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*" - so that is allows foo() and foo ( ). "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)" - so that it recognises {, (, or [[ as function bodies "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)|\\\\\n"- so that it matches "function foo \ { echo "hello"; }" "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" - so that it matches function foo() { echo "hello"; } and also function bar { echo "no parens"; } "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"- It seperates ${} expansions "|--?[a-zA-Z0-9_-]+" - It ensures -opt and --long-opt are treated as whole tokens. "|[0-9]+(\\.[0-9]*)?" - It matches numbers without +/- Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms userdiff.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) Range-diff against v1: 1: 683ea08819 ! 1: de2e8f9792 userdiff : Added built in function recognition for shell @@ ## Metadata ## -Author: Moumita <dhar61595@gmail.com> +Author: Moumita Dhar <dhar61595@gmail.com> ## Commit message ## - userdiff : Added built in function recognition for shell + userdiff: extend Bash pattern to cover more shell function forms - Introduced a built-in userdiff driver for shell scripts, enabling - accurate function name recognition in `git diff` hunk headers. + The existing Bash userdiff pattern misses some shell function forms, such as + `function foo()`, multi-line definitions, and extra whitespace. - Enhancements include: - - Function name detection for both POSIX and Bash/Ksh-style functions: - - `function_name() { ... }` - - `function function_name { ... }` - - Exclusion of shell keywords that can resemble function names, - preventing false matches (e.g., `if`, `for`, `while`, `return`, etc.). - - Improved tokenization support for: - - Identifiers (variable and function names) - - Numeric constants (integers and decimals) - - Shell variables (`$VAR`, `${VAR}`) - - Logical (`&&`, `||`, `==`, `!=`, `<=`, `>=`) and arithmetic operators - - Assignment and redirection operators - - Brackets and grouping symbols + Extend the pattern to: + - Support `function foo()` syntax. + - Allow spaces in `foo ( )` definitions. + - Recognize multi-line definitions with backslashes. + - Broaden function body detection. - This update improves Git’s diff readability for shell scripts, - bringing it in line with existing built-in userdiff drivers. - - Signed-off-by: Moumita <dhar61595@gmail.com> + Signed-off-by: Moumita Dhar <dhar61595@gmail.com> ## userdiff.c ## -@@ userdiff.c: PATTERNS("scheme", - "\\|([^\\\\]*)\\|" - /* All other words should be delimited by spaces or parentheses */ - "|([^][)(}{[ \t])+"), -+PATTERNS("shell", -+ /* Negate shell keywords that can look like functions */ -+ "!^[ \t]*(if|elif|else|fi|for|while|until|case|esac|then|do|done|return|break|continue)\\b\n" -+ /* POSIX-style shell functions: function_name() { ... } */ -+ "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\(\\)[ \t]*\\{\n" -+ /* Bash/Ksh-style functions: function function_name { ... } */ -+ "^[ \t]*function[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\{\n", -+ /* -- */ +@@ userdiff.c: IPATTERN("ada", + "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" + "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), + PATTERNS("bash", +- /* Optional leading indentation */ ++ /* Optional leading indentation */ + "^[ \t]*" +- /* Start of captured text */ ++ /* Start of captured function name */ + "(" + "(" +- /* POSIX identifier with mandatory parentheses */ +- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" ++ /* POSIX identifier with mandatory parentheses (allow spaces inside) */ ++ "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)" + "|" +- /* Bashism identifier with optional parentheses */ +- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ++ /* Bash-style function definitions, allowing optional `function` keyword */ ++ "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" + ")" + /* Optional whitespace */ + "[ \t]*" +- /* Compound command starting with `{`, `(`, `((` or `[[` */ +- "(\\{|\\(\\(?|\\[\\[)" +- /* End of captured text */ ++ /* Allow function body to start with `{`, `(` (subshell), `[[` */ ++ "(\\{|\\(|\\[\\[)" ++ /* End of captured function name */ + ")", + /* -- */ +- /* Characters not in the default $IFS value */ +- "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ -+ "|[-+]?[0-9]+(\\.[0-9]*)?" -+ /* Shell variables: $VAR and ${VAR} */ ++ "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" ++ /* Shell variables: `$VAR`, `${VAR}` */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" ++ /* Command-line options (to avoid splitting `-option`) */ ++ "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), - PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$", - "\\\\[a-zA-Z@]+|\\\\.|([a-zA-Z0-9]|[^\x01-\x7f])+"), - { .name = "default", .binary = -1 }, + PATTERNS("bibtex", + "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", + /* -- */ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita @ 2025-02-18 15:35 ` Moumita 2025-02-18 19:30 ` Junio C Hamano ` (2 more replies) 2025-02-18 17:30 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita 2 siblings, 3 replies; 37+ messages in thread From: Moumita @ 2025-02-18 15:35 UTC (permalink / raw) To: git; +Cc: Moumita Dhar From: Moumita Dhar <dhar61595@gmail.com> The existing Bash userdiff pattern misses some shell function forms, such as `function foo()`, multi-line definitions, and extra whitespace. Extend the pattern to: - Support `function foo()` syntax. - Allow spaces in `foo ( )` definitions. - Recognize multi-line definitions with backslashes. - Broaden function body detection. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- userdiff.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..194e28883d 100644 --- a/userdiff.c +++ b/userdiff.c @@ -53,26 +53,38 @@ IPATTERN("ada", "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), PATTERNS("bash", - /* Optional leading indentation */ + /* Optional leading indentation */ "^[ \t]*" - /* Start of captured text */ + /* Start of captured function name */ "(" "(" - /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + /* POSIX identifier with mandatory parentheses (allow spaces inside) */ + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)" "|" - /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + /* Bash-style function definitions, allowing optional `function` keyword */ + "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" ")" /* Optional whitespace */ "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" - /* End of captured text */ + /* Allow function body to start with `{`, `(` (subshell), `[[` */ + "(\\{|\\(|\\[\\[)" + /* End of captured function name */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: `$VAR`, `${VAR}` */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Command-line options (to avoid splitting `-option`) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-02-18 19:30 ` Junio C Hamano 2025-02-22 18:15 ` Johannes Sixt 2025-02-18 23:38 ` Junio C Hamano 2025-02-22 18:14 ` Johannes Sixt 2 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2025-02-18 19:30 UTC (permalink / raw) To: Moumita; +Cc: git Moumita <dhar61595@gmail.com> writes: > PATTERNS("bash", > - /* Optional leading indentation */ > + /* Optional leading indentation */ What is this change about? > "^[ \t]*" > - /* Start of captured text */ > + /* Start of captured function name */ > "(" > "(" > - /* POSIX identifier with mandatory parentheses */ > - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > + /* POSIX identifier with mandatory parentheses (allow spaces inside) */ > + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)" Is indentation-change intended and required for this patch to work correctly? > "|" > - /* Bashism identifier with optional parentheses */ > - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > + /* Bash-style function definitions, allowing optional `function` keyword */ > + "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" Ditto. Regular expressions are write-only language; please make sure that you do not add any unnecessary changes to distract eyes of reviewers from spotting the _real_ changes that improves the current codebase. > ")" > /* Optional whitespace */ > "[ \t]*" > - /* Compound command starting with `{`, `(`, `((` or `[[` */ > - "(\\{|\\(\\(?|\\[\\[)" > - /* End of captured text */ > + /* Allow function body to start with `{`, `(` (subshell), `[[` */ > + "(\\{|\\(|\\[\\[)" > + /* End of captured function name */ > ")", > /* -- */ > - /* Characters not in the default $IFS value */ > - "[^ \t]+"), We used to pretty-much use "a run of non-whitespace characters is a token". Now we are a bit more picky. Which may or may not be good, but it is hard to tell if it is an improvement. > + /* Identifiers: variable and function names */ > + "[a-zA-Z_][a-zA-Z0-9_]*" > + /* Numeric constants: integers and decimals */ > + "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" > + /* Shell variables: `$VAR`, `${VAR}` */ > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" > + /* Logical and comparison operators */ > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" > + /* Assignment and arithmetic operators */ > + "|[-+*/%&|^!=<>]=?" > + /* Command-line options (to avoid splitting `-option`) */ > + "|--?[a-zA-Z0-9_-]+" > + /* Brackets and grouping symbols */ > + "|\\(|\\)|\\{|\\}|\\[|\\]"), The fact that this patch does not have any changes to "t/" hierarchy suggests me that we do not have existing tests to see how sample text files in the supported languages are tokenized (otherwise the above changes would require adjusting such existing tests), so I think it should be left outside of this topic, but I wonder if adding such tests gives us a good way to demonstrate the effect of these changes to userdiff patterns. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-18 19:30 ` Junio C Hamano @ 2025-02-22 18:15 ` Johannes Sixt 2025-02-24 16:28 ` Junio C Hamano 0 siblings, 1 reply; 37+ messages in thread From: Johannes Sixt @ 2025-02-22 18:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Moumita Am 18.02.25 um 20:30 schrieb Junio C Hamano: > Moumita <dhar61595@gmail.com> writes: >> /* -- */ >> - /* Characters not in the default $IFS value */ >> - "[^ \t]+"), > > We used to pretty-much use "a run of non-whitespace characters is a > token". Now we are a bit more picky. > > Which may or may not be good, but it is hard to tell if it is an > improvement. It is only a stand-in, because every built-in userdiff driver must have a word pattern. See the old thread here: https://lore.kernel.org/git/373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name/ -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-22 18:15 ` Johannes Sixt @ 2025-02-24 16:28 ` Junio C Hamano 0 siblings, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2025-02-24 16:28 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Moumita Johannes Sixt <j6t@kdbg.org> writes: > Am 18.02.25 um 20:30 schrieb Junio C Hamano: >> Moumita <dhar61595@gmail.com> writes: >>> /* -- */ >>> - /* Characters not in the default $IFS value */ >>> - "[^ \t]+"), >> >> We used to pretty-much use "a run of non-whitespace characters is a >> token". Now we are a bit more picky. >> >> Which may or may not be good, but it is hard to tell if it is an >> improvement. > > It is only a stand-in, because every built-in userdiff driver must have > a word pattern. Yeah, I know. I was merely saying that it was not obvious that the new pattern, which is way more elaborate, is improvement over that stand-in pattern. As these patterns are meant to be applied to only syntactically valid text, by going more specific pattern from simple and lenient pattern, we stop recognising some word that we used to take as a word (i.e. specific patterns need to worry about false negatives, while simpler patterns only have to avoid egregious false positives). > See the old thread here: > https://lore.kernel.org/git/373640ea4d95f3b279b9d460d9a8889b4030b4e9.camel@engmark.name/ Yup, 2ff6c346 (userdiff: support Bash, 2020-10-22) is where the stand-in pattern came from, which is the v3 iteration of that patch. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-02-18 19:30 ` Junio C Hamano @ 2025-02-18 23:38 ` Junio C Hamano 2025-02-22 18:14 ` Johannes Sixt 2 siblings, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2025-02-18 23:38 UTC (permalink / raw) To: Moumita; +Cc: git Moumita <dhar61595@gmail.com> writes: > From: Moumita Dhar <dhar61595@gmail.com> > > The existing Bash userdiff pattern misses some shell function forms, such as > `function foo()`, multi-line definitions, and extra whitespace. > > Extend the pattern to: > - Support `function foo()` syntax. > - Allow spaces in `foo ( )` definitions. > - Recognize multi-line definitions with backslashes. > - Broaden function body detection. > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> Applied to any one of the recent tips of 'master', this seemed break tests and the reproduction seems to be quite easy. $ make $ cd t && sh t4018-*.sh -i -v ... test_expect_code: command exited with 128, we wanted 1 ... not ok 6 - builtin bash pattern compiles #... # test_grep ! fatal msg && # test_grep ! error msg # $ cat t/trash*.t4018*/msg fatal: Invalid regexp to look for hunk header: ^[ ]*(([a-z... Please make it a habit to run tests after you modified the code before sending out patches with the modifications. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-02-18 19:30 ` Junio C Hamano 2025-02-18 23:38 ` Junio C Hamano @ 2025-02-22 18:14 ` Johannes Sixt 2 siblings, 0 replies; 37+ messages in thread From: Johannes Sixt @ 2025-02-22 18:14 UTC (permalink / raw) To: Moumita; +Cc: git Thank you for this follow-up. After reviewing my text below I find that the words may sound harsh. Please don't let you turn down by the lack of friendlyness. In technical reviews I (and very likely many other reviewers on this list) prefer to get to the point right away and not waste our time by talking in circles. I also do not repeat what others have said in their review. Am 18.02.25 um 16:35 schrieb Moumita: > From: Moumita Dhar <dhar61595@gmail.com> > > The existing Bash userdiff pattern misses some shell function forms, such as > `function foo()`, multi-line definitions, and extra whitespace. > > Extend the pattern to: > - Support `function foo()` syntax. > - Allow spaces in `foo ( )` definitions. > - Recognize multi-line definitions with backslashes. > - Broaden function body detection. Please be accurate what you mention here. The first two are already present and not new, the third one I cannot find, and the last one is not immediately visible if not wrong (you are removing the '((' introducer). > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> > --- > userdiff.c | 34 +++++++++++++++++++++++----------- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/userdiff.c b/userdiff.c > index 340c4eb4f7..194e28883d 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -53,26 +53,38 @@ IPATTERN("ada", > "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" > "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), > PATTERNS("bash", > - /* Optional leading indentation */ > + /* Optional leading indentation */ > "^[ \t]*" > - /* Start of captured text */ > + /* Start of captured function name */ > "(" > "(" > - /* POSIX identifier with mandatory parentheses */ > - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > + /* POSIX identifier with mandatory parentheses (allow spaces inside) */ > + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)" > "|" > - /* Bashism identifier with optional parentheses */ > - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > + /* Bash-style function definitions, allowing optional `function` keyword */ > + "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" > ")" You inserted (?=[a-zA-Z_]) after the function. What does this do? Then you made 'function identifier' optional. Why? You also added a ? at the end. Is that to make the parenthesis optional? But look closely: they are already optional, because we already have ( space_opt "(" space_opt ")" | space_opt ) It would be a minor improvement to turn this into space_opt ( "(" space_opt ")" )? Make a mental bookmark here. > /* Optional whitespace */ > "[ \t]*" > - /* Compound command starting with `{`, `(`, `((` or `[[` */ > - "(\\{|\\(\\(?|\\[\\[)" > - /* End of captured text */ > + /* Allow function body to start with `{`, `(` (subshell), `[[` */ > + "(\\{|\\(|\\[\\[)" > + /* End of captured function name */ What is the justification that you removed "((" as introduction of the function body? > ")", So, we capture only what we expect to be a function header in typical cases. Let me make a suggestion. What if we replaced everything from the mental bookmark above up to the closing parenthesis by .*$ like we do in all other builtin drivers? Then everything on the line that contains the function header would be used as hunk header, and we do not care what the function body looks like. It would also cover the case mentioned by Eric elsewhere where the body is just a simple command. If that leads to new cases that are detected, it would be really nice to add corresponding test cases in t/t4018/. > /* -- */ > - /* Characters not in the default $IFS value */ > - "[^ \t]+"), > + /* Identifiers: variable and function names */ > + "[a-zA-Z_][a-zA-Z0-9_]*" > + /* Numeric constants: integers and decimals */ > + "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" > + /* Shell variables: `$VAR`, `${VAR}` */ > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" > + /* Logical and comparison operators */ > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" > + /* Assignment and arithmetic operators */ > + "|[-+*/%&|^!=<>]=?" > + /* Command-line options (to avoid splitting `-option`) */ > + "|--?[a-zA-Z0-9_-]+" > + /* Brackets and grouping symbols */ > + "|\\(|\\)|\\{|\\}|\\[|\\]"), As far as I can see, you introduced a pattern for options, but otherwise left the patterns the same as in the earlier round. Of course, you are not obliged to integrate every suggestion that is made by a reviewer, but it is good tone that you leave a comment explaining why you dismissed a suggestion. To expand on my suggestion how to treat ${var} substitution: - Replace \\$\\{[^}]+\\} by just \\$\\{ to match only the introducer. (The closing brace is already matched by a later pattern.) - Add operators := :- :+ :? # ## %% that are used frequently in these complex expansions. - Have a look at section "Shell Parameter Expansion" in the bash manual[1] about the multitude of operators that can occur after ${. Maybe you want to add more of them. [1] https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-02-18 17:30 ` Eric Sunshine 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita 2 siblings, 0 replies; 37+ messages in thread From: Eric Sunshine @ 2025-02-18 17:30 UTC (permalink / raw) To: Moumita Cc: git, Johannes Sixt, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble On Tue, Feb 18, 2025 at 10:36 AM Moumita <dhar61595@gmail.com> wrote: > The modifications that I made were - > "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*" - so that is allows foo() and foo ( ). > "^[ \t]*([a-zA-Z_][a-zA-Z0-9_]*)[ \t]*\\([ \t]*\\)[ \t]*(\\{|\\(|\\[\\[)" - so that it recognises {, (, or [[ as function bodies Regarding this last point, for completeness, I had meant to respond to j6t's review of your patch by saying that, according to POSIX, a function body can be any compound command, which means the body does not need to be encased in braces. For instance, this works: $ foo() echo nothing $ foo nothing $ That said, I doubt that this usage is common, thus is probably not worth worrying about. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-02-18 17:30 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine @ 2025-03-28 20:05 ` Moumita 2025-03-28 20:05 ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita ` (2 more replies) 2 siblings, 3 replies; 37+ messages in thread From: Moumita @ 2025-03-28 20:05 UTC (permalink / raw) To: git Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble This patch improves function detection in userdiff for Bash scripts. The old regex tried to match function bodies explicitly, which caused issues with line continuations (`\`) and simple command bodies. Instead, I have replaced it with `.*$`, making it more consistent with other userdiff drivers and ensuring we capture the full function definition line. I also refined the word regex to better handle Bash syntax, including parameter expansions, arithmetic expressions, and command-line options. I have added test cases to cover these changes, making sure everything works as expected. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 +++ t/t4018/bash-posix-style-multiline-function | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 30 +++++++++++++++++++ t/t4034/bash/post | 25 ++++++++++++++++ t/t4034/bash/pre | 25 ++++++++++++++++ userdiff.c | 24 +++++++++++---- 8 files changed, 110 insertions(+), 6 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v2: 1: de2e8f9792 ! 1: 3d077fadc4 userdiff: extend Bash pattern to cover more shell function forms @@ Metadata ## Commit message ## userdiff: extend Bash pattern to cover more shell function forms - The existing Bash userdiff pattern misses some shell function forms, such as - `function foo()`, multi-line definitions, and extra whitespace. + The previous function regex required explicit matching of function + bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - Extend the pattern to: - - Support `function foo()` syntax. - - Allow spaces in `foo ( )` definitions. - - Recognize multi-line definitions with backslashes. - - Broaden function body detection. + - It failed to capture valid functions where `{` was on the next line + due to line continuation (`\`). + - It did not recognize functions with single command body, such as + `x () echo hello`. + + Replacing the function body matching logic with `.*$`, ensures + that everything on the function definition line is captured, + aligning with other userdiff drivers and improving hunk headers in + `git diff`. + + Additionally, the word regex is refined to better recognize shell + syntax, including additional parameter expansion operators and + command-line options, improving syntax-aware diffs. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> + ## t/t4018/bash-bashism-style-multiline-function (new) ## +@@ ++function RIGHT \ ++{ ++ echo 'ChangeMe' ++} + \ No newline at end of file + + ## t/t4018/bash-posix-style-multiline-function (new) ## +@@ ++RIGHT() \ ++{ ++ ChangeMe ++} + \ No newline at end of file + + ## t/t4018/bash-posix-style-single-command-function (new) ## +@@ ++RIGHT() echo "hello" ++ ++ ChangeMe + + ## t/t4034-diff-words.sh ## +@@ t/t4034-diff-words.sh: test_expect_success 'unset default driver' ' + + test_language_driver ada + test_language_driver bibtex ++test_language_driver bash + test_language_driver cpp + test_language_driver csharp + test_language_driver css + + ## t/t4034/bash/expect (new) ## +@@ ++<BOLD>diff --git a/pre b/post<RESET> ++<BOLD>index 09ac008..60ba6a2 100644<RESET> ++<BOLD>--- a/pre<RESET> ++<BOLD>+++ b/post<RESET> ++<CYAN>@@ -1,25 +1,25 @@<RESET> ++<RED>my_var<RESET><GREEN>new_var<RESET>=10 ++x=<RED>123<RESET><GREEN>456<RESET> ++y=<RED>3.14<RESET><GREEN>2.71<RESET> ++z=<RED>.5<RESET><GREEN>.75<RESET> ++echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> ++${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} ++if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi ++((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) ++((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) ++$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) ++$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) ++${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} ++${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} ++${<RED>a<RESET><GREEN>x<RESET>##*/} ++${<RED>a<RESET><GREEN>x<RESET>%.*} ++${<RED>a<RESET><GREEN>x<RESET>%%.*} ++${<RED>a<RESET><GREEN>x<RESET>^^} ++${<RED>a<RESET><GREEN>x<RESET>,} ++${<RED>a<RESET><GREEN>x<RESET>,,} ++${!<RED>a<RESET><GREEN>x<RESET>} ++${<RED>a<RESET><GREEN>x<RESET>[@]} ++${<RED>a<RESET><GREEN>x<RESET>:?error message} ++${<RED>a<RESET><GREEN>x<RESET>:2:3} ++ls <RED>-a<RESET><GREEN>-x<RESET> ++ls <RED>--a<RESET><GREEN>--x<RESET> + + ## t/t4034/bash/post (new) ## +@@ ++new_var=10 ++x=456 ++y=2.71 ++z=.75 ++echo $USERNAME ++${HOMEDIR} ++if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi ++((x+=y)) ++((x-=y)) ++$((x<<y)) ++$((x>>y)) ++${x:-y} ++${x:=y} ++${x##*/} ++${x%.*} ++${x%%.*} ++${x^^} ++${x,} ++${x,,} ++${!x} ++${x[@]} ++${x:?error message} ++${x:2:3} ++ls -x ++ls --x + + ## t/t4034/bash/pre (new) ## +@@ ++my_var=10 ++x=123 ++y=3.14 ++z=.5 ++echo $USER ++${HOME} ++if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi ++((a+=b)) ++((a-=b)) ++$((a << b)) ++$((a >> b)) ++${a:-b} ++${a:=b} ++${a##*/} ++${a%.*} ++${a%%.*} ++${a^^} ++${a,} ++${a,,} ++${!a} ++${a[@]} ++${a:?error message} ++${a:2:3} ++ls -a ++ls --a + ## userdiff.c ## -@@ userdiff.c: IPATTERN("ada", - "|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?" - "|=>|\\.\\.|\\*\\*|:=|/=|>=|<=|<<|>>|<>"), - PATTERNS("bash", -- /* Optional leading indentation */ -+ /* Optional leading indentation */ - "^[ \t]*" -- /* Start of captured text */ -+ /* Start of captured function name */ - "(" - "(" -- /* POSIX identifier with mandatory parentheses */ -- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" -+ /* POSIX identifier with mandatory parentheses (allow spaces inside) */ -+ "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)" - "|" -- /* Bashism identifier with optional parentheses */ -- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" -+ /* Bash-style function definitions, allowing optional `function` keyword */ -+ "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?" +@@ userdiff.c: PATTERNS("bash", + /* Bashism identifier with optional parentheses */ + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ")" - /* Optional whitespace */ - "[ \t]*" +- /* Optional whitespace */ +- "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" -- /* End of captured text */ -+ /* Allow function body to start with `{`, `(` (subshell), `[[` */ -+ "(\\{|\\(|\\[\\[)" -+ /* End of captured function name */ ++ /* Everything after the function header is captured */ ++ ".*$" + /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ -+ "[a-zA-Z_][a-zA-Z0-9_]*" ++ "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ -+ "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" -+ /* Shell variables: `$VAR`, `${VAR}` */ -+ "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}" -+ /* Logical and comparison operators */ ++ "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" ++ /* Shell variables: $VAR, ${VAR} */ ++ "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" ++ /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" -+ /* Command-line options (to avoid splitting `-option`) */ ++ /* Additional parameter expansion operators */ ++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita @ 2025-03-28 20:05 ` Moumita 2025-03-29 19:26 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano 2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita 2 siblings, 0 replies; 37+ messages in thread From: Moumita @ 2025-03-28 20:05 UTC (permalink / raw) To: git; +Cc: Moumita Dhar From: Moumita Dhar <dhar61595@gmail.com> The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured, aligning with other userdiff drivers and improving hunk headers in `git diff`. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options, improving syntax-aware diffs. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- t/t4018/bash-bashism-style-multiline-function | 4 +++ t/t4018/bash-posix-style-multiline-function | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 30 +++++++++++++++++++ t/t4034/bash/post | 25 ++++++++++++++++ t/t4034/bash/pre | 25 ++++++++++++++++ userdiff.c | 24 +++++++++++---- 8 files changed, 110 insertions(+), 6 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function new file mode 100644 index 0000000000..0800daa156 --- /dev/null +++ b/t/t4018/bash-bashism-style-multiline-function @@ -0,0 +1,4 @@ +function RIGHT \ +{ + echo 'ChangeMe' +} \ No newline at end of file diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function new file mode 100644 index 0000000000..756f21524b --- /dev/null +++ b/t/t4018/bash-posix-style-multiline-function @@ -0,0 +1,4 @@ +RIGHT() \ +{ + ChangeMe +} \ No newline at end of file diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..a0f7cbd5a3 --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,30 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,25 +1,25 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} +${<RED>a<RESET><GREEN>x<RESET>##*/} +${<RED>a<RESET><GREEN>x<RESET>%.*} +${<RED>a<RESET><GREEN>x<RESET>%%.*} +${<RED>a<RESET><GREEN>x<RESET>^^} +${<RED>a<RESET><GREEN>x<RESET>,} +${<RED>a<RESET><GREEN>x<RESET>,,} +${!<RED>a<RESET><GREEN>x<RESET>} +${<RED>a<RESET><GREEN>x<RESET>[@]} +${<RED>a<RESET><GREEN>x<RESET>:?error message} +${<RED>a<RESET><GREEN>x<RESET>:2:3} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..60ba6a2e75 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,25 @@ +new_var=10 +x=456 +y=2.71 +z=.75 +echo $USERNAME +${HOMEDIR} +if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi +((x+=y)) +((x-=y)) +$((x<<y)) +$((x>>y)) +${x:-y} +${x:=y} +${x##*/} +${x%.*} +${x%%.*} +${x^^} +${x,} +${x,,} +${!x} +${x[@]} +${x:?error message} +${x:2:3} +ls -x +ls --x diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..09ac008a83 --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,25 @@ +my_var=10 +x=123 +y=3.14 +z=.5 +echo $USER +${HOME} +if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi +((a+=b)) +((a-=b)) +$((a << b)) +$((a >> b)) +${a:-b} +${a:=b} +${a##*/} +${a%.*} +${a%%.*} +${a^^} +${a,} +${a,,} +${!a} +${a[@]} +${a:?error message} +${a:2:3} +ls -a +ls --a diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..4c77c7e0f6 100644 --- a/userdiff.c +++ b/userdiff.c @@ -64,15 +64,27 @@ PATTERNS("bash", /* Bashism identifier with optional parentheses */ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita 2025-03-28 20:05 ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-03-29 19:26 ` Junio C Hamano 2025-03-30 12:28 ` MOUMITA DHAR 2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita 2 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2025-03-29 19:26 UTC (permalink / raw) To: Moumita Cc: git, Johannes Sixt, Eric Sunshine, René Scharfe, Atharva Raykar, D. Ben Knoble Moumita <dhar61595@gmail.com> writes: > + ## t/t4018/bash-bashism-style-multiline-function (new) ## > +@@ > ++function RIGHT \ > ++{ > ++ echo 'ChangeMe' > ++} > + \ No newline at end of file > + > + ## t/t4018/bash-posix-style-multiline-function (new) ## > +@@ > ++RIGHT() \ > ++{ > ++ ChangeMe > ++} > + \ No newline at end of file For these new test, is it essential that these sample files end in incomplete lines? In other words, are these tests trying to make sure that the function line is correctly found even if the function body is at the end of the file that lack the final terminating LF? If that is what they are testing, please add comments near the beginning of the file to tell future developers that it is essential that they keep these files end in incomplete lines and why. If that is not what these tests are checking, then make these lines complete lines instead, as they waste future developers' time making them wonder if there are valid reasons why these files must end in incomplete lines. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns 2025-03-29 19:26 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano @ 2025-03-30 12:28 ` MOUMITA DHAR 0 siblings, 0 replies; 37+ messages in thread From: MOUMITA DHAR @ 2025-03-30 12:28 UTC (permalink / raw) To: Junio C Hamano Cc: git, Johannes Sixt, Eric Sunshine, René Scharfe, Atharva Raykar, D. Ben Knoble On Sun, 30 Mar 2025 at 00:56, Junio C Hamano <gitster@pobox.com> wrote: > > Moumita <dhar61595@gmail.com> writes: > > > + ## t/t4018/bash-bashism-style-multiline-function (new) ## > > +@@ > > ++function RIGHT \ > > ++{ > > ++ echo 'ChangeMe' > > ++} > > + \ No newline at end of file > > + > > + ## t/t4018/bash-posix-style-multiline-function (new) ## > > +@@ > > ++RIGHT() \ > > ++{ > > ++ ChangeMe > > ++} > > + \ No newline at end of file > > For these new test, is it essential that these sample files end in > incomplete lines? In other words, are these tests trying to make > sure that the function line is correctly found even if the function > body is at the end of the file that lack the final terminating LF? >No , the skipping of newline at the end of the test files is not intended.The tests are only meant to check multiline function detection, ensuring that a function body starting on the next line after a line continuation \ is correctly recognized. > If that is what they are testing, please add comments near the > beginning of the file to tell future developers that it is essential > that they keep these files end in incomplete lines and why. > > If that is not what these tests are checking, then make these lines > complete lines instead, as they waste future developers' time making > them wonder if there are valid reasons why these files must end in > incomplete lines. > Yes I will do that and send the patch again . Thank you for the feedback. ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita 2025-03-28 20:05 ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-03-29 19:26 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano @ 2025-03-30 13:39 ` Moumita 2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita 2 siblings, 2 replies; 37+ messages in thread From: Moumita @ 2025-03-30 13:39 UTC (permalink / raw) To: git Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble While making this change, I considered whether shell scripts can lack a final newline and whether it would be useful to test this case. Technically, a shell script can be missing a final newline, and shells like Bash can still process it. However, this is uncommon and generally discouraged due to POSIX and Unix conventions and also although Bash can process such files, missing a final newline can cause unexpected behavior in edge cases—such as when concatenating scripts, where the last line of one script might merge with the first line of another. Given these considerations, I have added the missing newline to the test files to align with standard conventions and avoid unnecessary confusion for future developers. If testing function detection in a file without a final newline is considered useful , I can add a separate test specifically for that edge case. Let me know if that would be helpful. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 +++ t/t4018/bash-posix-style-multiline-function | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 30 +++++++++++++++++++ t/t4034/bash/post | 25 ++++++++++++++++ t/t4034/bash/pre | 25 ++++++++++++++++ userdiff.c | 24 +++++++++++---- 8 files changed, 110 insertions(+), 6 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v3: 1: 3d077fadc4 ! 1: 40cffd3b4a userdiff: extend Bash pattern to cover more shell function forms @@ t/t4018/bash-bashism-style-multiline-function (new) +{ + echo 'ChangeMe' +} - \ No newline at end of file ## t/t4018/bash-posix-style-multiline-function (new) ## @@ @@ t/t4018/bash-posix-style-multiline-function (new) +{ + ChangeMe +} - \ No newline at end of file ## t/t4018/bash-posix-style-single-command-function (new) ## @@ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms 2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita @ 2025-03-30 13:39 ` Moumita 2025-05-02 21:27 ` Junio C Hamano 2025-05-06 16:30 ` Johannes Sixt 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita 1 sibling, 2 replies; 37+ messages in thread From: Moumita @ 2025-03-30 13:39 UTC (permalink / raw) To: git Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble From: Moumita Dhar <dhar61595@gmail.com> The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured, aligning with other userdiff drivers and improving hunk headers in `git diff`. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options, improving syntax-aware diffs. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- t/t4018/bash-bashism-style-multiline-function | 4 +++ t/t4018/bash-posix-style-multiline-function | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 30 +++++++++++++++++++ t/t4034/bash/post | 25 ++++++++++++++++ t/t4034/bash/pre | 25 ++++++++++++++++ userdiff.c | 24 +++++++++++---- 8 files changed, 110 insertions(+), 6 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function new file mode 100644 index 0000000000..284d50dd99 --- /dev/null +++ b/t/t4018/bash-bashism-style-multiline-function @@ -0,0 +1,4 @@ +function RIGHT \ +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function new file mode 100644 index 0000000000..cc8727cbcd --- /dev/null +++ b/t/t4018/bash-posix-style-multiline-function @@ -0,0 +1,4 @@ +RIGHT() \ +{ + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..a0f7cbd5a3 --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,30 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,25 +1,25 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} +${<RED>a<RESET><GREEN>x<RESET>##*/} +${<RED>a<RESET><GREEN>x<RESET>%.*} +${<RED>a<RESET><GREEN>x<RESET>%%.*} +${<RED>a<RESET><GREEN>x<RESET>^^} +${<RED>a<RESET><GREEN>x<RESET>,} +${<RED>a<RESET><GREEN>x<RESET>,,} +${!<RED>a<RESET><GREEN>x<RESET>} +${<RED>a<RESET><GREEN>x<RESET>[@]} +${<RED>a<RESET><GREEN>x<RESET>:?error message} +${<RED>a<RESET><GREEN>x<RESET>:2:3} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..60ba6a2e75 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,25 @@ +new_var=10 +x=456 +y=2.71 +z=.75 +echo $USERNAME +${HOMEDIR} +if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi +((x+=y)) +((x-=y)) +$((x<<y)) +$((x>>y)) +${x:-y} +${x:=y} +${x##*/} +${x%.*} +${x%%.*} +${x^^} +${x,} +${x,,} +${!x} +${x[@]} +${x:?error message} +${x:2:3} +ls -x +ls --x diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..09ac008a83 --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,25 @@ +my_var=10 +x=123 +y=3.14 +z=.5 +echo $USER +${HOME} +if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi +((a+=b)) +((a-=b)) +$((a << b)) +$((a >> b)) +${a:-b} +${a:=b} +${a##*/} +${a%.*} +${a%%.*} +${a^^} +${a,} +${a,,} +${!a} +${a[@]} +${a:?error message} +${a:2:3} +ls -a +ls --a diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..4c77c7e0f6 100644 --- a/userdiff.c +++ b/userdiff.c @@ -64,15 +64,27 @@ PATTERNS("bash", /* Bashism identifier with optional parentheses */ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms 2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-02 21:27 ` Junio C Hamano 2025-05-06 16:30 ` Johannes Sixt 1 sibling, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2025-05-02 21:27 UTC (permalink / raw) To: git Cc: Moumita, Johannes Sixt, Eric Sunshine, René Scharfe, Atharva Raykar, D. Ben Knoble Moumita <dhar61595@gmail.com> writes: > From: Moumita Dhar <dhar61595@gmail.com> > > The previous function regex required explicit matching of function > bodies using `{`, `(`, `((`, or `[[`, which caused several issues: > > - It failed to capture valid functions where `{` was on the next line > due to line continuation (`\`). > - It did not recognize functions with single command body, such as > `x () echo hello`. > > Replacing the function body matching logic with `.*$`, ensures > that everything on the function definition line is captured, > aligning with other userdiff drivers and improving hunk headers in > `git diff`. > > Additionally, the word regex is refined to better recognize shell > syntax, including additional parameter expansion operators and > command-line options, improving syntax-aware diffs. > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> > --- This iteration hasn't seen any reviews or responses. Is everybody who participated in reviews in the previous rounds happy with it? Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms 2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-02 21:27 ` Junio C Hamano @ 2025-05-06 16:30 ` Johannes Sixt 2025-05-10 11:37 ` MOUMITA DHAR 1 sibling, 1 reply; 37+ messages in thread From: Johannes Sixt @ 2025-05-06 16:30 UTC (permalink / raw) To: Moumita Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble, git Am 30.03.25 um 15:39 schrieb Moumita: > From: Moumita Dhar <dhar61595@gmail.com> > > The previous function regex required explicit matching of function > bodies using `{`, `(`, `((`, or `[[`, which caused several issues: > > - It failed to capture valid functions where `{` was on the next line > due to line continuation (`\`). > - It did not recognize functions with single command body, such as > `x () echo hello`. Good. > Replacing the function body matching logic with `.*$`, ensures > that everything on the function definition line is captured, > aligning with other userdiff drivers and improving hunk headers in > `git diff`. Personally, I am a bit allergic against marketing speak. *Of course* do we intend improve the code. No need to mention it. > Additionally, the word regex is refined to better recognize shell > syntax, including additional parameter expansion operators and > command-line options, improving syntax-aware diffs. Good. (But see above about "improving".) > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> > --- > diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh > index f51d3557f1..0be647c2fb 100755 > --- a/t/t4034-diff-words.sh > +++ b/t/t4034-diff-words.sh > @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' > > test_language_driver ada > test_language_driver bibtex > +test_language_driver bash > test_language_driver cpp > test_language_driver csharp > test_language_driver css Including a test for word-diff is very much appreciated! > diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect > new file mode 100644 > index 0000000000..a0f7cbd5a3 > --- /dev/null > +++ b/t/t4034/bash/expect > @@ -0,0 +1,30 @@ > +<BOLD>diff --git a/pre b/post<RESET> > +<BOLD>index 09ac008..60ba6a2 100644<RESET> > +<BOLD>--- a/pre<RESET> > +<BOLD>+++ b/post<RESET> > +<CYAN>@@ -1,25 +1,25 @@<RESET> > +<RED>my_var<RESET><GREEN>new_var<RESET>=10 > +x=<RED>123<RESET><GREEN>456<RESET> > +y=<RED>3.14<RESET><GREEN>2.71<RESET> > +z=<RED>.5<RESET><GREEN>.75<RESET> OK. > +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> This tests a typical variable expansion. Good. > +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} A more elaborate variable expansion does not include the surrounding ${ }. Good. > +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi This line also only tests variable expansions and is quite redundant. It could test the operators that we see, but it doesn't. See below for ideas how to do that. And all from here... > +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) > +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) > +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) > +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) > +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} > +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} > +${<RED>a<RESET><GREEN>x<RESET>##*/} > +${<RED>a<RESET><GREEN>x<RESET>%.*} > +${<RED>a<RESET><GREEN>x<RESET>%%.*} > +${<RED>a<RESET><GREEN>x<RESET>^^} > +${<RED>a<RESET><GREEN>x<RESET>,} > +${<RED>a<RESET><GREEN>x<RESET>,,} ... to here also only test variable expansions, but not the operators. As written, they are totally redundant and should be dropped or improved. To test, for example, that '##' is kept together as an operator, you have to have ${x#*/} in the pre-image and ${x##*/} in the post-image, so that we see ${x<RED>#<RESET><GREEN>##<RESET>*/} in the result. If '##' were two tokens (due to a bug in the pattern), we would see ${x#<GREEN>#<RESET>*/} in the result. You should go through all not-single-character operators and have a pre-image and a post-image where one characters is added or removed. The following is not a correct test: pre: ((x+=b)) post: ((x-=b)) result: ((x<RED>+=<RESET><GREEN>-=<RESET>)) because we would see this result even if the pattern has a bug that recognizes only one of += or -=, but would split the other one. A correct test would be: pre: ((x+b)) post: ((x+=b)) result: ((x<RED>+<RESET><GREEN>+=<RESET>)) > +${!<RED>a<RESET><GREEN>x<RESET>} Here, you should have no '!' in the pre-image and the '!' in the post image and not change the name. Then the test demonstrates that '!' is its own token and not merged with '${' (if that is the intent of the test; otherwise it is a redunant test). > +${<RED>a<RESET><GREEN>x<RESET>[@]} If you want to test that '@' is not merged with the brackets, then you can have, for example '*' in the pre-image and '@' in the post image. > +${<RED>a<RESET><GREEN>x<RESET>:?error message} Either redundant or another two-character operator to test. > +${<RED>a<RESET><GREEN>x<RESET>:2:3} Redundant. > +ls <RED>-a<RESET><GREEN>-x<RESET> > +ls <RED>--a<RESET><GREEN>--x<RESET> Both are good tests. > diff --git a/userdiff.c b/userdiff.c > index 340c4eb4f7..4c77c7e0f6 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -64,15 +64,27 @@ PATTERNS("bash", > /* Bashism identifier with optional parentheses */ > "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > ")" > - /* Optional whitespace */ > - "[ \t]*" > - /* Compound command starting with `{`, `(`, `((` or `[[` */ > - "(\\{|\\(\\(?|\\[\\[)" > + /* Everything after the function header is captured */ > + ".*$" I remember suggesting to capture everything after the function header. However, If I am not mistaken, this does not do what I intended (and as written it means that a pointless matching operation happens). The hunk header shows everything that is in the outermost parentheses (group). This catch-all, however, is even outside the outermost group and not captured. It should be above the closing parenthesis that we see in the context. To test for this, you can have one test where "RIGHT" is not the function name, but in the comment on the function header line. > /* End of captured text */ > ")", > /* -- */ > - /* Characters not in the default $IFS value */ > - "[^ \t]+"), > + /* Identifiers: variable and function names */ > + "[a-zA-Z_][a-zA-Z0-9_]*" > + /* Numeric constants: integers and decimals */ > + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" > + /* Shell variables: $VAR, ${VAR} */ > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" > + /* Logical and comparison operators */ > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" > + /* Assignment and arithmetic operators */ > + "|[-+*/%&|^!=<>]=?" > + /* Additional parameter expansion operators */ > + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" What is the purpose of this "/[a-zA-Z0-9_-]+"? It would mean, for example that changes in alphanumeric path names would include the slash in the changed part. I don't think this should be here. > + /* Command-line options (to avoid splitting -option) */ > + "|--?[a-zA-Z0-9_-]+" > + /* Brackets and grouping symbols */ > + "|\\(|\\)|\\{|\\}|\\[|\\]"), > PATTERNS("bibtex", > "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", > /* -- */ -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms 2025-05-06 16:30 ` Johannes Sixt @ 2025-05-10 11:37 ` MOUMITA DHAR 2025-05-10 12:40 ` Johannes Sixt 0 siblings, 1 reply; 37+ messages in thread From: MOUMITA DHAR @ 2025-05-10 11:37 UTC (permalink / raw) To: Johannes Sixt Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble, git On Tue, 6 May 2025 at 22:00, Johannes Sixt <j6t@kdbg.org> wrote: > > Am 30.03.25 um 15:39 schrieb Moumita: > > From: Moumita Dhar <dhar61595@gmail.com> > > > > The previous function regex required explicit matching of function > > bodies using `{`, `(`, `((`, or `[[`, which caused several issues: > > > > - It failed to capture valid functions where `{` was on the next line > > due to line continuation (`\`). > > - It did not recognize functions with single command body, such as > > `x () echo hello`. > > Good. > > > Replacing the function body matching logic with `.*$`, ensures > > that everything on the function definition line is captured, > > aligning with other userdiff drivers and improving hunk headers in > > `git diff`. > > Personally, I am a bit allergic against marketing speak. *Of course* do > we intend improve the code. No need to mention it. > > > Additionally, the word regex is refined to better recognize shell > > syntax, including additional parameter expansion operators and > > command-line options, improving syntax-aware diffs. > > Good. (But see above about "improving".) > > > Yes I will be careful about the commit messages. > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> > > --- > > > diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh > > index f51d3557f1..0be647c2fb 100755 > > --- a/t/t4034-diff-words.sh > > +++ b/t/t4034-diff-words.sh > > @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' > > > > test_language_driver ada > > test_language_driver bibtex > > +test_language_driver bash > > test_language_driver cpp > > test_language_driver csharp > > test_language_driver css > > Including a test for word-diff is very much appreciated! > > > diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect > > new file mode 100644 > > index 0000000000..a0f7cbd5a3 > > --- /dev/null > > +++ b/t/t4034/bash/expect > > @@ -0,0 +1,30 @@ > > +<BOLD>diff --git a/pre b/post<RESET> > > +<BOLD>index 09ac008..60ba6a2 100644<RESET> > > +<BOLD>--- a/pre<RESET> > > +<BOLD>+++ b/post<RESET> > > +<CYAN>@@ -1,25 +1,25 @@<RESET> > > +<RED>my_var<RESET><GREEN>new_var<RESET>=10 > > +x=<RED>123<RESET><GREEN>456<RESET> > > +y=<RED>3.14<RESET><GREEN>2.71<RESET> > > +z=<RED>.5<RESET><GREEN>.75<RESET> > > OK. > > > +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> > > This tests a typical variable expansion. Good. > > > +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} > > A more elaborate variable expansion does not include the surrounding ${ > }. Good. > > > +if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi > > This line also only tests variable expansions and is quite redundant. It > could test the operators that we see, but it doesn't. See below for > ideas how to do that. > > And all from here... > > > +((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) > > +((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) > > +$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) > > +$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) > > +${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} > > +${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} > > +${<RED>a<RESET><GREEN>x<RESET>##*/} > > +${<RED>a<RESET><GREEN>x<RESET>%.*} > > +${<RED>a<RESET><GREEN>x<RESET>%%.*} > > +${<RED>a<RESET><GREEN>x<RESET>^^} > > +${<RED>a<RESET><GREEN>x<RESET>,} > > +${<RED>a<RESET><GREEN>x<RESET>,,} > > ... to here also only test variable expansions, but not the operators. > As written, they are totally redundant and should be dropped or improved. > > To test, for example, that '##' is kept together as an operator, you > have to have > > ${x#*/} > > in the pre-image and > > ${x##*/} > > in the post-image, so that we see > > ${x<RED>#<RESET><GREEN>##<RESET>*/} > > in the result. If '##' were two tokens (due to a bug in the pattern), we > would see > > ${x#<GREEN>#<RESET>*/} > > in the result. > > You should go through all not-single-character operators and have a > pre-image and a post-image where one characters is added or removed. The > following is not a correct test: > > pre: ((x+=b)) > post: ((x-=b)) > result: ((x<RED>+=<RESET><GREEN>-=<RESET>)) > > because we would see this result even if the pattern has a bug that > recognizes only one of += or -=, but would split the other one. A > correct test would be: > > pre: ((x+b)) > post: ((x+=b)) > result: ((x<RED>+<RESET><GREEN>+=<RESET>)) > > > +${!<RED>a<RESET><GREEN>x<RESET>} > > Here, you should have no '!' in the pre-image and the '!' in the post > image and not change the name. Then the test demonstrates that '!' is > its own token and not merged with '${' (if that is the intent of the > test; otherwise it is a redunant test). > > > +${<RED>a<RESET><GREEN>x<RESET>[@]} > > If you want to test that '@' is not merged with the brackets, then you > can have, for example '*' in the pre-image and '@' in the post image. > > > +${<RED>a<RESET><GREEN>x<RESET>:?error message} > > Either redundant or another two-character operator to test. Ok I got it. > > > +${<RED>a<RESET><GREEN>x<RESET>:2:3} > > Redundant. > I understand I will remove it. > > +ls <RED>-a<RESET><GREEN>-x<RESET> > > +ls <RED>--a<RESET><GREEN>--x<RESET> > > Both are good tests. Thank you. I understand the suggestions about the word diff tests and I will remove the redundant ones and write the correct tests . > > > diff --git a/userdiff.c b/userdiff.c > > index 340c4eb4f7..4c77c7e0f6 100644 > > --- a/userdiff.c > > +++ b/userdiff.c > > @@ -64,15 +64,27 @@ PATTERNS("bash", > > /* Bashism identifier with optional parentheses */ > > "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > > ")" > > - /* Optional whitespace */ > > - "[ \t]*" > > - /* Compound command starting with `{`, `(`, `((` or `[[` */ > > - "(\\{|\\(\\(?|\\[\\[)" > > + /* Everything after the function header is captured */ > > + ".*$" > > I remember suggesting to capture everything after the function header. > However, If I am not mistaken, this does not do what I intended (and as > written it means that a pointless matching operation happens). The hunk > header shows everything that is in the outermost parentheses (group). > This catch-all, however, is even outside the outermost group and not > captured. It should be above the closing parenthesis that we see in the > context. > I am sorry I want to understand a thing , we want everything from the function name to the end of the line to be the hunk header right ? So in the pattern "^[ \t]*" /* Start of captured text */ "(" "(" /* POSIX identifier with mandatory parentheses */ "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ")" /* Optional whitespace */ "[ \t]*" /* Compound command starting with `{`, `(`, `((` or `[[` */ "(\\{|\\(\\(?|\\[\\[)" /* End of captured text */ ")" if I replace "[ \t]*" "(\\{|\\(\\(?|\\[\\[)" part with .*$ then will it not capture the entire line ? I mean the outermost group ends here- /* End of captured text */ ")" right ? What am I getting wrong ? > To test for this, you can have one test where "RIGHT" is not the > function name, but in the comment on the function header line. > > > /* End of captured text */ > > ")", > > /* -- */ > > - /* Characters not in the default $IFS value */ > > - "[^ \t]+"), > > + /* Identifiers: variable and function names */ > > + "[a-zA-Z_][a-zA-Z0-9_]*" > > + /* Numeric constants: integers and decimals */ > > + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" > > + /* Shell variables: $VAR, ${VAR} */ > > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" > > + /* Logical and comparison operators */ > > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" > > + /* Assignment and arithmetic operators */ > > + "|[-+*/%&|^!=<>]=?" > > + /* Additional parameter expansion operators */ > > + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" > > What is the purpose of this "/[a-zA-Z0-9_-]+"? It would mean, for > example that changes in alphanumeric path names would include the slash > in the changed part. I don't think this should be here. > Ok I got it. > > + /* Command-line options (to avoid splitting -option) */ > > + "|--?[a-zA-Z0-9_-]+" > > + /* Brackets and grouping symbols */ > > + "|\\(|\\)|\\{|\\}|\\[|\\]"), > > PATTERNS("bibtex", > > "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", > > /* -- */ > > -- Hannes > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms 2025-05-10 11:37 ` MOUMITA DHAR @ 2025-05-10 12:40 ` Johannes Sixt 0 siblings, 0 replies; 37+ messages in thread From: Johannes Sixt @ 2025-05-10 12:40 UTC (permalink / raw) To: MOUMITA DHAR Cc: Eric Sunshine, Junio C Hamano, René Scharfe, Atharva Raykar, D. Ben Knoble, git The way how you cited the message makes it very difficult to see what is citation, code, or your message text. Please insert sufficient blank lines in your replies. Also, indentation may help to declare what is the code that you talk about. Am 10.05.25 um 13:37 schrieb MOUMITA DHAR: > On Tue, 6 May 2025 at 22:00, Johannes Sixt <j6t@kdbg.org> wrote: >>> diff --git a/userdiff.c b/userdiff.c >>> index 340c4eb4f7..4c77c7e0f6 100644 >>> --- a/userdiff.c >>> +++ b/userdiff.c >>> @@ -64,15 +64,27 @@ PATTERNS("bash", >>> /* Bashism identifier with optional parentheses */ >>> "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" >>> ")" >>> - /* Optional whitespace */ >>> - "[ \t]*" >>> - /* Compound command starting with `{`, `(`, `((` or `[[` */ >>> - "(\\{|\\(\\(?|\\[\\[)" >>> + /* Everything after the function header is captured */ >>> + ".*$" >> >> I remember suggesting to capture everything after the function header. >> However, If I am not mistaken, this does not do what I intended (and as >> written it means that a pointless matching operation happens). The hunk >> header shows everything that is in the outermost parentheses (group). >> This catch-all, however, is even outside the outermost group and not >> captured. It should be above the closing parenthesis that we see in the >> context. > > I am sorry I want to understand a thing , we want everything from > the function name to the end of the line to be the hunk header > right ? So in the pattern > "^[ \t]*"> /* Start of captured text */ > "(" > "(" > /* POSIX identifier with mandatory parentheses */ > "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > "|" > /* Bashism identifier with optional parentheses */ > "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > ")" > /* Optional whitespace */ > "[ \t]*" > /* Compound command starting with `{`, `(`, `((` or `[[` */ > "(\\{|\\(\\(?|\\[\\[)" > /* End of captured text */ > ")" > if I replace > "[ \t]*" > "(\\{|\\(\\(?|\\[\\[)" > part with .*$ then will it not capture the > entire line ? I mean the outermost group ends here- > > /* End of captured text */ > ")" > right ? What am I getting wrong ? It is my error, sorry. I see now that your intended change was correct. However, in my tests, a comment after the function name is not captured. The reason is that the parentheses are not balanced. My editor tells me that the parenthesis in "(function" is matched with the lone ")" in the next line, but the latter is intended to match up with the "(" above the '/* POSIX' comment. There is something fishy going on and needs a fix. To test, apply this: diff --git a/t/t4018/bash-posix-style-function b/t/t4018/bash-posix-style-function index a4d144856e..673c51b89e 100644 --- a/t/t4018/bash-posix-style-function +++ b/t/t4018/bash-posix-style-function @@ -1,4 +1,4 @@ -RIGHT() { +afunc () { # RIGHT ChangeMe } -- Hannes ^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff 2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita 2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-11 12:58 ` Moumita 2025-05-11 12:58 ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita ` (3 more replies) 1 sibling, 4 replies; 37+ messages in thread From: Moumita @ 2025-05-11 12:58 UTC (permalink / raw) To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano I wrapped the POSIX function pattern in an extra set of parentheses and added a closing parenthesis in the Bashism function case to balance the opening group ,this makes the grouping explicit and consistent with the outer structure of the regex. Also I added the test case so that everything on the function definition line is captured correctly. And corrected the word diff test so that the operators like '+=' are treated as a single token. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v4: 1: 40cffd3b4a ! 1: 478b77e0c8 userdiff: extend Bash pattern to cover more shell function forms @@ Commit message `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures - that everything on the function definition line is captured, - aligning with other userdiff drivers and improving hunk headers in - `git diff`. + that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and - command-line options, improving syntax-aware diffs. + command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> @@ t/t4018/bash-bashism-style-multiline-function (new) + echo 'ChangeMe' +} + ## t/t4018/bash-hunk-header-complete-line-capture (new) ## +@@ ++func() { # RIGHT ++ ++ ChangeMe ++} + \ No newline at end of file + ## t/t4018/bash-posix-style-multiline-function (new) ## @@ +RIGHT() \ @@ t/t4034/bash/expect (new) +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> -+<CYAN>@@ -1,25 +1,25 @@<RESET> ++<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>##*/} -+${<RED>a<RESET><GREEN>x<RESET>%.*} -+${<RED>a<RESET><GREEN>x<RESET>%%.*} -+${<RED>a<RESET><GREEN>x<RESET>^^} -+${<RED>a<RESET><GREEN>x<RESET>,} -+${<RED>a<RESET><GREEN>x<RESET>,,} -+${!<RED>a<RESET><GREEN>x<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>[@]} -+${<RED>a<RESET><GREEN>x<RESET>:?error message} -+${<RED>a<RESET><GREEN>x<RESET>:2:3} ++((a<RED>+<RESET><GREEN>+=<RESET>b)) ++((a<RED>*<RESET><GREEN>*=<RESET>b)) ++((a<RED>/<RESET><GREEN>/=<RESET>b)) ++((a<RED>%<RESET><GREEN>%=<RESET>b)) ++((a<RED>|<RESET><GREEN>|=<RESET>b)) ++((a<RED>^<RESET><GREEN>^=<RESET>b)) ++((a<RED>=<RESET><GREEN>==<RESET>b)) ++((a<RED>!<RESET><GREEN>!=<RESET>b)) ++((a<RED><<RESET><GREEN><=<RESET>b)) ++((a<RED>><RESET><GREEN>>=<RESET>b)) ++$((a<RED><<RESET><GREEN><<<RESET>b)) ++$((a<RED>><RESET><GREEN>>><RESET>b)) ++$((a<RED>&<RESET><GREEN>&&<RESET>b)) ++$((a<RED>|<RESET><GREEN>||<RESET>b)) ++${a<RED>:<RESET><GREEN>:-<RESET>b} ++${a<RED>:<RESET><GREEN>:=<RESET>b} ++${a<RED>:<RESET><GREEN>:+<RESET>b} ++${a<RED>:<RESET><GREEN>:?<RESET>b} ++${a<RED>#<RESET><GREEN>##<RESET>*/} ++${a<RED>%<RESET><GREEN>%%<RESET>.*} ++${a<RED>^<RESET><GREEN>^^<RESET>} ++${a<RED>,<RESET><GREEN>,,<RESET>} ++${<GREEN>!<RESET>a} ++${a[<RED>*<RESET><GREEN>@<RESET>]} ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> @@ t/t4034/bash/post (new) +z=.75 +echo $USERNAME +${HOMEDIR} -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi -+((x+=y)) -+((x-=y)) -+$((x<<y)) -+$((x>>y)) -+${x:-y} -+${x:=y} -+${x##*/} -+${x%.*} -+${x%%.*} -+${x^^} -+${x,} -+${x,,} -+${!x} -+${x[@]} -+${x:?error message} -+${x:2:3} ++((a+=b)) ++((a*=b)) ++((a/=b)) ++((a%=b)) ++((a|=b)) ++((a^=b)) ++((a==b)) ++((a!=b)) ++((a<=b)) ++((a>=b)) ++$((a<<b)) ++$((a>>b)) ++$((a&&b)) ++$((a||b)) ++${a:-b} ++${a:=b} ++${a:+b} ++${a:?b} ++${a##*/} ++${a%%.*} ++${a^^} ++${a,,} ++${!a} ++${a[@]} ++${a:4:6} +ls -x +ls --x @@ t/t4034/bash/pre (new) +z=.5 +echo $USER +${HOME} -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi -+((a+=b)) -+((a-=b)) -+$((a << b)) -+$((a >> b)) -+${a:-b} -+${a:=b} -+${a##*/} ++((a+b)) ++((a*b)) ++((a/b)) ++((a%b)) ++((a|b)) ++((a^b)) ++((a=b)) ++((a!b)) ++((a<b)) ++((a>b)) ++$((a<b)) ++$((a>b)) ++$((a&b)) ++$((a|b)) ++${a:b} ++${a:b} ++${a:b} ++${a:b} ++${a#*/} +${a%.*} -+${a%%.*} -+${a^^} ++${a^} +${a,} -+${a,,} -+${!a} -+${a[@]} -+${a:?error message} ++${a} ++${a[*]} +${a:2:3} +ls -a +ls --a ## userdiff.c ## @@ userdiff.c: PATTERNS("bash", + "(" + "(" + /* POSIX identifier with mandatory parentheses */ +- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" ++ "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" +- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ++ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" @@ userdiff.c: PATTERNS("bash", + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ -+ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita @ 2025-05-11 12:58 ` Moumita 2025-05-11 13:28 ` Moumita ` (2 subsequent siblings) 3 siblings, 0 replies; 37+ messages in thread From: Moumita @ 2025-05-11 12:58 UTC (permalink / raw) To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano From: Moumita Dhar <dhar61595@gmail.com> The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function new file mode 100644 index 0000000000..284d50dd99 --- /dev/null +++ b/t/t4018/bash-bashism-style-multiline-function @@ -0,0 +1,4 @@ +function RIGHT \ +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture new file mode 100644 index 0000000000..818c8c5a5f --- /dev/null +++ b/t/t4018/bash-hunk-header-complete-line-capture @@ -0,0 +1,4 @@ +func() { # RIGHT + + ChangeMe +} \ No newline at end of file diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function new file mode 100644 index 0000000000..cc8727cbcd --- /dev/null +++ b/t/t4018/bash-posix-style-multiline-function @@ -0,0 +1,4 @@ +RIGHT() \ +{ + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..17755e455f --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,38 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +((a<RED>+<RESET><GREEN>+=<RESET>b)) +((a<RED>*<RESET><GREEN>*=<RESET>b)) +((a<RED>/<RESET><GREEN>/=<RESET>b)) +((a<RED>%<RESET><GREEN>%=<RESET>b)) +((a<RED>|<RESET><GREEN>|=<RESET>b)) +((a<RED>^<RESET><GREEN>^=<RESET>b)) +((a<RED>=<RESET><GREEN>==<RESET>b)) +((a<RED>!<RESET><GREEN>!=<RESET>b)) +((a<RED><<RESET><GREEN><=<RESET>b)) +((a<RED>><RESET><GREEN>>=<RESET>b)) +$((a<RED><<RESET><GREEN><<<RESET>b)) +$((a<RED>><RESET><GREEN>>><RESET>b)) +$((a<RED>&<RESET><GREEN>&&<RESET>b)) +$((a<RED>|<RESET><GREEN>||<RESET>b)) +${a<RED>:<RESET><GREEN>:-<RESET>b} +${a<RED>:<RESET><GREEN>:=<RESET>b} +${a<RED>:<RESET><GREEN>:+<RESET>b} +${a<RED>:<RESET><GREEN>:?<RESET>b} +${a<RED>#<RESET><GREEN>##<RESET>*/} +${a<RED>%<RESET><GREEN>%%<RESET>.*} +${a<RED>^<RESET><GREEN>^^<RESET>} +${a<RED>,<RESET><GREEN>,,<RESET>} +${<GREEN>!<RESET>a} +${a[<RED>*<RESET><GREEN>@<RESET>]} +${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..669e218c30 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,33 @@ +new_var=10 +x=456 +y=2.71 +z=.75 +echo $USERNAME +${HOMEDIR} +((a+=b)) +((a*=b)) +((a/=b)) +((a%=b)) +((a|=b)) +((a^=b)) +((a==b)) +((a!=b)) +((a<=b)) +((a>=b)) +$((a<<b)) +$((a>>b)) +$((a&&b)) +$((a||b)) +${a:-b} +${a:=b} +${a:+b} +${a:?b} +${a##*/} +${a%%.*} +${a^^} +${a,,} +${!a} +${a[@]} +${a:4:6} +ls -x +ls --x diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..ada8470bac --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,33 @@ +my_var=10 +x=123 +y=3.14 +z=.5 +echo $USER +${HOME} +((a+b)) +((a*b)) +((a/b)) +((a%b)) +((a|b)) +((a^b)) +((a=b)) +((a!b)) +((a<b)) +((a>b)) +$((a<b)) +$((a>b)) +$((a&b)) +$((a|b)) +${a:b} +${a:b} +${a:b} +${a:b} +${a#*/} +${a%.*} +${a^} +${a,} +${a} +${a[*]} +${a:2:3} +ls -a +ls --a diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..655c8fe0b1 100644 --- a/userdiff.c +++ b/userdiff.c @@ -59,20 +59,32 @@ PATTERNS("bash", "(" "(" /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* (no subject) 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita 2025-05-11 12:58 ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-11 13:28 ` Moumita 2025-05-11 13:28 ` Moumita 2025-05-11 13:37 ` Moumita 2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita 3 siblings, 1 reply; 37+ messages in thread From: Moumita @ 2025-05-11 13:28 UTC (permalink / raw) To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano Subject: [PATCH v6 0/1] Added the newline after the test in t/4018 Sorry , I forgot to add the newline after the test in t/4018 again so I had send the patch agaain. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v5: 1: 40cffd3b4a ! 1: 464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms @@ Commit message `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures - that everything on the function definition line is captured, - aligning with other userdiff drivers and improving hunk headers in - `git diff`. + that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and - command-line options, improving syntax-aware diffs. + command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> @@ t/t4018/bash-bashism-style-multiline-function (new) +function RIGHT \ +{ + echo 'ChangeMe' ++} + + ## t/t4018/bash-hunk-header-complete-line-capture (new) ## +@@ ++func() { # RIGHT ++ ++ ChangeMe +} ## t/t4018/bash-posix-style-multiline-function (new) ## @@ t/t4034/bash/expect (new) +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> -+<CYAN>@@ -1,25 +1,25 @@<RESET> ++<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>##*/} -+${<RED>a<RESET><GREEN>x<RESET>%.*} -+${<RED>a<RESET><GREEN>x<RESET>%%.*} -+${<RED>a<RESET><GREEN>x<RESET>^^} -+${<RED>a<RESET><GREEN>x<RESET>,} -+${<RED>a<RESET><GREEN>x<RESET>,,} -+${!<RED>a<RESET><GREEN>x<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>[@]} -+${<RED>a<RESET><GREEN>x<RESET>:?error message} -+${<RED>a<RESET><GREEN>x<RESET>:2:3} ++((a<RED>+<RESET><GREEN>+=<RESET>b)) ++((a<RED>*<RESET><GREEN>*=<RESET>b)) ++((a<RED>/<RESET><GREEN>/=<RESET>b)) ++((a<RED>%<RESET><GREEN>%=<RESET>b)) ++((a<RED>|<RESET><GREEN>|=<RESET>b)) ++((a<RED>^<RESET><GREEN>^=<RESET>b)) ++((a<RED>=<RESET><GREEN>==<RESET>b)) ++((a<RED>!<RESET><GREEN>!=<RESET>b)) ++((a<RED><<RESET><GREEN><=<RESET>b)) ++((a<RED>><RESET><GREEN>>=<RESET>b)) ++$((a<RED><<RESET><GREEN><<<RESET>b)) ++$((a<RED>><RESET><GREEN>>><RESET>b)) ++$((a<RED>&<RESET><GREEN>&&<RESET>b)) ++$((a<RED>|<RESET><GREEN>||<RESET>b)) ++${a<RED>:<RESET><GREEN>:-<RESET>b} ++${a<RED>:<RESET><GREEN>:=<RESET>b} ++${a<RED>:<RESET><GREEN>:+<RESET>b} ++${a<RED>:<RESET><GREEN>:?<RESET>b} ++${a<RED>#<RESET><GREEN>##<RESET>*/} ++${a<RED>%<RESET><GREEN>%%<RESET>.*} ++${a<RED>^<RESET><GREEN>^^<RESET>} ++${a<RED>,<RESET><GREEN>,,<RESET>} ++${<GREEN>!<RESET>a} ++${a[<RED>*<RESET><GREEN>@<RESET>]} ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> @@ t/t4034/bash/post (new) +z=.75 +echo $USERNAME +${HOMEDIR} -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi -+((x+=y)) -+((x-=y)) -+$((x<<y)) -+$((x>>y)) -+${x:-y} -+${x:=y} -+${x##*/} -+${x%.*} -+${x%%.*} -+${x^^} -+${x,} -+${x,,} -+${!x} -+${x[@]} -+${x:?error message} -+${x:2:3} ++((a+=b)) ++((a*=b)) ++((a/=b)) ++((a%=b)) ++((a|=b)) ++((a^=b)) ++((a==b)) ++((a!=b)) ++((a<=b)) ++((a>=b)) ++$((a<<b)) ++$((a>>b)) ++$((a&&b)) ++$((a||b)) ++${a:-b} ++${a:=b} ++${a:+b} ++${a:?b} ++${a##*/} ++${a%%.*} ++${a^^} ++${a,,} ++${!a} ++${a[@]} ++${a:4:6} +ls -x +ls --x @@ t/t4034/bash/pre (new) +z=.5 +echo $USER +${HOME} -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi -+((a+=b)) -+((a-=b)) -+$((a << b)) -+$((a >> b)) -+${a:-b} -+${a:=b} -+${a##*/} ++((a+b)) ++((a*b)) ++((a/b)) ++((a%b)) ++((a|b)) ++((a^b)) ++((a=b)) ++((a!b)) ++((a<b)) ++((a>b)) ++$((a<b)) ++$((a>b)) ++$((a&b)) ++$((a|b)) ++${a:b} ++${a:b} ++${a:b} ++${a:b} ++${a#*/} +${a%.*} -+${a%%.*} -+${a^^} ++${a^} +${a,} -+${a,,} -+${!a} -+${a[@]} -+${a:?error message} ++${a} ++${a[*]} +${a:2:3} +ls -a +ls --a ## userdiff.c ## @@ userdiff.c: PATTERNS("bash", + "(" + "(" + /* POSIX identifier with mandatory parentheses */ +- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" ++ "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" +- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ++ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" @@ userdiff.c: PATTERNS("bash", + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ -+ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* (no subject) 2025-05-11 13:28 ` Moumita @ 2025-05-11 13:28 ` Moumita 0 siblings, 0 replies; 37+ messages in thread From: Moumita @ 2025-05-11 13:28 UTC (permalink / raw) To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano From: Moumita Dhar <dhar61595@gmail.com> Subject: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function new file mode 100644 index 0000000000..284d50dd99 --- /dev/null +++ b/t/t4018/bash-bashism-style-multiline-function @@ -0,0 +1,4 @@ +function RIGHT \ +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture new file mode 100644 index 0000000000..b56942f322 --- /dev/null +++ b/t/t4018/bash-hunk-header-complete-line-capture @@ -0,0 +1,4 @@ +func() { # RIGHT + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function new file mode 100644 index 0000000000..cc8727cbcd --- /dev/null +++ b/t/t4018/bash-posix-style-multiline-function @@ -0,0 +1,4 @@ +RIGHT() \ +{ + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..17755e455f --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,38 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +((a<RED>+<RESET><GREEN>+=<RESET>b)) +((a<RED>*<RESET><GREEN>*=<RESET>b)) +((a<RED>/<RESET><GREEN>/=<RESET>b)) +((a<RED>%<RESET><GREEN>%=<RESET>b)) +((a<RED>|<RESET><GREEN>|=<RESET>b)) +((a<RED>^<RESET><GREEN>^=<RESET>b)) +((a<RED>=<RESET><GREEN>==<RESET>b)) +((a<RED>!<RESET><GREEN>!=<RESET>b)) +((a<RED><<RESET><GREEN><=<RESET>b)) +((a<RED>><RESET><GREEN>>=<RESET>b)) +$((a<RED><<RESET><GREEN><<<RESET>b)) +$((a<RED>><RESET><GREEN>>><RESET>b)) +$((a<RED>&<RESET><GREEN>&&<RESET>b)) +$((a<RED>|<RESET><GREEN>||<RESET>b)) +${a<RED>:<RESET><GREEN>:-<RESET>b} +${a<RED>:<RESET><GREEN>:=<RESET>b} +${a<RED>:<RESET><GREEN>:+<RESET>b} +${a<RED>:<RESET><GREEN>:?<RESET>b} +${a<RED>#<RESET><GREEN>##<RESET>*/} +${a<RED>%<RESET><GREEN>%%<RESET>.*} +${a<RED>^<RESET><GREEN>^^<RESET>} +${a<RED>,<RESET><GREEN>,,<RESET>} +${<GREEN>!<RESET>a} +${a[<RED>*<RESET><GREEN>@<RESET>]} +${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..669e218c30 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,33 @@ +new_var=10 +x=456 +y=2.71 +z=.75 +echo $USERNAME +${HOMEDIR} +((a+=b)) +((a*=b)) +((a/=b)) +((a%=b)) +((a|=b)) +((a^=b)) +((a==b)) +((a!=b)) +((a<=b)) +((a>=b)) +$((a<<b)) +$((a>>b)) +$((a&&b)) +$((a||b)) +${a:-b} +${a:=b} +${a:+b} +${a:?b} +${a##*/} +${a%%.*} +${a^^} +${a,,} +${!a} +${a[@]} +${a:4:6} +ls -x +ls --x diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..ada8470bac --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,33 @@ +my_var=10 +x=123 +y=3.14 +z=.5 +echo $USER +${HOME} +((a+b)) +((a*b)) +((a/b)) +((a%b)) +((a|b)) +((a^b)) +((a=b)) +((a!b)) +((a<b)) +((a>b)) +$((a<b)) +$((a>b)) +$((a&b)) +$((a|b)) +${a:b} +${a:b} +${a:b} +${a:b} +${a#*/} +${a%.*} +${a^} +${a,} +${a} +${a[*]} +${a:2:3} +ls -a +ls --a diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..655c8fe0b1 100644 --- a/userdiff.c +++ b/userdiff.c @@ -59,20 +59,32 @@ PATTERNS("bash", "(" "(" /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* (no subject) 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita 2025-05-11 12:58 ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-11 13:28 ` Moumita @ 2025-05-11 13:37 ` Moumita 2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita 3 siblings, 0 replies; 37+ messages in thread From: Moumita @ 2025-05-11 13:37 UTC (permalink / raw) To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano Subject: [PATCH v6 0/1] Added the newline after the test in t/4018 Sorry , I forgot to add the newline after the test in t/4018 again so I had send the patch agaain. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v5: 1: 40cffd3b4a ! 1: 464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms @@ Commit message `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures - that everything on the function definition line is captured, - aligning with other userdiff drivers and improving hunk headers in - `git diff`. + that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and - command-line options, improving syntax-aware diffs. + command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> @@ t/t4018/bash-bashism-style-multiline-function (new) +function RIGHT \ +{ + echo 'ChangeMe' ++} + + ## t/t4018/bash-hunk-header-complete-line-capture (new) ## +@@ ++func() { # RIGHT ++ ++ ChangeMe +} ## t/t4018/bash-posix-style-multiline-function (new) ## @@ t/t4034/bash/expect (new) +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> -+<CYAN>@@ -1,25 +1,25 @@<RESET> ++<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>##*/} -+${<RED>a<RESET><GREEN>x<RESET>%.*} -+${<RED>a<RESET><GREEN>x<RESET>%%.*} -+${<RED>a<RESET><GREEN>x<RESET>^^} -+${<RED>a<RESET><GREEN>x<RESET>,} -+${<RED>a<RESET><GREEN>x<RESET>,,} -+${!<RED>a<RESET><GREEN>x<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>[@]} -+${<RED>a<RESET><GREEN>x<RESET>:?error message} -+${<RED>a<RESET><GREEN>x<RESET>:2:3} ++((a<RED>+<RESET><GREEN>+=<RESET>b)) ++((a<RED>*<RESET><GREEN>*=<RESET>b)) ++((a<RED>/<RESET><GREEN>/=<RESET>b)) ++((a<RED>%<RESET><GREEN>%=<RESET>b)) ++((a<RED>|<RESET><GREEN>|=<RESET>b)) ++((a<RED>^<RESET><GREEN>^=<RESET>b)) ++((a<RED>=<RESET><GREEN>==<RESET>b)) ++((a<RED>!<RESET><GREEN>!=<RESET>b)) ++((a<RED><<RESET><GREEN><=<RESET>b)) ++((a<RED>><RESET><GREEN>>=<RESET>b)) ++$((a<RED><<RESET><GREEN><<<RESET>b)) ++$((a<RED>><RESET><GREEN>>><RESET>b)) ++$((a<RED>&<RESET><GREEN>&&<RESET>b)) ++$((a<RED>|<RESET><GREEN>||<RESET>b)) ++${a<RED>:<RESET><GREEN>:-<RESET>b} ++${a<RED>:<RESET><GREEN>:=<RESET>b} ++${a<RED>:<RESET><GREEN>:+<RESET>b} ++${a<RED>:<RESET><GREEN>:?<RESET>b} ++${a<RED>#<RESET><GREEN>##<RESET>*/} ++${a<RED>%<RESET><GREEN>%%<RESET>.*} ++${a<RED>^<RESET><GREEN>^^<RESET>} ++${a<RED>,<RESET><GREEN>,,<RESET>} ++${<GREEN>!<RESET>a} ++${a[<RED>*<RESET><GREEN>@<RESET>]} ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> @@ t/t4034/bash/post (new) +z=.75 +echo $USERNAME +${HOMEDIR} -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi -+((x+=y)) -+((x-=y)) -+$((x<<y)) -+$((x>>y)) -+${x:-y} -+${x:=y} -+${x##*/} -+${x%.*} -+${x%%.*} -+${x^^} -+${x,} -+${x,,} -+${!x} -+${x[@]} -+${x:?error message} -+${x:2:3} ++((a+=b)) ++((a*=b)) ++((a/=b)) ++((a%=b)) ++((a|=b)) ++((a^=b)) ++((a==b)) ++((a!=b)) ++((a<=b)) ++((a>=b)) ++$((a<<b)) ++$((a>>b)) ++$((a&&b)) ++$((a||b)) ++${a:-b} ++${a:=b} ++${a:+b} ++${a:?b} ++${a##*/} ++${a%%.*} ++${a^^} ++${a,,} ++${!a} ++${a[@]} ++${a:4:6} +ls -x +ls --x @@ t/t4034/bash/pre (new) +z=.5 +echo $USER +${HOME} -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi -+((a+=b)) -+((a-=b)) -+$((a << b)) -+$((a >> b)) -+${a:-b} -+${a:=b} -+${a##*/} ++((a+b)) ++((a*b)) ++((a/b)) ++((a%b)) ++((a|b)) ++((a^b)) ++((a=b)) ++((a!b)) ++((a<b)) ++((a>b)) ++$((a<b)) ++$((a>b)) ++$((a&b)) ++$((a|b)) ++${a:b} ++${a:b} ++${a:b} ++${a:b} ++${a#*/} +${a%.*} -+${a%%.*} -+${a^^} ++${a^} +${a,} -+${a,,} -+${!a} -+${a[@]} -+${a:?error message} ++${a} ++${a[*]} +${a:2:3} +ls -a +ls --a ## userdiff.c ## @@ userdiff.c: PATTERNS("bash", + "(" + "(" + /* POSIX identifier with mandatory parentheses */ +- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" ++ "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" +- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ++ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" @@ userdiff.c: PATTERNS("bash", + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ -+ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v6 0/1] Added the newline after the test in t/4018 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita ` (2 preceding siblings ...) 2025-05-11 13:37 ` Moumita @ 2025-05-11 14:11 ` Moumita 2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-16 14:45 ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita 3 siblings, 2 replies; 37+ messages in thread From: Moumita @ 2025-05-11 14:11 UTC (permalink / raw) To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano Sorry I forgot to add the newline after the test in t/4018 again so I had to send the patch again. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v5: 1: 40cffd3b4a ! 1: 464cb8a1eb userdiff: extend Bash pattern to cover more shell function forms @@ Commit message `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures - that everything on the function definition line is captured, - aligning with other userdiff drivers and improving hunk headers in - `git diff`. + that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and - command-line options, improving syntax-aware diffs. + command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> @@ t/t4018/bash-bashism-style-multiline-function (new) +function RIGHT \ +{ + echo 'ChangeMe' ++} + + ## t/t4018/bash-hunk-header-complete-line-capture (new) ## +@@ ++func() { # RIGHT ++ ++ ChangeMe +} ## t/t4018/bash-posix-style-multiline-function (new) ## @@ t/t4034/bash/expect (new) +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> -+<CYAN>@@ -1,25 +1,25 @@<RESET> ++<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} -+if [ "<RED>$a<RESET><GREEN>$x<RESET>" == "<RED>$b<RESET><GREEN>$y<RESET>" ] || [ "<RED>$c<RESET><GREEN>$x<RESET>" != "<RED>$d<RESET><GREEN>$y<RESET>" ]; then echo "OK"; fi -+((<RED>a<RESET><GREEN>x<RESET>+=<RED>b<RESET><GREEN>y<RESET>)) -+((<RED>a<RESET><GREEN>x<RESET>-=<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET><<<RED>b<RESET><GREEN>y<RESET>)) -+$((<RED>a<RESET><GREEN>x<RESET>>><RED>b<RESET><GREEN>y<RESET>)) -+${<RED>a<RESET><GREEN>x<RESET>:-<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>:=<RED>b<RESET><GREEN>y<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>##*/} -+${<RED>a<RESET><GREEN>x<RESET>%.*} -+${<RED>a<RESET><GREEN>x<RESET>%%.*} -+${<RED>a<RESET><GREEN>x<RESET>^^} -+${<RED>a<RESET><GREEN>x<RESET>,} -+${<RED>a<RESET><GREEN>x<RESET>,,} -+${!<RED>a<RESET><GREEN>x<RESET>} -+${<RED>a<RESET><GREEN>x<RESET>[@]} -+${<RED>a<RESET><GREEN>x<RESET>:?error message} -+${<RED>a<RESET><GREEN>x<RESET>:2:3} ++((a<RED>+<RESET><GREEN>+=<RESET>b)) ++((a<RED>*<RESET><GREEN>*=<RESET>b)) ++((a<RED>/<RESET><GREEN>/=<RESET>b)) ++((a<RED>%<RESET><GREEN>%=<RESET>b)) ++((a<RED>|<RESET><GREEN>|=<RESET>b)) ++((a<RED>^<RESET><GREEN>^=<RESET>b)) ++((a<RED>=<RESET><GREEN>==<RESET>b)) ++((a<RED>!<RESET><GREEN>!=<RESET>b)) ++((a<RED><<RESET><GREEN><=<RESET>b)) ++((a<RED>><RESET><GREEN>>=<RESET>b)) ++$((a<RED><<RESET><GREEN><<<RESET>b)) ++$((a<RED>><RESET><GREEN>>><RESET>b)) ++$((a<RED>&<RESET><GREEN>&&<RESET>b)) ++$((a<RED>|<RESET><GREEN>||<RESET>b)) ++${a<RED>:<RESET><GREEN>:-<RESET>b} ++${a<RED>:<RESET><GREEN>:=<RESET>b} ++${a<RED>:<RESET><GREEN>:+<RESET>b} ++${a<RED>:<RESET><GREEN>:?<RESET>b} ++${a<RED>#<RESET><GREEN>##<RESET>*/} ++${a<RED>%<RESET><GREEN>%%<RESET>.*} ++${a<RED>^<RESET><GREEN>^^<RESET>} ++${a<RED>,<RESET><GREEN>,,<RESET>} ++${<GREEN>!<RESET>a} ++${a[<RED>*<RESET><GREEN>@<RESET>]} ++${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> @@ t/t4034/bash/post (new) +z=.75 +echo $USERNAME +${HOMEDIR} -+if [ "$x" == "$y" ] || [ "$x" != "$y" ]; then echo "OK"; fi -+((x+=y)) -+((x-=y)) -+$((x<<y)) -+$((x>>y)) -+${x:-y} -+${x:=y} -+${x##*/} -+${x%.*} -+${x%%.*} -+${x^^} -+${x,} -+${x,,} -+${!x} -+${x[@]} -+${x:?error message} -+${x:2:3} ++((a+=b)) ++((a*=b)) ++((a/=b)) ++((a%=b)) ++((a|=b)) ++((a^=b)) ++((a==b)) ++((a!=b)) ++((a<=b)) ++((a>=b)) ++$((a<<b)) ++$((a>>b)) ++$((a&&b)) ++$((a||b)) ++${a:-b} ++${a:=b} ++${a:+b} ++${a:?b} ++${a##*/} ++${a%%.*} ++${a^^} ++${a,,} ++${!a} ++${a[@]} ++${a:4:6} +ls -x +ls --x @@ t/t4034/bash/pre (new) +z=.5 +echo $USER +${HOME} -+if [ "$a" == "$b" ] || [ "$c" != "$d" ]; then echo "OK"; fi -+((a+=b)) -+((a-=b)) -+$((a << b)) -+$((a >> b)) -+${a:-b} -+${a:=b} -+${a##*/} ++((a+b)) ++((a*b)) ++((a/b)) ++((a%b)) ++((a|b)) ++((a^b)) ++((a=b)) ++((a!b)) ++((a<b)) ++((a>b)) ++$((a<b)) ++$((a>b)) ++$((a&b)) ++$((a|b)) ++${a:b} ++${a:b} ++${a:b} ++${a:b} ++${a#*/} +${a%.*} -+${a%%.*} -+${a^^} ++${a^} +${a,} -+${a,,} -+${!a} -+${a[@]} -+${a:?error message} ++${a} ++${a[*]} +${a:2:3} +ls -a +ls --a ## userdiff.c ## @@ userdiff.c: PATTERNS("bash", + "(" + "(" + /* POSIX identifier with mandatory parentheses */ +- "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" ++ "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" +- "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" ++ "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" @@ userdiff.c: PATTERNS("bash", + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ -+ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|/[a-zA-Z0-9_-]+|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita @ 2025-05-11 14:11 ` Moumita 2025-05-13 18:50 ` Junio C Hamano 2025-05-16 7:25 ` Johannes Sixt 2025-05-16 14:45 ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita 1 sibling, 2 replies; 37+ messages in thread From: Moumita @ 2025-05-11 14:11 UTC (permalink / raw) To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano From: Moumita Dhar <dhar61595@gmail.com> The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- t/t4018/bash-bashism-style-multiline-function | 4 ++ .../bash-hunk-header-complete-line-capture | 4 ++ t/t4018/bash-posix-style-multiline-function | 4 ++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 38 +++++++++++++++++++ t/t4034/bash/post | 33 ++++++++++++++++ t/t4034/bash/pre | 33 ++++++++++++++++ userdiff.c | 28 ++++++++++---- 9 files changed, 140 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-multiline-function create mode 100644 t/t4018/bash-hunk-header-complete-line-capture create mode 100644 t/t4018/bash-posix-style-multiline-function create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-multiline-function b/t/t4018/bash-bashism-style-multiline-function new file mode 100644 index 0000000000..284d50dd99 --- /dev/null +++ b/t/t4018/bash-bashism-style-multiline-function @@ -0,0 +1,4 @@ +function RIGHT \ +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-hunk-header-complete-line-capture b/t/t4018/bash-hunk-header-complete-line-capture new file mode 100644 index 0000000000..b56942f322 --- /dev/null +++ b/t/t4018/bash-hunk-header-complete-line-capture @@ -0,0 +1,4 @@ +func() { # RIGHT + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function new file mode 100644 index 0000000000..cc8727cbcd --- /dev/null +++ b/t/t4018/bash-posix-style-multiline-function @@ -0,0 +1,4 @@ +RIGHT() \ +{ + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..17755e455f --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,38 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,33 +1,33 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +y=<RED>3.14<RESET><GREEN>2.71<RESET> +z=<RED>.5<RESET><GREEN>.75<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +((a<RED>+<RESET><GREEN>+=<RESET>b)) +((a<RED>*<RESET><GREEN>*=<RESET>b)) +((a<RED>/<RESET><GREEN>/=<RESET>b)) +((a<RED>%<RESET><GREEN>%=<RESET>b)) +((a<RED>|<RESET><GREEN>|=<RESET>b)) +((a<RED>^<RESET><GREEN>^=<RESET>b)) +((a<RED>=<RESET><GREEN>==<RESET>b)) +((a<RED>!<RESET><GREEN>!=<RESET>b)) +((a<RED><<RESET><GREEN><=<RESET>b)) +((a<RED>><RESET><GREEN>>=<RESET>b)) +$((a<RED><<RESET><GREEN><<<RESET>b)) +$((a<RED>><RESET><GREEN>>><RESET>b)) +$((a<RED>&<RESET><GREEN>&&<RESET>b)) +$((a<RED>|<RESET><GREEN>||<RESET>b)) +${a<RED>:<RESET><GREEN>:-<RESET>b} +${a<RED>:<RESET><GREEN>:=<RESET>b} +${a<RED>:<RESET><GREEN>:+<RESET>b} +${a<RED>:<RESET><GREEN>:?<RESET>b} +${a<RED>#<RESET><GREEN>##<RESET>*/} +${a<RED>%<RESET><GREEN>%%<RESET>.*} +${a<RED>^<RESET><GREEN>^^<RESET>} +${a<RED>,<RESET><GREEN>,,<RESET>} +${<GREEN>!<RESET>a} +${a[<RED>*<RESET><GREEN>@<RESET>]} +${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--a<RESET><GREEN>--x<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..669e218c30 --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,33 @@ +new_var=10 +x=456 +y=2.71 +z=.75 +echo $USERNAME +${HOMEDIR} +((a+=b)) +((a*=b)) +((a/=b)) +((a%=b)) +((a|=b)) +((a^=b)) +((a==b)) +((a!=b)) +((a<=b)) +((a>=b)) +$((a<<b)) +$((a>>b)) +$((a&&b)) +$((a||b)) +${a:-b} +${a:=b} +${a:+b} +${a:?b} +${a##*/} +${a%%.*} +${a^^} +${a,,} +${!a} +${a[@]} +${a:4:6} +ls -x +ls --x diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..ada8470bac --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,33 @@ +my_var=10 +x=123 +y=3.14 +z=.5 +echo $USER +${HOME} +((a+b)) +((a*b)) +((a/b)) +((a%b)) +((a|b)) +((a^b)) +((a=b)) +((a!b)) +((a<b)) +((a>b)) +$((a<b)) +$((a>b)) +$((a&b)) +$((a|b)) +${a:b} +${a:b} +${a:b} +${a:b} +${a#*/} +${a%.*} +${a^} +${a,} +${a} +${a[*]} +${a:2:3} +ls -a +ls --a diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..655c8fe0b1 100644 --- a/userdiff.c +++ b/userdiff.c @@ -59,20 +59,32 @@ PATTERNS("bash", "(" "(" /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Numeric constants: integers and decimals */ + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" + /* Logical and comparison operators */ + "|\\|\\||&&|<<|>>|==|!=|<=|>=" + /* Assignment and arithmetic operators */ + "|[-+*/%&|^!=<>]=?" + /* Additional parameter expansion operators */ + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" + /* Command-line options (to avoid splitting -option) */ + "|--?[a-zA-Z0-9_-]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-13 18:50 ` Junio C Hamano 2025-05-14 6:33 ` MOUMITA DHAR 2025-05-16 7:25 ` Johannes Sixt 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2025-05-13 18:50 UTC (permalink / raw) To: Moumita; +Cc: git, Johannes Sixt, Eric Sunshine Moumita <dhar61595@gmail.com> writes: > diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function > new file mode 100644 > index 0000000000..cc8727cbcd > --- /dev/null > +++ b/t/t4018/bash-posix-style-multiline-function > @@ -0,0 +1,4 @@ > +RIGHT() \ > +{ > + ChangeMe > +} Not a review, but I am curious what this test is about. Is it to ensure that the pattern does not get confused with the backslash that does not have to be (but it would not hurt to have one) there? IOW, does RIGHT() { ChangeMe } get processed just fine, and the above is to check the corner case where an unusual "\" on the same line as RIGHT does not break the funcline identification? Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-13 18:50 ` Junio C Hamano @ 2025-05-14 6:33 ` MOUMITA DHAR 0 siblings, 0 replies; 37+ messages in thread From: MOUMITA DHAR @ 2025-05-14 6:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Johannes Sixt, Eric Sunshine On Wed, 14 May 2025 at 00:20, Junio C Hamano <gitster@pobox.com> wrote: > > Moumita <dhar61595@gmail.com> writes: > > > diff --git a/t/t4018/bash-posix-style-multiline-function b/t/t4018/bash-posix-style-multiline-function > > new file mode 100644 > > index 0000000000..cc8727cbcd > > --- /dev/null > > +++ b/t/t4018/bash-posix-style-multiline-function > > @@ -0,0 +1,4 @@ > > +RIGHT() \ > > +{ > > + ChangeMe > > +} > > Not a review, but I am curious what this test is about. Is it to > ensure that the pattern does not get confused with the backslash > that does not have to be (but it would not hurt to have one) there? > > IOW, does > > RIGHT() > { > ChangeMe > } > > get processed just fine, and the above is to check the corner case > where an unusual "\" on the same line as RIGHT does not break the > funcline identification? > > Thanks. I realise this test is redundant , The goal of my test was to ensure that the entire function header line is correctly captured, even when the opening brace is placed on the next line using a backslash. However, I now realize that the test case Johannes mentioned already covers this behavior. So rather than duplicating that, I think it would be more useful to add a similar test using the alternative, Bashism-style syntax:- function myfunc # RIGHT { echo 'ChangeMe' } Thank You Moumita ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-13 18:50 ` Junio C Hamano @ 2025-05-16 7:25 ` Johannes Sixt 2025-05-17 13:09 ` Junio C Hamano 1 sibling, 1 reply; 37+ messages in thread From: Johannes Sixt @ 2025-05-16 7:25 UTC (permalink / raw) To: Moumita; +Cc: Eric Sunshine, Junio C Hamano, git Am 11.05.25 um 16:11 schrieb Moumita: > From: Moumita Dhar <dhar61595@gmail.com> > > The previous function regex required explicit matching of function > bodies using `{`, `(`, `((`, or `[[`, which caused several issues: > > - It failed to capture valid functions where `{` was on the next line > due to line continuation (`\`). > - It did not recognize functions with single command body, such as > `x () echo hello`. > > Replacing the function body matching logic with `.*$`, ensures > that everything on the function definition line is captured. > > Additionally, the word regex is refined to better recognize shell > syntax, including additional parameter expansion operators and > command-line options. > > Signed-off-by: Moumita Dhar <dhar61595@gmail.com> > --- Function patterns are looking good now. Let's focus on the worddiff pattern now. > diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect > new file mode 100644 > index 0000000000..17755e455f > --- /dev/null > +++ b/t/t4034/bash/expect > @@ -0,0 +1,38 @@ > +<BOLD>diff --git a/pre b/post<RESET> > +<BOLD>index 09ac008..60ba6a2 100644<RESET> > +<BOLD>--- a/pre<RESET> > +<BOLD>+++ b/post<RESET> > +<CYAN>@@ -1,33 +1,33 @@<RESET> > +<RED>my_var<RESET><GREEN>new_var<RESET>=10 > +x=<RED>123<RESET><GREEN>456<RESET> > +y=<RED>3.14<RESET><GREEN>2.71<RESET> > +z=<RED>.5<RESET><GREEN>.75<RESET> When do decimal numbers occur in shell scripts? Wouldn't it be more often the case that a fullstop is part of a regular expression or a file name or version number that happens to be surrounded by numbers? In that case, we would prefer to capture the digit sequences as separate words. > +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> > +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} > +((a<RED>+<RESET><GREEN>+=<RESET>b)) > +((a<RED>*<RESET><GREEN>*=<RESET>b)) > +((a<RED>/<RESET><GREEN>/=<RESET>b)) > +((a<RED>%<RESET><GREEN>%=<RESET>b)) > +((a<RED>|<RESET><GREEN>|=<RESET>b)) > +((a<RED>^<RESET><GREEN>^=<RESET>b)) > +((a<RED>=<RESET><GREEN>==<RESET>b)) > +((a<RED>!<RESET><GREEN>!=<RESET>b)) > +((a<RED><<RESET><GREEN><=<RESET>b)) > +((a<RED>><RESET><GREEN>>=<RESET>b)) > +$((a<RED><<RESET><GREEN><<<RESET>b)) > +$((a<RED>><RESET><GREEN>>><RESET>b)) > +$((a<RED>&<RESET><GREEN>&&<RESET>b)) > +$((a<RED>|<RESET><GREEN>||<RESET>b)) > +${a<RED>:<RESET><GREEN>:-<RESET>b} > +${a<RED>:<RESET><GREEN>:=<RESET>b} > +${a<RED>:<RESET><GREEN>:+<RESET>b} > +${a<RED>:<RESET><GREEN>:?<RESET>b} > +${a<RED>#<RESET><GREEN>##<RESET>*/} > +${a<RED>%<RESET><GREEN>%%<RESET>.*} > +${a<RED>^<RESET><GREEN>^^<RESET>} > +${a<RED>,<RESET><GREEN>,,<RESET>} > +${<GREEN>!<RESET>a} > +${a[<RED>*<RESET><GREEN>@<RESET>]} All good tests! All of | || & && would more often than not occur in commands rather than expressions. But the way these operators are tested here is fine, too. > +${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} I am surprised to see :2:3 as a single token. It's bash's substring expansion. It would not be matched anyway in practice because these numbers are often variables or more complicated expressions, I would think, not integer constants. > +ls <RED>-a<RESET><GREEN>-x<RESET> > +ls <RED>--a<RESET><GREEN>--x<RESET> Can we extend the second of these two to have a multi-letter option? > diff --git a/userdiff.c b/userdiff.c > index 340c4eb4f7..655c8fe0b1 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -59,20 +59,32 @@ PATTERNS("bash", > "(" > "(" > /* POSIX identifier with mandatory parentheses */ > - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" > "|" > /* Bashism identifier with optional parentheses */ > - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" > + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" > ")" > - /* Optional whitespace */ > - "[ \t]*" > - /* Compound command starting with `{`, `(`, `((` or `[[` */ > - "(\\{|\\(\\(?|\\[\\[)" > + /* Everything after the function header is captured */ > + ".*$" > /* End of captured text */ > ")", > /* -- */ In the operator patterns that follow, it is not necessary to match operators that are just a single punctuation character. They are matched automatically, IIRC (but please correct me if I am wrong). Only multi-character operators should be matched. That means that in all patterns where one character is optional due to the '?' in a two-character-operator, you should drop the '?'. > - /* Characters not in the default $IFS value */ > - "[^ \t]+"), > + /* Identifiers: variable and function names */ > + "[a-zA-Z_][a-zA-Z0-9_]*" OK. But see below. > + /* Numeric constants: integers and decimals */ > + "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" I would recommend not to match decimal numbers and instead have them as three words: integer-fullstop-integer. Generally, I advise to not include a sign in the word pattern of a number, because the sign is usually its own operator. But we do have numeric options, i.e., integers with a leading dash. But see below. > + /* Shell variables: $VAR, ${VAR} */ > + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" Shell variables do not have to begin with a letter. We have $0, $1 etc. Just do not require the first character after the $ to be a letter. > + /* Logical and comparison operators */ > + "|\\|\\||&&|<<|>>|==|!=|<=|>=" I'd call || and && "Command list separators" in the context of shell language. > + /* Assignment and arithmetic operators */ > + "|[-+*/%&|^!=<>]=?" These would then be only compund assignment operators. These patterns also match <= >= == != again. How about having all those (from this line and from the line above) that end in '=' into a single case? > + /* Additional parameter expansion operators */ > + "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" Remove all single-character operators. Also, the :digits:digits word is strange. What is your rationale that you included it? > + /* Command-line options (to avoid splitting -option) */ > + "|--?[a-zA-Z0-9_-]+" This pattern is a beast for the matcher.Would it match the text "-----"? Yes, in more than one way! Would it match "--"? Yes! Would it match "-1000"? Yes! None of these are a problem, but actually desired. But it would not need the optional '-' in the second position. The '-' included in the range would match "--foo" just as well. This patter also matches negative numbers. So, here's the idea: Just turn this into "|[-a-zA-Z0-9_]+" And we have a pattern that matches identifiers, function names, integers, short options, long options, numeric options a.k.a. negative integers. It might match more and I hope that I do not miss a case that we do *not* want to match. If you do, let me know. Until then I'd leave it at that and fix it up when we encounter an important case that is mishandled by this pattern. > + /* Brackets and grouping symbols */ > + "|\\(|\\)|\\{|\\}|\\[|\\]"), OK. > PATTERNS("bibtex", > "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", > /* -- */ -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-16 7:25 ` Johannes Sixt @ 2025-05-17 13:09 ` Junio C Hamano 2025-05-18 7:41 ` Johannes Sixt 0 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2025-05-17 13:09 UTC (permalink / raw) To: Johannes Sixt; +Cc: Moumita, Eric Sunshine, git Johannes Sixt <j6t@kdbg.org> writes: >> +y=<RED>3.14<RESET><GREEN>2.71<RESET> >> +z=<RED>.5<RESET><GREEN>.75<RESET> > > When do decimal numbers occur in shell scripts? Wouldn't it be more > often the case that a fullstop is part of a regular expression or a file > name or version number that happens to be surrounded by numbers? In that > case, we would prefer to capture the digit sequences as separate words. Sorry but I am confused. Do you want a filename "sample.3gp" treated as having separate parts, "sample", ".3", and "gp", instead of a single word? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-17 13:09 ` Junio C Hamano @ 2025-05-18 7:41 ` Johannes Sixt 0 siblings, 0 replies; 37+ messages in thread From: Johannes Sixt @ 2025-05-18 7:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Moumita, Eric Sunshine, git Am 17.05.25 um 15:09 schrieb Junio C Hamano: > Johannes Sixt <j6t@kdbg.org> writes: > >>> +y=<RED>3.14<RESET><GREEN>2.71<RESET> >>> +z=<RED>.5<RESET><GREEN>.75<RESET> >> >> When do decimal numbers occur in shell scripts? Wouldn't it be more >> often the case that a fullstop is part of a regular expression or a file >> name or version number that happens to be surrounded by numbers? In that >> case, we would prefer to capture the digit sequences as separate words. > > Sorry but I am confused. > > Do you want a filename "sample.3gp" treated as having separate > parts, "sample", ".3", and "gp", instead of a single word? I didn't consider such a case. The cited paragraph could be interpreted to say that we want the words "sample", ".", "3", and "gp". But the pattern that we have in round v7 by a lucky strike actually gives "sample", ".", and "3gp" (I think), which is much better. -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v7 0/1] Updated the word diff regex for Bash scripts 2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita 2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-16 14:45 ` Moumita 2025-05-16 14:45 ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 1 sibling, 1 reply; 37+ messages in thread From: Moumita @ 2025-05-16 14:45 UTC (permalink / raw) To: git; +Cc: Moumita, Johannes Sixt, Eric Sunshine, Junio C Hamano I cleaned up the parameter expansion regexes to avoid matching single-character operators like `:` or `#` that are already tokenized. I also removed the `:[0-9]+(:[0-9]+)?` pattern, which didn’t have a clear use case. Variable matching now supports `$1`, `$2`, etc., not just `$foo`, by relaxing the first character requirement after the `$`. Also I removed the '?' after the second character of double character operators. There are two test files in t/4018 which test the capturing of the complete line in the hunk header in the cases of both posix style functions and bash style functions. Moumita Dhar (1): userdiff: extend Bash pattern to cover more shell function forms .../bash-bashism-style-complete-line-capture | 4 +++ .../bash-posix-style-complete-line-capture | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 36 +++++++++++++++++++ t/t4034/bash/post | 31 ++++++++++++++++ t/t4034/bash/pre | 31 ++++++++++++++++ userdiff.c | 26 +++++++++----- 8 files changed, 128 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre Range-diff against v6: 1: 464cb8a1eb ! 1: 314ac45fa2 userdiff: extend Bash pattern to cover more shell function forms @@ Commit message Signed-off-by: Moumita Dhar <dhar61595@gmail.com> - ## t/t4018/bash-bashism-style-multiline-function (new) ## + ## t/t4018/bash-bashism-style-complete-line-capture (new) ## @@ -+function RIGHT \ -+{ ++function myfunc # RIGHT ++{ + echo 'ChangeMe' +} - ## t/t4018/bash-hunk-header-complete-line-capture (new) ## + ## t/t4018/bash-posix-style-complete-line-capture (new) ## @@ +func() { # RIGHT + + ChangeMe -+} - - ## t/t4018/bash-posix-style-multiline-function (new) ## -@@ -+RIGHT() \ -+{ -+ ChangeMe +} ## t/t4018/bash-posix-style-single-command-function (new) ## @@ t/t4034/bash/expect (new) +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> -+<CYAN>@@ -1,33 +1,33 @@<RESET> ++<CYAN>@@ -1,31 +1,31 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> -+y=<RED>3.14<RESET><GREEN>2.71<RESET> -+z=<RED>.5<RESET><GREEN>.75<RESET> ++echo <RED>$1<RESET><GREEN>$2<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +((a<RED>+<RESET><GREEN>+=<RESET>b)) @@ t/t4034/bash/expect (new) +${a<RED>,<RESET><GREEN>,,<RESET>} +${<GREEN>!<RESET>a} +${a[<RED>*<RESET><GREEN>@<RESET>]} -+${a<RED>:2:3<RESET><GREEN>:4:6<RESET>} +ls <RED>-a<RESET><GREEN>-x<RESET> -+ls <RED>--a<RESET><GREEN>--x<RESET> ++ls <RED>--all<RESET><GREEN>--color<RESET> ## t/t4034/bash/post (new) ## @@ +new_var=10 +x=456 -+y=2.71 -+z=.75 ++echo $2 +echo $USERNAME +${HOMEDIR} +((a+=b)) @@ t/t4034/bash/post (new) +${a,,} +${!a} +${a[@]} -+${a:4:6} +ls -x -+ls --x ++ls --color ## t/t4034/bash/pre (new) ## @@ +my_var=10 +x=123 -+y=3.14 -+z=.5 ++echo $1 +echo $USER +${HOME} +((a+b)) @@ t/t4034/bash/pre (new) +${a,} +${a} +${a[*]} -+${a:2:3} +ls -a -+ls --a ++ls --all ## userdiff.c ## @@ userdiff.c: PATTERNS("bash", @@ userdiff.c: PATTERNS("bash", - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" -+ /* Numeric constants: integers and decimals */ -+ "|[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+" + /* Shell variables: $VAR, ${VAR} */ -+ "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{" -+ /* Logical and comparison operators */ -+ "|\\|\\||&&|<<|>>|==|!=|<=|>=" -+ /* Assignment and arithmetic operators */ -+ "|[-+*/%&|^!=<>]=?" ++ "|\\$[a-zA-Z0-9_]+|\\$\\{" ++ /*Command list separators and redirection operators */ ++ "|\\|\\||&&|<<|>>" ++ /* Operators ending in '=' (comparison + compound assignment) */ ++ "|==|!=|<=|>=|[-+*/%&|^]=" + /* Additional parameter expansion operators */ -+ "|:?=|:-|:\\+|:\\?|:|#|##|%|%%|\\^\\^?|,|,,?|!|@|:[0-9]+(:[0-9]+)?" ++ "|:=|:-|:\\+|:\\?|##|%%|\\^\\^|,," + /* Command-line options (to avoid splitting -option) */ -+ "|--?[a-zA-Z0-9_-]+" ++ "|[-a-zA-Z0-9_]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", -- 2.48.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-16 14:45 ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita @ 2025-05-16 14:45 ` Moumita 2025-05-16 17:45 ` Johannes Sixt 0 siblings, 1 reply; 37+ messages in thread From: Moumita @ 2025-05-16 14:45 UTC (permalink / raw) To: git; +Cc: Moumita Dhar, Johannes Sixt, Eric Sunshine, Junio C Hamano From: Moumita Dhar <dhar61595@gmail.com> The previous function regex required explicit matching of function bodies using `{`, `(`, `((`, or `[[`, which caused several issues: - It failed to capture valid functions where `{` was on the next line due to line continuation (`\`). - It did not recognize functions with single command body, such as `x () echo hello`. Replacing the function body matching logic with `.*$`, ensures that everything on the function definition line is captured. Additionally, the word regex is refined to better recognize shell syntax, including additional parameter expansion operators and command-line options. Signed-off-by: Moumita Dhar <dhar61595@gmail.com> --- .../bash-bashism-style-complete-line-capture | 4 +++ .../bash-posix-style-complete-line-capture | 4 +++ .../bash-posix-style-single-command-function | 3 ++ t/t4034-diff-words.sh | 1 + t/t4034/bash/expect | 36 +++++++++++++++++++ t/t4034/bash/post | 31 ++++++++++++++++ t/t4034/bash/pre | 31 ++++++++++++++++ userdiff.c | 26 +++++++++----- 8 files changed, 128 insertions(+), 8 deletions(-) create mode 100644 t/t4018/bash-bashism-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-complete-line-capture create mode 100644 t/t4018/bash-posix-style-single-command-function create mode 100644 t/t4034/bash/expect create mode 100644 t/t4034/bash/post create mode 100644 t/t4034/bash/pre diff --git a/t/t4018/bash-bashism-style-complete-line-capture b/t/t4018/bash-bashism-style-complete-line-capture new file mode 100644 index 0000000000..070b979fa6 --- /dev/null +++ b/t/t4018/bash-bashism-style-complete-line-capture @@ -0,0 +1,4 @@ +function myfunc # RIGHT +{ + echo 'ChangeMe' +} diff --git a/t/t4018/bash-posix-style-complete-line-capture b/t/t4018/bash-posix-style-complete-line-capture new file mode 100644 index 0000000000..b56942f322 --- /dev/null +++ b/t/t4018/bash-posix-style-complete-line-capture @@ -0,0 +1,4 @@ +func() { # RIGHT + + ChangeMe +} diff --git a/t/t4018/bash-posix-style-single-command-function b/t/t4018/bash-posix-style-single-command-function new file mode 100644 index 0000000000..398ae1c5d2 --- /dev/null +++ b/t/t4018/bash-posix-style-single-command-function @@ -0,0 +1,3 @@ +RIGHT() echo "hello" + + ChangeMe diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh index f51d3557f1..0be647c2fb 100755 --- a/t/t4034-diff-words.sh +++ b/t/t4034-diff-words.sh @@ -320,6 +320,7 @@ test_expect_success 'unset default driver' ' test_language_driver ada test_language_driver bibtex +test_language_driver bash test_language_driver cpp test_language_driver csharp test_language_driver css diff --git a/t/t4034/bash/expect b/t/t4034/bash/expect new file mode 100644 index 0000000000..1864ab25dc --- /dev/null +++ b/t/t4034/bash/expect @@ -0,0 +1,36 @@ +<BOLD>diff --git a/pre b/post<RESET> +<BOLD>index 09ac008..60ba6a2 100644<RESET> +<BOLD>--- a/pre<RESET> +<BOLD>+++ b/post<RESET> +<CYAN>@@ -1,31 +1,31 @@<RESET> +<RED>my_var<RESET><GREEN>new_var<RESET>=10 +x=<RED>123<RESET><GREEN>456<RESET> +echo <RED>$1<RESET><GREEN>$2<RESET> +echo <RED>$USER<RESET><GREEN>$USERNAME<RESET> +${<RED>HOME<RESET><GREEN>HOMEDIR<RESET>} +((a<RED>+<RESET><GREEN>+=<RESET>b)) +((a<RED>*<RESET><GREEN>*=<RESET>b)) +((a<RED>/<RESET><GREEN>/=<RESET>b)) +((a<RED>%<RESET><GREEN>%=<RESET>b)) +((a<RED>|<RESET><GREEN>|=<RESET>b)) +((a<RED>^<RESET><GREEN>^=<RESET>b)) +((a<RED>=<RESET><GREEN>==<RESET>b)) +((a<RED>!<RESET><GREEN>!=<RESET>b)) +((a<RED><<RESET><GREEN><=<RESET>b)) +((a<RED>><RESET><GREEN>>=<RESET>b)) +$((a<RED><<RESET><GREEN><<<RESET>b)) +$((a<RED>><RESET><GREEN>>><RESET>b)) +$((a<RED>&<RESET><GREEN>&&<RESET>b)) +$((a<RED>|<RESET><GREEN>||<RESET>b)) +${a<RED>:<RESET><GREEN>:-<RESET>b} +${a<RED>:<RESET><GREEN>:=<RESET>b} +${a<RED>:<RESET><GREEN>:+<RESET>b} +${a<RED>:<RESET><GREEN>:?<RESET>b} +${a<RED>#<RESET><GREEN>##<RESET>*/} +${a<RED>%<RESET><GREEN>%%<RESET>.*} +${a<RED>^<RESET><GREEN>^^<RESET>} +${a<RED>,<RESET><GREEN>,,<RESET>} +${<GREEN>!<RESET>a} +${a[<RED>*<RESET><GREEN>@<RESET>]} +ls <RED>-a<RESET><GREEN>-x<RESET> +ls <RED>--all<RESET><GREEN>--color<RESET> diff --git a/t/t4034/bash/post b/t/t4034/bash/post new file mode 100644 index 0000000000..2bbee8936d --- /dev/null +++ b/t/t4034/bash/post @@ -0,0 +1,31 @@ +new_var=10 +x=456 +echo $2 +echo $USERNAME +${HOMEDIR} +((a+=b)) +((a*=b)) +((a/=b)) +((a%=b)) +((a|=b)) +((a^=b)) +((a==b)) +((a!=b)) +((a<=b)) +((a>=b)) +$((a<<b)) +$((a>>b)) +$((a&&b)) +$((a||b)) +${a:-b} +${a:=b} +${a:+b} +${a:?b} +${a##*/} +${a%%.*} +${a^^} +${a,,} +${!a} +${a[@]} +ls -x +ls --color diff --git a/t/t4034/bash/pre b/t/t4034/bash/pre new file mode 100644 index 0000000000..8d22039c40 --- /dev/null +++ b/t/t4034/bash/pre @@ -0,0 +1,31 @@ +my_var=10 +x=123 +echo $1 +echo $USER +${HOME} +((a+b)) +((a*b)) +((a/b)) +((a%b)) +((a|b)) +((a^b)) +((a=b)) +((a!b)) +((a<b)) +((a>b)) +$((a<b)) +$((a>b)) +$((a&b)) +$((a|b)) +${a:b} +${a:b} +${a:b} +${a:b} +${a#*/} +${a%.*} +${a^} +${a,} +${a} +${a[*]} +ls -a +ls --all diff --git a/userdiff.c b/userdiff.c index 340c4eb4f7..7b18cbb669 100644 --- a/userdiff.c +++ b/userdiff.c @@ -59,20 +59,30 @@ PATTERNS("bash", "(" "(" /* POSIX identifier with mandatory parentheses */ - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" + "([a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))" "|" /* Bashism identifier with optional parentheses */ - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))" + "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+)))" ")" - /* Optional whitespace */ - "[ \t]*" - /* Compound command starting with `{`, `(`, `((` or `[[` */ - "(\\{|\\(\\(?|\\[\\[)" + /* Everything after the function header is captured */ + ".*$" /* End of captured text */ ")", /* -- */ - /* Characters not in the default $IFS value */ - "[^ \t]+"), + /* Identifiers: variable and function names */ + "[a-zA-Z_][a-zA-Z0-9_]*" + /* Shell variables: $VAR, ${VAR} */ + "|\\$[a-zA-Z0-9_]+|\\$\\{" + /*Command list separators and redirection operators */ + "|\\|\\||&&|<<|>>" + /* Operators ending in '=' (comparison + compound assignment) */ + "|==|!=|<=|>=|[-+*/%&|^]=" + /* Additional parameter expansion operators */ + "|:=|:-|:\\+|:\\?|##|%%|\\^\\^|,," + /* Command-line options (to avoid splitting -option) */ + "|[-a-zA-Z0-9_]+" + /* Brackets and grouping symbols */ + "|\\(|\\)|\\{|\\}|\\[|\\]"), PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", /* -- */ -- 2.48.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-16 14:45 ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita @ 2025-05-16 17:45 ` Johannes Sixt 2025-05-16 21:56 ` Junio C Hamano 0 siblings, 1 reply; 37+ messages in thread From: Johannes Sixt @ 2025-05-16 17:45 UTC (permalink / raw) To: Moumita; +Cc: Eric Sunshine, Junio C Hamano, git All my comments have been addressed and this round looks good. Thank you! Acked-by: Johannes Sixt <j6t@kdbg.org> -- Hannes ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms 2025-05-16 17:45 ` Johannes Sixt @ 2025-05-16 21:56 ` Junio C Hamano 0 siblings, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2025-05-16 21:56 UTC (permalink / raw) To: Johannes Sixt; +Cc: Moumita, Eric Sunshine, git Johannes Sixt <j6t@kdbg.org> writes: > All my comments have been addressed and this round looks good. Thank you! > > Acked-by: Johannes Sixt <j6t@kdbg.org> > > -- Hannes Thanks, both. Queued. ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2025-05-18 7:41 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-11 11:46 [PATCH 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 2025-02-11 11:46 ` [PATCH 1/1] Added built in function recognition for shell Moumita 2025-02-15 14:37 ` Johannes Sixt 2025-02-18 15:35 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Moumita 2025-02-18 15:35 ` [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-02-18 19:30 ` Junio C Hamano 2025-02-22 18:15 ` Johannes Sixt 2025-02-24 16:28 ` Junio C Hamano 2025-02-18 23:38 ` Junio C Hamano 2025-02-22 18:14 ` Johannes Sixt 2025-02-18 17:30 ` [PATCH v2 0/1] [PATCH v2 0/1] [GSOC 2025] [Newbie] userdiff: add built-in pattern for shell scripts Eric Sunshine 2025-03-28 20:05 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Moumita 2025-03-28 20:05 ` [PATCH v3 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-03-29 19:26 ` [PATCH v3 0/1] userdiff: improve Bash function and word regex patterns Junio C Hamano 2025-03-30 12:28 ` MOUMITA DHAR 2025-03-30 13:39 ` [PATCH v4 0/1][GSOC] userdiff:Added newlines at the end of the test cases Moumita 2025-03-30 13:39 ` [PATCH v4 1/1][GSOC] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-02 21:27 ` Junio C Hamano 2025-05-06 16:30 ` Johannes Sixt 2025-05-10 11:37 ` MOUMITA DHAR 2025-05-10 12:40 ` Johannes Sixt 2025-05-11 12:58 ` [PATCH v5 0/1] Added the closing ")" to make sure is not unbalanced and corrected the tests for word diff Moumita 2025-05-11 12:58 ` [PATCH v5 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-11 13:28 ` Moumita 2025-05-11 13:28 ` Moumita 2025-05-11 13:37 ` Moumita 2025-05-11 14:11 ` [PATCH v6 0/1] Added the newline after the test in t/4018 Moumita 2025-05-11 14:11 ` [PATCH v6 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-13 18:50 ` Junio C Hamano 2025-05-14 6:33 ` MOUMITA DHAR 2025-05-16 7:25 ` Johannes Sixt 2025-05-17 13:09 ` Junio C Hamano 2025-05-18 7:41 ` Johannes Sixt 2025-05-16 14:45 ` [PATCH v7 0/1] Updated the word diff regex for Bash scripts Moumita 2025-05-16 14:45 ` [PATCH v7 1/1] userdiff: extend Bash pattern to cover more shell function forms Moumita 2025-05-16 17:45 ` Johannes Sixt 2025-05-16 21:56 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).