From: Junio C Hamano <gitster@pobox.com>
To: Moumita <dhar61595@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 1/1] userdiff: extend Bash pattern to cover more shell function forms
Date: Tue, 18 Feb 2025 11:30:23 -0800
Message-ID: <xmqqy0y3jbjk.fsf@gitster.g>
In-Reply-To: <20250218153537.16320-2-dhar61595@gmail.com> (Moumita's message of "Tue, 18 Feb 2025 21:05:27 +0530")

Moumita <dhar61595@gmail.com> writes:

>  PATTERNS("bash",
> -	 /* Optional leading indentation */
> +     /* Optional leading indentation */

What is this change about?

>  	 "^[ \t]*"
> -	 /* Start of captured text */
> +	 /* Start of captured function name */
>  	 "("
>  	 "("
> -	     /* POSIX identifier with mandatory parentheses */
> -	     "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> +		 /* POSIX identifier with mandatory parentheses (allow spaces inside) */
> +		 "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"

Is this indentation change intended, and is it required for this patch to work correctly?

>  	 "|"
> -	     /* Bashism identifier with optional parentheses */
> -	     "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> +		 /* Bash-style function definitions, allowing optional `function` keyword */
> +		 "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"

Ditto.

Regular expressions are a write-only language; please make sure that
you do not add any unnecessary changes that distract reviewers' eyes
from spotting the _real_ changes that improve the current
codebase.

>  	 ")"
>  	 /* Optional whitespace */
>  	 "[ \t]*"
> -	 /* Compound command starting with `{`, `(`, `((` or `[[` */
> -	 "(\\{|\\(\\(?|\\[\\[)"
> -	 /* End of captured text */
> +	 /* Allow function body to start with `{`, `(` (subshell), `[[` */
> +	 "(\\{|\\(|\\[\\[)"
> +	 /* End of captured function name */
>  	 ")",


>  	 /* -- */
> -	 /* Characters not in the default $IFS value */
> -	 "[^ \t]+"),

We used to treat, more or less, "a run of non-whitespace characters"
as a token.  Now we are a bit more picky.

Which may or may not be good, but it is hard to tell if it is an
improvement.

> +	 /* Identifiers: variable and function names */
> +	 "[a-zA-Z_][a-zA-Z0-9_]*"
> +	 /* Numeric constants: integers and decimals */
> +	 "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> +	 /* Shell variables: `$VAR`, `${VAR}` */
> +	 "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
> +	 /* Logical and comparison operators */
> +	 "|\\|\\||&&|<<|>>|==|!=|<=|>="
> +	 /* Assignment and arithmetic operators */
> +	 "|[-+*/%&|^!=<>]=?"
> +	 /* Command-line options (to avoid splitting `-option`) */
> +	 "|--?[a-zA-Z0-9_-]+"
> +	 /* Brackets and grouping symbols */
> +	 "|\\(|\\)|\\{|\\}|\\[|\\]"),

The fact that this patch does not have any changes to the "t/" hierarchy
suggests to me that we do not have existing tests to see how sample
text files in the supported languages are tokenized (otherwise the
above changes would require adjusting such existing tests).  So I
think it should be left outside of this topic, but I wonder if
adding such tests would give us a good way to demonstrate the effect
of these changes to userdiff patterns.
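
One rough way to see the effect outside the test suite is to run both
word patterns over the same line and compare the tokens.  The sketch
below is only an approximation of how word-diff consumes the pattern:
tokenize() is a hypothetical helper, not a function in git, and it uses
the two strings exactly as quoted above, without anything the PATTERNS
macro may append to them.  It assumes a POSIX regexec() with
leftmost-longest matching.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Word regex removed by the patch: a run of non-blank characters. */
static const char OLD_WORD_RE[] = "[^ \t]+";

/* Word regex added by the patch, copied from the quoted hunk. */
static const char NEW_WORD_RE[] =
	"[a-zA-Z_][a-zA-Z0-9_]*"
	"|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
	"|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
	"|\\|\\||&&|<<|>>|==|!=|<=|>="
	"|[-+*/%&|^!=<>]=?"
	"|--?[a-zA-Z0-9_-]+"
	"|\\(|\\)|\\{|\\}|\\[|\\]";

/*
 * Hypothetical helper (not part of git): split "line" into successive
 * matches of "pattern" and store them in "out", separated by '\n'.
 * Text that matches nothing is skipped, as whitespace would be.
 * Returns the number of tokens, or -1 if the pattern does not compile.
 */
static int tokenize(const char *pattern, const char *line,
		    char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	size_t used = 0;
	int count = 0;

	assert(outsz > 0);
	out[0] = '\0';
	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	while (*line && !regexec(&re, line, 1, &m, 0)) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		/* Stop on an empty match or when "out" is full. */
		if (!len || used + len + 2 > outsz)
			break;
		if (count)
			out[used++] = '\n';
		memcpy(out + used, line + m.rm_so, len);
		used += len;
		out[used] = '\0';
		line += m.rm_eo;	/* continue after this token */
		count++;
	}
	regfree(&re);
	return count;
}
```

Calling tokenize() with OLD_WORD_RE on the line "foo --bar=$x" yields
two tokens, "foo" and "--bar=$x"; with NEW_WORD_RE the same line is
split into "foo", "--bar", "=" and "$x".  Whether the finer split makes
word diffs easier to read is exactly what sample-based tests would have
to show.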

Thanks.
