From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>, dana@dana.is
Subject: Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode
Date: Mon, 29 Oct 2018 14:24:00 +0100
Message-ID: <87h8h4lwcv.fsf@evledraar.gmail.com>
In-Reply-To: <20181027084823.23382-1-pclouds@gmail.com>


On Sat, Oct 27 2018, Nguyễn Thái Ngọc Duy wrote:

> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
>
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
>
> When '**' is present but not in one of these patterns, the current
> behavior is to consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
>
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.
>
> Noticed-by: dana <dana@dana.is>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/gitignore.txt | 3 ++-
>  t/t3070-wildmatch.sh        | 4 ++--
>  wildmatch.c                 | 4 ++--
>  wildmatch.h                 | 1 -
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d107daaffd..1c94f08ff4 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -129,7 +129,8 @@ full pathname may have special meaning:
>     matches zero or more directories. For example, "`a/**/b`"
>     matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>
> - - Other consecutive asterisks are considered invalid.
> + - Other consecutive asterisks are considered regular asterisks and
> +   will match according to the previous rules.
>
>  NOTES
>  -----
> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> index 46aca0af10..891d4d7cb9 100755
> --- a/t/t3070-wildmatch.sh
> +++ b/t/t3070-wildmatch.sh
> @@ -237,7 +237,7 @@ match 0 0 0 0 foobar 'foo\*bar'
>  match 1 1 1 1 'f\oo' 'f\\oo'
>  match 1 1 1 1 ball '*[al]?'
>  match 0 0 0 0 ten '[ten]'
> -match 0 0 1 1 ten '**[!te]'
> +match 1 1 1 1 ten '**[!te]'
>  match 0 0 0 0 ten '**[!ten]'
>  match 1 1 1 1 ten 't[a-g]n'
>  match 0 0 0 0 ten 't[!a-g]n'
> @@ -253,7 +253,7 @@ match 1 1 1 1 ']' ']'
>  # Extended slash-matching features
>  match 0 0 1 1 'foo/baz/bar' 'foo*bar'
>  match 0 0 1 1 'foo/baz/bar' 'foo**bar'
> -match 0 0 1 1 'foobazbar' 'foo**bar'
> +match 1 1 1 1 'foobazbar' 'foo**bar'
>  match 1 1 1 1 'foo/baz/bar' 'foo/**/bar'
>  match 1 1 0 0 'foo/baz/bar' 'foo/**/**/bar'
>  match 1 1 1 1 'foo/b/a/z/bar' 'foo/**/bar'
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..9e9e2a2f95 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -104,8 +104,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					    dowild(p + 1, text, flags) == WM_MATCH)
>  						return WM_MATCH;
>  					match_slash = 1;
> -				} else
> -					return WM_ABORT_MALFORMED;
> +				} else /* WM_PATHNAME is set */
> +					match_slash = 0;
>  			} else
>  				/* without WM_PATHNAME, '*' == '**' */
>  				match_slash = flags & WM_PATHNAME ? 0 : 1;
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..5993696298 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -4,7 +4,6 @@
>  #define WM_CASEFOLD 1
>  #define WM_PATHNAME 2
>
> -#define WM_ABORT_MALFORMED 2
>  #define WM_NOMATCH 1
>  #define WM_MATCH 0
>  #define WM_ABORT_ALL -1

This patch looks good to me, but I think it's a bad state of affairs to
keep changing these semantics without having something like a
"gitwildmatch" doc where we document this matching syntax.

Also, I still need to dig up my work on using PCRE as an alternate
matching engine. The PCRE devs produced a bug-for-bug compatible version
of our wildmatch function (all the more reason to document it), so I
think they'll need to change it now that this is in, but I haven't
rebased those ancient patches yet.

Do you have any thoughts on how to proceed with getting this documented
and into some stable state where we can specify it? Even if we don't end
up using PCRE as a matching engine (sometimes it was faster, sometimes
slower), I think it would be very useful if we could spew out "here's
your pattern as a regex" for self-documentation purposes.

Then that can be piped into e.g. "perl -Mre=debug" to see a step-by-step
guide for how the pattern compiles, and why it does or doesn't match a
given thing.
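
To make the "here's your pattern as a regex" idea concrete, here is a
rough sketch in C. It assumes WM_PATHNAME mode with this patch applied,
and it only handles literals, '?', '*', backslash escapes and the three
special '**' forms; bracket expressions and an escaped slash after '**'
are not handled. wm_to_regex() and put() are names made up for this
illustration. This is neither code from git nor what the PCRE devs
wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out at offset *len; return -1 if it would not fit. */
static int put(char *out, size_t outsz, size_t *len, const char *s)
{
	size_t n = strlen(s);

	if (*len + n + 1 > outsz)
		return -1;
	memcpy(out + *len, s, n + 1);
	*len += n;
	return 0;
}

/*
 * Hypothetical helper, not part of git: translate a WM_PATHNAME-mode
 * pattern into an anchored regex. Returns 0 on success, -1 on an
 * unsupported construct (bracket expression, dangling backslash) or
 * when the output buffer is too small.
 */
static int wm_to_regex(const char *pat, char *out, size_t outsz)
{
	const char *p = pat;
	size_t len = 0;
	int rc;

	assert(pat != NULL && out != NULL && outsz > 0);
	out[0] = '\0';
	if (put(out, outsz, &len, "^"))
		return -1;

	while (*p) {
		if (*p == '*') {
			const char *start = p;
			size_t stars;
			int at_dir_start;

			while (*p == '*')
				p++;
			stars = (size_t)(p - start);
			at_dir_start = (start == pat || start[-1] == '/');

			if (stars >= 2 && at_dir_start && *p == '/') {
				/* double star plus slash: zero or more directories */
				rc = put(out, outsz, &len, "(.*/)?");
				p++; /* the slash is part of the group */
			} else if (stars >= 2 && at_dir_start && *p == '\0') {
				/* trailing double star: everything below */
				rc = put(out, outsz, &len, ".*");
			} else {
				/* any other run of stars acts like a single star */
				rc = put(out, outsz, &len, "[^/]*");
			}
		} else if (*p == '?') {
			rc = put(out, outsz, &len, "[^/]");
			p++;
		} else if (*p == '[') {
			return -1; /* bracket expressions: not handled here */
		} else {
			char lit[3];
			size_t n = 0;

			if (*p == '\\' && *++p == '\0')
				return -1; /* dangling backslash */
			/* escape regex metacharacters, pass the rest through */
			if (strchr(".^$*+?()[]{}|\\", *p))
				lit[n++] = '\\';
			lit[n++] = *p++;
			lit[n] = '\0';
			rc = put(out, outsz, &len, lit);
		}
		if (rc)
			return -1;
	}
	return put(out, outsz, &len, "$");
}
```

With that, "foo/**/bar" comes out as ^foo/(.*/)?bar$ while "foo**bar"
comes out as ^foo[^/]*bar$, which spells out why the latter now matches
"foobazbar" but still not "foo/baz/bar" when WM_PATHNAME is set.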
