git.vger.kernel.org archive mirror
* macOS git grep change in required character classes
@ 2023-04-01 15:50 Matt Gardner
From: Matt Gardner @ 2023-04-01 15:50 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

$ mkdir git-grep-test
$ cd git-grep-test
$ git init
$ echo "sub test { return; }" > test.pl
$ git grep --untracked -E \\btest\\b

What did you expect to happen? (Expected behavior)

It should find results like the following:

test.pl:sub test { return; }

What happened instead? (Actual behavior)

No results found

What's different between what you expected and what actually happened?

To get results, you have to use the BSD-style word-boundary classes [[:<:]] and [[:>:]]:

$ git grep --untracked -E \[\[:\<:\]\]test\[\[:\>:\]\]

test.pl:sub test { return; }
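
If only whole-word matches are needed, one portable option is to let Git enforce the boundaries itself with -w (--word-regexp), so the pattern contains neither \b nor [[:<:]]/[[:>:]]. A minimal sketch, assuming a throwaway directory from mktemp -d is acceptable:

```shell
#!/bin/sh
# Reproduce the report in a throwaway repository, then search with -w.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "sub test { return; }" > test.pl

# -w (--word-regexp): git grep itself requires a non-word character (or
# the start/end of the line) on each side of a match, so the ERE needs
# no engine-specific word-boundary syntax.
git grep --untracked -w -E 'test'
```

This should print test.pl:sub test { return; }. The trade-off is that -w applies to the whole pattern, so it cannot express a boundary on only one side of a match.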

Anything else you want to add:

Testing in both git 2.24 and 2.34 (the only other versions I have
access to at the moment), \b and other GNU-style regex escapes
return results.

My best guess is that
https://github.com/git/git/commit/1819ad327b7a1f19540a819813b70a0e8a7f798f
is causing git grep -E to require BSD style regular expression
character classes.  I don't know if this is a bug or an unadvertised
change in behavior.  In either case, it is frustrating.  Any person or
tool would have to know which version of git they have and which
operating system they are on to get results.
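
Rather than checking the Git version and operating system, a tool could probe the git grep -E engine at runtime. The helper below is a hypothetical sketch (the function name and probe file are illustrative, not part of Git); it relies only on git grep --no-index, -q, and the exit status, which is 0 when a line matches.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): print which word-boundary
# syntax the local `git grep -E` engine understands: gnu, bsd, or none.
probe_word_boundary() {
    tmp=$(mktemp -d) || return 2
    printf 'sub test { return; }\n' > "$tmp/probe.txt"
    # --no-index searches plain files outside any repository;
    # -q prints nothing and signals a match via exit status 0.
    # stderr is discarded because an engine may reject a syntax outright.
    if (cd "$tmp" && git grep --no-index -q -E '\btest\b' probe.txt) 2>/dev/null; then
        result=gnu
    elif (cd "$tmp" && git grep --no-index -q -E '[[:<:]]test[[:>:]]' probe.txt) 2>/dev/null; then
        result=bsd
    else
        result=none
    fi
    rm -rf "$tmp"
    echo "$result"
}

probe_word_boundary
```

On a glibc-based build one would presumably get gnu, and on a macOS 2.40 build like the one above, bsd; a caller could then pick \bWORD\b or [[:<:]]WORD[[:>:]] accordingly.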

[System Info]
git version:
git version 2.40.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64 x86_64
compiler info: clang: 12.0.0 (clang-1200.0.32.29)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

* Re: macOS git grep change in required character classes
@ 2023-04-01 16:27 ` Junio C Hamano
From: Junio C Hamano @ 2023-04-01 16:27 UTC (permalink / raw)
  To: Matt Gardner; +Cc: git

Matt Gardner <four712@gmail.com> writes:

> My best guess is that
> https://github.com/git/git/commit/1819ad327b7a1f19540a819813b70a0e8a7f798f
> is causing git grep -E to require BSD style regular expression
> character classes.  I don't know if this is a bug or an unadvertised
> change in behavior.

I think you diagnosed it correctly.  The story is "Once upon a time,
we declared that the regex library of macOS was too broken to be
usable.  We used a fallback definition to work around it, but
unfortunately the fallback library did not support multi-byte
matching correctly, which made some folks on macOS unhappy.  So,
starting with that commit, we let Git be built with the regex library
shipped with macOS, with one side effect: patterns you feed Git on
that platform behave more like patterns you give to other tools on
the platform."

So, it is not a bug in Git, it is a deliberate change in behaviour
with unintended consequences X-<.


* Re: macOS git grep change in required character classes
@ 2023-04-01 17:17   ` Matt Gardner
From: Matt Gardner @ 2023-04-01 17:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Even though the grep that ships with macOS is GNU compatible?

$ which grep
/usr/bin/grep
$ grep -V
grep (BSD grep) 2.5.1-FreeBSD
$ grep -E \\btest\\b test.pl
sub test { return; }

So it isn't quite true that this makes Git behave like other tools on
the platform, especially the most analogous tool, grep itself.

I think it would be a reasonable expectation that git grep and system
grep behave in the same manner.

* Re: macOS git grep change in required character classes
@ 2023-04-01 17:56     ` Junio C Hamano
From: Junio C Hamano @ 2023-04-01 17:56 UTC (permalink / raw)
  To: Matt Gardner; +Cc: git

Matt Gardner <four712@gmail.com> writes:

> Even though the grep that ships with macOS is GNU compatible?
>
> $ which grep
> /usr/bin/grep
> $ grep -V
> grep (BSD grep) 2.5.1-FreeBSD
> $ grep -E \\btest\\b test.pl
> sub test { return; }

It seems that use of the REG_ENHANCED bit (which adds some GNUism
enhancements to the regex engine of BSD origin) is inconsistent even
among tools shipped by Apple,

cf. https://lore.kernel.org/git/4e03ea47-b0aa-d69e-6c54-fcbadb3b0641@web.de/

which may even contribute to the confusion.

I think we recently (of course, this was after we stopped building
with NO_REGEX and switched to the native macOS regex library) started
using the ENHANCED bit, but only for BRE; we do not use it for ERE.
The cited thread (which has "pcre" in the subject, though the symptom
turned out to have nothing to do with pcre) discussed possibly using
the same enhanced bit for ERE as well.
