* git bug: Perl compatible regular expressions do not work as expected
@ 2023-03-25 12:31 Mario Grgic
2023-03-25 12:42 ` Kristoffer Haugsbakk
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 12:31 UTC (permalink / raw)
To: git
Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.
What did you do before the bug happened? (Steps to reproduce your issue)
Initilize a new git repo, create a simple C file that you commit into the new
repo. Then search the git history using Perl compatible regular expression.
Basically,
mkdir test
cd test
git init
Create a file
cat <<END > test.c
int main(int argc, const char *argv[])
{
return 0;
}
END
git commit -m 'added test file'
Then run:
git log --all -p -G '\bmain\b'
git is compiled with pcre2 library support but does not find main word in the
file we just added.
What did you expect to happen? (Expected behavior)
git compiled with pcre2 library should support Perl compatible regular
expressions
What happened instead? (Actual behavior)
Nothing is found, when in fact the search for term is present in the git
history
What's different between what you expected and what actually happened?
git should print out the commit containing the search term.
Anything else you want to add:
git was compiled with pcre2 library. Here is output of otool -L
$ otool -L git
git:
/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1069.24.0)
/usr/local/lib/libpcre2-8.0.dylib (compatibility version 12.0.0, current version 12.2.0)
/usr/local/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.13)
/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
/usr/local/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1677.104.0)
/usr/lib/libcharset.1.dylib (compatibility version 2.0.0, current version 2.0.0)
Please review the rest of the bug report below.
You can delete any lines you don't wish to share.
[System Info]
git version:
git version 2.40.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64 x86_64
compiler info: clang: 12.0.0 (clang-1200.0.32.21)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash
[Enabled Hooks]
not run from a git repository - no hooks to show
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic @ 2023-03-25 12:42 ` Kristoffer Haugsbakk 2023-03-25 12:59 ` Mario Grgic 2023-03-25 13:04 ` demerphq ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Kristoffer Haugsbakk @ 2023-03-25 12:42 UTC (permalink / raw) To: Mario Grgic; +Cc: git On Sat, Mar 25, 2023, at 13:31, Mario Grgic wrote: > What happened instead? (Actual behavior) > Nothing is found, when in fact the search for term is present in the git > history > > What's different between what you expected and what actually happened? > git should print out the commit containing the search term. Weird. It works for me. $ mkdir test $ cd test $ git init $ [add file] $ # This was missing from the instructions $ git add --all $ git commit -m 'added test file' $ git log --all -p -G '\bmain\b' commit 56fbac5e12649c4de95071cc1872569d7c34055e (HEAD -> main) Author: Kristoffer Haugsbakk <code@khaugsbakk.name> Date: Sat Mar 25 13:33:51 2023 +0100 added test file diff --git a/test.c b/test.c new file mode 100644 index 0000000..0bffa6a --- /dev/null +++ b/test.c @@ -0,0 +1,5 @@ +int main(int argc, const char *argv[]) +{ + return 0; +} + . [System Info] git version: git version 2.40.0 cpu: x86_64 no commit associated with this build sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh uname: Linux 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64 compiler info: gnuc: 7.5 libc info: glibc: 2.27 $SHELL (typically, interactive shell): /bin/bash [Enabled Hooks] -- Kristoffer Haugsbakk ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 12:42 ` Kristoffer Haugsbakk @ 2023-03-25 12:59 ` Mario Grgic 0 siblings, 0 replies; 18+ messages in thread From: Mario Grgic @ 2023-03-25 12:59 UTC (permalink / raw) To: Kristoffer Haugsbakk; +Cc: git I did some bisecting and this is broken for me on git-2.39.2 as well, but works in git-2.38.4 and previous versions (tried 2.30.0 as well). > On Mar 25, 2023, at 8:42 AM, Kristoffer Haugsbakk <code@khaugsbakk.name> wrote: > > On Sat, Mar 25, 2023, at 13:31, Mario Grgic wrote: >> What happened instead? (Actual behavior) >> Nothing is found, when in fact the search for term is present in the git >> history >> >> What's different between what you expected and what actually happened? >> git should print out the commit containing the search term. > > Weird. It works for me. > > $ mkdir test > $ cd test > $ git init > $ [add file] > $ # This was missing from the instructions > $ git add --all > $ git commit -m 'added test file' > $ git log --all -p -G '\bmain\b' > commit 56fbac5e12649c4de95071cc1872569d7c34055e (HEAD -> main) > Author: Kristoffer Haugsbakk <code@khaugsbakk.name> > Date: Sat Mar 25 13:33:51 2023 +0100 > > added test file > > diff --git a/test.c b/test.c > new file mode 100644 > index 0000000..0bffa6a > --- /dev/null > +++ b/test.c > @@ -0,0 +1,5 @@ > +int main(int argc, const char *argv[]) > +{ > + return 0; > +} > + > > . > > [System Info] > git version: > git version 2.40.0 > cpu: x86_64 > no commit associated with this build > sizeof-long: 8 > sizeof-size_t: 8 > shell-path: /bin/sh > uname: Linux 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64 > compiler info: gnuc: 7.5 > libc info: glibc: 2.27 > $SHELL (typically, interactive shell): /bin/bash > > > [Enabled Hooks] > > -- > Kristoffer Haugsbakk ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic 2023-03-25 12:42 ` Kristoffer Haugsbakk @ 2023-03-25 13:04 ` demerphq 2023-03-25 13:09 ` Mario Grgic 2023-03-25 14:16 ` Mario Grgic 2023-03-25 15:39 ` Mario Grgic 3 siblings, 1 reply; 18+ messages in thread From: demerphq @ 2023-03-25 13:04 UTC (permalink / raw) To: Mario Grgic; +Cc: git On Sat, 25 Mar 2023 at 14:00, Mario Grgic <mario_grgic@hotmail.com> wrote: > git log --all -p -G '\bmain\b' I think you want -P not -p. cheers, Yves ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 13:04 ` demerphq @ 2023-03-25 13:09 ` Mario Grgic 2023-03-25 13:24 ` demerphq 2023-03-25 18:09 ` René Scharfe 0 siblings, 2 replies; 18+ messages in thread From: Mario Grgic @ 2023-03-25 13:09 UTC (permalink / raw) To: demerphq; +Cc: git The lowercase -p is to print the output in patch format. You can rewrite the command line as git log --all --patch --perl-regexp -G '\bmain\b’ I still get no output in any git version after 2.38.4 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 13:09 ` Mario Grgic @ 2023-03-25 13:24 ` demerphq 2023-03-25 18:09 ` René Scharfe 1 sibling, 0 replies; 18+ messages in thread From: demerphq @ 2023-03-25 13:24 UTC (permalink / raw) To: Mario Grgic; +Cc: git On Sat, 25 Mar 2023 at 14:10, Mario Grgic <mario_grgic@hotmail.com> wrote: > > The lowercase -p is to print the output in patch format. You can rewrite the command line as > > git log --all --patch --perl-regexp -G '\bmain\b’ > > I still get no output in any git version after 2.38.4 RIght, sorry about that. My bad. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 13:09 ` Mario Grgic 2023-03-25 13:24 ` demerphq @ 2023-03-25 18:09 ` René Scharfe 2023-03-27 16:29 ` Junio C Hamano 1 sibling, 1 reply; 18+ messages in thread From: René Scharfe @ 2023-03-25 18:09 UTC (permalink / raw) To: Mario Grgic, demerphq; +Cc: git Am 25.03.23 um 14:09 schrieb Mario Grgic: > The lowercase -p is to print the output in patch format. You can rewrite the command line as > > git log --all --patch --perl-regexp -G '\bmain\b’ > > I still get no output in any git version after 2.38.4 -G doesn't support Perl regular expressions. --perl-regexp only affects --grep, --grep-reflog, --author, and --committer. Neither POSIX basic nor extended regular expressions support \b as word boundary. GNU regex and our compat/regex/ do, as extensions. macOS regex supports it if the flag REG_ENHANCED is given to regcomp(3). So perhaps this is rather a feature request to support Perl regular expressions for -G (and probably -S as well). Or to enable REG_ENHANCED for them, at least, like 54463d32ef (use enhanced basic regular expressions on macOS, 2023-01-08) did to get alternations for git grep on macOS. René ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 18:09 ` René Scharfe @ 2023-03-27 16:29 ` Junio C Hamano 2023-03-27 17:23 ` René Scharfe 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2023-03-27 16:29 UTC (permalink / raw) To: René Scharfe; +Cc: Mario Grgic, demerphq, git René Scharfe <l.s.r@web.de> writes: > Am 25.03.23 um 14:09 schrieb Mario Grgic: >> The lowercase -p is to print the output in patch format. You can rewrite the command line as >> >> git log --all --patch --perl-regexp -G '\bmain\b’ >> >> I still get no output in any git version after 2.38.4 > > -G doesn't support Perl regular expressions. --perl-regexp only affects > --grep, --grep-reflog, --author, and --committer. Neither POSIX basic > nor extended regular expressions support \b as word boundary. GNU regex > and our compat/regex/ do, as extensions. macOS regex supports it if the > flag REG_ENHANCED is given to regcomp(3). Good summary to unconfuse speculations in the thread. > So perhaps this is rather a feature request to support Perl regular > expressions for -G (and probably -S as well). Perhaps. I used to be a "it would be wonderful if pcre were usable everywhere" dreamer, but after seeing our share of bugs caused by use of pcre, I am not a huge proponent anymore. I do not object to such an enhancement at all, as long as it is done cleanly and in such a way that it is clear pcre cannot be used by accident when the user does not ask for it. > Or to enable REG_ENHANCED > for them, at least, like 54463d32ef (use enhanced basic regular > expressions on macOS, 2023-01-08) did to get alternations for git grep > on macOS. This one sounds like a reasonable thing, which may not have huge unintended fallout, to do. I am a bit surprised that we have to cover each individual callsite of regcomp(3), though. Doesn't the 54463d32ef fix use "#define regcomp git_regcomp" to cover everybody? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 16:29 ` Junio C Hamano @ 2023-03-27 17:23 ` René Scharfe 2023-03-27 21:33 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: René Scharfe @ 2023-03-27 17:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mario Grgic, demerphq, git Am 27.03.23 um 18:29 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> Or to enable REG_ENHANCED >> for them, at least, like 54463d32ef (use enhanced basic regular >> expressions on macOS, 2023-01-08) did to get alternations for git grep >> on macOS. > > This one sounds like a reasonable thing, which may not have huge > unintended fallout, to do. I am a bit surprised that we have to > cover each individual callsite of regcomp(3), though. Doesn't the > 54463d32ef fix use "#define regcomp git_regcomp" to cover everybody? 54463d32ef only affects basic regular expressions (BRE), but -G and -S use extended ones (ERE). macOS allows both to be "enhanced". I have a hard time finding a readable reference to the documentation to brings us all onto the same page regarding what REG_ENHANCED actually does. [1] is in raw troff format, [2] isn't very pretty and shows ads. Anyway, the rather lengthy section "ENHANCED FEATURES" explains it. René [1] https://opensource.apple.com/source/Libc/Libc-1439.40.11/regex/FreeBSD/re_format.7.auto.html [2] https://www.unix.com/man-page/osx/7/re_format/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 17:23 ` René Scharfe @ 2023-03-27 21:33 ` Junio C Hamano 2023-03-28 13:47 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2023-03-27 21:33 UTC (permalink / raw) To: René Scharfe; +Cc: Mario Grgic, demerphq, git René Scharfe <l.s.r@web.de> writes: > 54463d32ef only affects basic regular expressions (BRE), but -G and -S > use extended ones (ERE). macOS allows both to be "enhanced". > > I have a hard time finding a readable reference to the documentation to > brings us all onto the same page regarding what REG_ENHANCED actually > does. [1] is in raw troff format, [2] isn't very pretty and shows ads. Wow, [2] is also misleading in that it loses the distinction between '+' and '\+'; in [1] at least you can see the differences between \&+ and \e+ ;-) > Anyway, the rather lengthy section "ENHANCED FEATURES" explains it. I suspect that 54463d32ef was done in a conservative way to avoid unintended side effects to make ERE "enhanced". I am not 100% certain, but after reading the documentation you pointed at, I do not see a valid expression without ENHANCED flag starting to mean totally different thing with it (well, an extra '?' turning a pattern from greedy to minimal may count as such a change in semantics, but I do not see anybody sensible adding an extra '?' in a pattern in the first place). Perhaps it is safe to go ahead and loosen what 54463d32ef did, with something like this (this is not even compile tested, of course)? Makefile | 9 +++++++++ compat/regcomp_enhanced.c | 2 ++ git-compat-util.h | 2 +- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git c/Makefile w/Makefile index 50ee51fde3..1fe0eb399e 100644 --- c/Makefile +++ w/Makefile @@ -293,6 +293,10 @@ include shared.mak # the flag REG_ENHANCED and you'd like to use it to enable enhanced basic # regular expressions. # +# Define USE_ENHANCED_REGULAR_EXPRESSIONS if your C library provides +# the flag REG_ENHANCED and you'd like to use it to enable enhanced +# regular expressions for both BRE and ERE. +# # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # @@ -2041,11 +2045,16 @@ ifdef NO_REGEX COMPAT_CFLAGS += -Icompat/regex COMPAT_OBJS += compat/regex/regex.o else +ifdef USE_ENHANCED_REGULAR_EXPRESSIONS + COMPAT_CFLAGS += -DUSE_ENHANCED_REGULAR_EXPRESSIONS + COMPAT_OBJS += compat/regcomp_enhanced.o +else ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS COMPAT_OBJS += compat/regcomp_enhanced.o endif endif +endif ifdef NATIVE_CRLF BASIC_CFLAGS += -DNATIVE_CRLF endif diff --git c/compat/regcomp_enhanced.c w/compat/regcomp_enhanced.c index 84193ce53b..aadbbac6d3 100644 --- c/compat/regcomp_enhanced.c +++ w/compat/regcomp_enhanced.c @@ -3,7 +3,9 @@ int git_regcomp(regex_t *preg, const char *pattern, int cflags) { +#ifndef USE_ENHANCED_REGULAR_EXPRESSIONS if (!(cflags & REG_EXTENDED)) +#endif cflags |= REG_ENHANCED; return regcomp(preg, pattern, cflags); } diff --git c/git-compat-util.h w/git-compat-util.h index 1e6592624d..95d9b75a15 100644 --- c/git-compat-util.h +++ w/git-compat-util.h @@ -1377,7 +1377,7 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size, return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND); } -#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS +#if defined(USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS) || defined(USE_ENHANCED_REGULAR_EXPRESSIONS) int git_regcomp(regex_t *preg, const char *pattern, int cflags); #define regcomp git_regcomp #endif ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 21:33 ` Junio C Hamano @ 2023-03-28 13:47 ` Junio C Hamano 2023-03-28 17:56 ` René Scharfe 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2023-03-28 13:47 UTC (permalink / raw) To: René Scharfe; +Cc: Mario Grgic, demerphq, git Junio C Hamano <gitster@pobox.com> writes: > I suspect that 54463d32ef was done in a conservative way to avoid > unintended side effects to make ERE "enhanced". I am not 100% > certain, but after reading the documentation you pointed at, I do > not see a valid expression without ENHANCED flag starting to mean > totally different thing with it (well, an extra '?' turning a > pattern from greedy to minimal may count as such a change in > semantics, but I do not see anybody sensible adding an extra '?' > in a pattern in the first place). Sorry, but that is nonsense. We cannot avoid being backward incompatible if we suddenly flip the "enhanced" bit for BRE. A sane pattern written expecting non-enhanced BRE can change its meaning when the "enhanced" mode is enabled. But if "enhanced" is what users want, and if that is what the other tools on the platform use, then perhaps flipping the "enhanced" bit may not be a bad idea. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-28 13:47 ` Junio C Hamano @ 2023-03-28 17:56 ` René Scharfe 0 siblings, 0 replies; 18+ messages in thread From: René Scharfe @ 2023-03-28 17:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mario Grgic, demerphq, git Am 28.03.23 um 15:47 schrieb Junio C Hamano: > Junio C Hamano <gitster@pobox.com> writes: > >> I suspect that 54463d32ef was done in a conservative way to avoid >> unintended side effects to make ERE "enhanced". I am not 100% >> certain, but after reading the documentation you pointed at, I do >> not see a valid expression without ENHANCED flag starting to mean >> totally different thing with it (well, an extra '?' turning a >> pattern from greedy to minimal may count as such a change in >> semantics, but I do not see anybody sensible adding an extra '?' >> in a pattern in the first place). > > Sorry, but that is nonsense. We cannot avoid being backward > incompatible if we suddenly flip the "enhanced" bit for BRE. A > sane pattern written expecting non-enhanced BRE can change its > meaning when the "enhanced" mode is enabled. Right, and there are even more enhancements for ERE, i.e. more incompatibilities. Not sure how big of a problem that actually would be, though. > But if "enhanced" is what users want, and if that is what the other > tools on the platform use, then perhaps flipping the "enhanced" bit > may not be a bad idea. Apple's grep(1) uses it, with and without -E: https://github.com/apple-oss-distributions/text_cmds/search?q=REG_ENHANCED E.g. awk(1) and sed(1) don't, AFAICS. René ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic 2023-03-25 12:42 ` Kristoffer Haugsbakk 2023-03-25 13:04 ` demerphq @ 2023-03-25 14:16 ` Mario Grgic 2023-03-25 15:39 ` Mario Grgic 3 siblings, 0 replies; 18+ messages in thread From: Mario Grgic @ 2023-03-25 14:16 UTC (permalink / raw) To: git According to git bisect, this got broken in the following commit: commit 1819ad327b7a1f19540a819813b70a0e8a7f798f Author: Diomidis Spinellis <dds@aueb.gr> Date: Fri Aug 26 11:58:15 2022 +0300 grep: fix multibyte regex handling under macOS The commit 29de20504e (Makefile: fix default regex settings on Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by adopting Git's regex library. However, this library is compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte (e.g. UTF-8) files. Current macOS versions pass t0070-fundamental.sh with the native macOS regex library, which also supports multibyte characters. Adjust the Makefile to use the native regex library, and call setlocale(3) to set CTYPE according to the user's preference. The setlocale call is required on all platforms, but in platforms supporting gettext(3), setlocale was called as a side-effect of initializing gettext. Therefore, move the CTYPE setlocale call from gettext.c to common-main.c and the corresponding locale.h include into git-compat-util.h. Thanks to the global initialization of CTYPE setlocale, the test-tool regex command now works correctly with supported multibyte regexes, and is used to set the MB_REGEX test prerequisite by assessing a platform's support for them. Signed-off-by: Diomidis Spinellis <dds@aueb.gr> Signed-off-by: Junio C Hamano <gitster@pobox.com> Makefile | 2 +- common-main.c | 1 + gettext.c | 2 -- git-compat-util.h | 1 + t/t7810-grep.sh | 15 +++++++++++++++ 5 files changed, 18 insertions(+), 3 deletions(-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic ` (2 preceding siblings ...) 2023-03-25 14:16 ` Mario Grgic @ 2023-03-25 15:39 ` Mario Grgic 2023-03-27 16:30 ` Junio C Hamano 3 siblings, 1 reply; 18+ messages in thread From: Mario Grgic @ 2023-03-25 15:39 UTC (permalink / raw) To: git Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me: --- Makefile 2023-03-25 11:24:01.000000000 -0400 +++ Makefile.patched 2023-03-25 11:25:11.000000000 -0400 @@ -1554,6 +1554,7 @@ APPLE_COMMON_CRYPTO = YesPlease COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO endif + NO_REGEX = YesPlease PTHREAD_LIBS = endif Perhaps the real problem is that the build system should not use macOS native regex library is its configured to use —pcre2 library. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-25 15:39 ` Mario Grgic @ 2023-03-27 16:30 ` Junio C Hamano 2023-03-27 17:22 ` Mario Grgic 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2023-03-27 16:30 UTC (permalink / raw) To: Mario Grgic; +Cc: git Mario Grgic <mario_grgic@hotmail.com> writes: > Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me: > > --- Makefile 2023-03-25 11:24:01.000000000 -0400 > +++ Makefile.patched 2023-03-25 11:25:11.000000000 -0400 > @@ -1554,6 +1554,7 @@ > APPLE_COMMON_CRYPTO = YesPlease > COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO > endif > + NO_REGEX = YesPlease > PTHREAD_LIBS = > endif It will unfortunately break multibyte support on macOS by reverting what 1819ad32 (grep: fix multibyte regex handling under macOS, 2022-08-26) did. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 16:30 ` Junio C Hamano @ 2023-03-27 17:22 ` Mario Grgic 2023-03-27 21:11 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: Mario Grgic @ 2023-03-27 17:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: git In my case, I compiled git with pcre2 support, using third party PCRE2 library: https://github.com/PCRE2Project/pcre2 and PCRE and multibyte support in git works with it just fine. E.g. adding and committing a file: ``` #!/usr/bin/env node console.log('顔🏁'); ``` and searching it with git log --all -p -G '顔🏁’ works. Again, perhaps the solution here is to distinguish two cases: 1. git is compiled with native regex library on macOS 2. git is compiled with third party regex library with PCRE and multibyte support on macOS in case 2 everything works. Case 1 is perhaps more work on macOS and a feature request. > On Mar 27, 2023, at 12:30 PM, Junio C Hamano <gitster@pobox.com> wrote: > > Mario Grgic <mario_grgic@hotmail.com> writes: > >> Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me: >> >> --- Makefile 2023-03-25 11:24:01.000000000 -0400 >> +++ Makefile.patched 2023-03-25 11:25:11.000000000 -0400 >> @@ -1554,6 +1554,7 @@ >> APPLE_COMMON_CRYPTO = YesPlease >> COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO >> endif >> + NO_REGEX = YesPlease >> PTHREAD_LIBS = >> endif > > It will unfortunately break multibyte support on macOS by reverting > what 1819ad32 (grep: fix multibyte regex handling under macOS, > 2022-08-26) did. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 17:22 ` Mario Grgic @ 2023-03-27 21:11 ` Junio C Hamano 2023-03-28 0:03 ` Mario Grgic 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2023-03-27 21:11 UTC (permalink / raw) To: Mario Grgic; +Cc: git Mario Grgic <mario_grgic@hotmail.com> writes: [administrivia: do not top post] >> Mario Grgic <mario_grgic@hotmail.com> writes: >> ... >>> + NO_REGEX = YesPlease >>> PTHREAD_LIBS = >>> endif >> >> It will unfortunately break multibyte support on macOS by reverting >> what 1819ad32 (grep: fix multibyte regex handling under macOS, >> 2022-08-26) did. > In my case, I compiled git with pcre2 support, using third party > PCRE2 library: https://github.com/PCRE2Project/pcre2 and PCRE and > multibyte support in git works with it just fine. Sorry, you misunderstood. 1819ad32 is about enabling multi-byte support for normal regexp types, and does not have anything to do with pcre. By setting NO_REGEX, the build will not link with Apple's regex library but the one from compat/ and that version is what is used for -G and -E (not -P). -G/-E patterns with multi-byte would not work with compat/ stuff, but they should work when linked with Apple's regex library. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: git bug: Perl compatible regular expressions do not work as expected 2023-03-27 21:11 ` Junio C Hamano @ 2023-03-28 0:03 ` Mario Grgic 0 siblings, 0 replies; 18+ messages in thread From: Mario Grgic @ 2023-03-28 0:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Interesting. I could be wrong. Let me clarify what I did and what I observe. I edited the Makefile and put back the NO_REGEX = YesPlease line. Then I configured the build as ./configure —with-libpcre2=/usr/local then built and installed git. otool -L git shows git binary is dynamically linking the libpcre in /usr/local However, it looks like this third party library is also used for -G searches (plain POSIX regex or PCRE), since it accepts and correctly finds things like ‘\btext\b’. That is git log —all -p -G ‘\main\b’ works Furthermore, -G works with multibyte strings as well: git log --all -p -G '顔🏁’ What doesn’t work is multibyte strings with PCRESs, so for example this does not work: git log --all -p -G ‘\b顔🏁\b’ This doesn’t seem to be consistent with the thesis that regex library from compat is used for -G searches and that it doesn’t support multibyte strings? > On Mar 27, 2023, at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote: > > Mario Grgic <mario_grgic@hotmail.com> writes: > > [administrivia: do not top post] > >>> Mario Grgic <mario_grgic@hotmail.com> writes: >>> ... >>>> + NO_REGEX = YesPlease >>>> PTHREAD_LIBS = >>>> endif >>> >>> It will unfortunately break multibyte support on macOS by reverting >>> what 1819ad32 (grep: fix multibyte regex handling under macOS, >>> 2022-08-26) did. > >> In my case, I compiled git with pcre2 support, using third party >> PCRE2 library: https://github.com/PCRE2Project/pcre2 and PCRE and >> multibyte support in git works with it just fine. > > Sorry, you misunderstood. 1819ad32 is about enabling multi-byte > support for normal regexp types, and does not have anything to do > with pcre. By setting NO_REGEX, the build will not link with > Apple's regex library but the one from compat/ and that version is > what is used for -G and -E (not -P). -G/-E patterns with multi-byte > would not work with compat/ stuff, but they should work when linked > with Apple's regex library. > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-03-28 17:56 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic 2023-03-25 12:42 ` Kristoffer Haugsbakk 2023-03-25 12:59 ` Mario Grgic 2023-03-25 13:04 ` demerphq 2023-03-25 13:09 ` Mario Grgic 2023-03-25 13:24 ` demerphq 2023-03-25 18:09 ` René Scharfe 2023-03-27 16:29 ` Junio C Hamano 2023-03-27 17:23 ` René Scharfe 2023-03-27 21:33 ` Junio C Hamano 2023-03-28 13:47 ` Junio C Hamano 2023-03-28 17:56 ` René Scharfe 2023-03-25 14:16 ` Mario Grgic 2023-03-25 15:39 ` Mario Grgic 2023-03-27 16:30 ` Junio C Hamano 2023-03-27 17:22 ` Mario Grgic 2023-03-27 21:11 ` Junio C Hamano 2023-03-28 0:03 ` Mario Grgic
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox