From: Junio C Hamano <gitster@pobox.com>
To: Diomidis Spinellis <dds@aueb.gr>
Cc: git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
avarab@gmail.com
Subject: Re: [PATCH v3] grep: fix multibyte regex handling under macOS
Date: Thu, 25 Aug 2022 17:20:30 -0700 [thread overview]
Message-ID: <xmqqpmgn26up.fsf@gitster.g> (raw)
In-Reply-To: 20220825082045.2662893-1-dds@aueb.gr
Diomidis Spinellis <dds@aueb.gr> writes:
> diff --git a/Makefile b/Makefile
> index 04d0fd1fe6..d1a9825715 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1427,7 +1427,6 @@ ifeq ($(uname_S),Darwin)
> APPLE_COMMON_CRYPTO = YesPlease
> COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
> endif
> - NO_REGEX = YesPlease
> PTHREAD_LIBS =
> endif
>
> @@ -2970,6 +2969,7 @@ GIT-BUILD-OPTIONS: FORCE
> @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
> @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
> @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
> + @echo NO_REGEX=\''$(subst ','\'',$(subst ','\'',$(NO_REGEX)))'\' >>$@+
> @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
> @echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
> @echo DC_SHA1=\''$(subst ','\'',$(subst ','\'',$(DC_SHA1)))'\' >>$@+
Build part looks good to me.
> diff --git a/common-main.c b/common-main.c
> index c531372f3f..0a22861f1c 100644
> --- a/common-main.c
> +++ b/common-main.c
> @@ -40,6 +40,7 @@ int main(int argc, const char **argv)
>
> git_resolve_executable_dir(argv[0]);
>
> + setlocale(LC_CTYPE, "");
> git_setup_gettext();
>
> initialize_the_repository();
> diff --git a/gettext.c b/gettext.c
> index bb5ba1fe7c..f139008d0a 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -10,7 +10,6 @@
> #include "config.h"
>
> #ifndef NO_GETTEXT
> -# include <locale.h>
> # include <libintl.h>
> # ifdef GIT_WINDOWS_NATIVE
>
> @@ -80,7 +79,6 @@ static int test_vsnprintf(const char *fmt, ...)
>
> static void init_gettext_charset(const char *domain)
> {
> - setlocale(LC_CTYPE, "");
> charset = locale_charset();
> bind_textdomain_codeset(domain, charset);
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 58d7708296..c6fa3c7469 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -212,6 +212,7 @@
> #endif
> #include <errno.h>
> #include <limits.h>
> +#include <locale.h>
> #ifdef NEEDS_SYS_PARAM_H
> #include <sys/param.h>
> #endif
I'll let others more familiar with the locale support to comment on
these changes. We are unconditionally including <locale.h> now;
before platforms that lack locale.h can set NO_GETTEXT but that will
no longer work as a "workaround" for them. I do not know if thta is
a practical downside to anybody, but it could be a problem.
> diff --git a/t/t7818-grep-multibyte.sh b/t/t7818-grep-multibyte.sh
> new file mode 100755
> index 0000000000..a3889f9822
> --- /dev/null
> +++ b/t/t7818-grep-multibyte.sh
Do we need a new test script for this?
> @@ -0,0 +1,34 @@
> +#!/bin/sh
> +
> +test_description='grep multibyte characters'
> +
> +. ./test-lib.sh
> +
> +# Multibyte regex search is only supported with a native regex library
> +# that supports it.
> +# (The supplied compatibility library is compiled with NO_MBSUPPORT.)
This file is not specific to Darwin; "... with a native regex
library" etc. is not something we want to see here.
> +test -z "$NO_REGEX" &&
> + LC_ALL=en_US.UTF-8 test-tool regex '^.$' '¿' &&
> + test_set_prereq MB_REGEX
We can safely drop 'test -z "$NO_REGEX" &&' part here, no? Even if
we omit it, those who built with NO_REGEX would fail "test-tool
regex" step above. And by omitting $NO_REGEX check, we do not have
to look for and update the condition when the fallback regex engine
we use starts supporting MB_REGEX.
> +if ! test_have_prereq MB_REGEX
> +then
> + skip_all='multibyte grep tests; Git compiled with NO_REGEX, NO_MBSUPPORT'
> + test_done
> +fi
I do not think if we need a single use prereq here. We can just
use whatever condition that is used to set MB_REGEX above and do the
skip-all thing here.
> +test_expect_success 'setup' '
> + test_write_lines "¿" >file &&
> + git add file &&
> + LC_ALL="en_US.UTF-8" &&
> + export LC_ALL
> +'
Missing inter-test blank line.
> +test_expect_success 'grep exactly one char in single-char multibyte file' '
> + git grep "^.$"
> +'
I am not sure how much value we are getting out of this test, which
is identical to what we already tested earlier above with "test-tool
regex".
> +test_expect_success 'grep two chars in single-char multibyte file' '
> + test_expect_code 1 git grep ".."
> +'
> +
> +test_done
next prev parent reply other threads:[~2022-08-26 0:20 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <pull.1313.v2.git.git.1661289990205.gitgitgadget () gmail ! com>
2022-08-25 8:20 ` [PATCH v3] grep: fix multibyte regex handling under macOS Diomidis Spinellis
2022-08-26 0:20 ` Junio C Hamano [this message]
2022-08-26 8:58 ` [PATCH v4] " Diomidis Spinellis
2022-09-13 17:32 ` Junio C Hamano
2022-09-13 18:09 ` Diomidis Spinellis
2022-09-13 20:41 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqpmgn26up.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=avarab@gmail.com \
--cc=dds@aueb.gr \
--cc=git@vger.kernel.org \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.