From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Karsten Blees via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Karsten Blees <blees@dcon.de>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v2 1/1] gettext: always use UTF-8 on native Windows
Date: Fri, 05 Jul 2019 00:53:52 +0200
Message-ID: <87o92976nz.fsf@evledraar.gmail.com>
In-Reply-To: <2d2253faef14e5157f8aac4534d9ac9640f3d5fa.1562186762.git.gitgitgadget@gmail.com>


On Wed, Jul 03 2019, Karsten Blees via GitGitGadget wrote:

> From: Karsten Blees <blees@dcon.de>
>
> On native Windows, Git exclusively uses UTF-8 for console output (both
> with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
> determine the output encoding for translated text, however, MSVCRT's
> `setlocale()` does not support UTF-8. As a result, translated text is
> encoded in system encoding (as per `GetACP()`), and non-ASCII chars are
> mangled in console output.
>
> Side note: There is actually a code page for UTF-8: 65001. In practice,
> it does not work as expected at least on Windows 7, though, so we cannot
> use it in Git. Besides, if we overrode the code page, any process
> spawned from Git would inherit that code page (as opposed to the code
> page configured for the current user), which would quite possibly break
> e.g. diff or merge helpers. So we really cannot override the code page.
>
> In `init_gettext_charset()`, Git calls gettext's
> `bind_textdomain_codeset()` with the character set obtained via
> `locale_charset()`; Let's override that latter function to force the
> encoding to UTF-8 on native Windows.
>
> In Git for Windows' SDK, there is a `libcharset.h` and therefore we
> define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
> `config.mak.uname`, therefore we need to add the override before that
> conditionally-compiled code block.
>
> Rather than simply defining `locale_charset()` to return the string
> `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
> `ab/no-kwset` patch series, for example, needs to have a way to prevent
> Git from expecting UTF-8-encoded input.

It's not just the ab/no-kwset series I have cooking (though I'm happy to
have this take that into account): anything grep-like is usually much
faster with LC_ALL=C. Isn't that also the case on Windows? Setting the
locale affects a large variety of libc functions and third-party
libraries (e.g. PCRE, via us setting "use UTF-8" depending on the locale).

> Signed-off-by: Karsten Blees <blees@dcon.de>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  gettext.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gettext.c b/gettext.c
> index d4021d690c..3f2aca5c3b 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -12,7 +12,25 @@
>  #ifndef NO_GETTEXT
>  #	include <locale.h>
>  #	include <libintl.h>
> -#	ifdef HAVE_LIBCHARSET_H
> +#	ifdef GIT_WINDOWS_NATIVE
> +
> +static const char *locale_charset(void)
> +{
> +	const char *env = getenv("LC_ALL"), *dot;
> +
> +	if (!env || !*env)
> +		env = getenv("LC_CTYPE");
> +	if (!env || !*env)
> +		env = getenv("LANG");
> +
> +	if (!env)
> +		return "UTF-8";
> +
> +	dot = strchr(env, '.');
> +	return !dot ? env : dot + 1;
> +}
> +
> +#	elif defined HAVE_LIBCHARSET_H
>  #		include <libcharset.h>
>  #	else
>  #		include <langinfo.h>

I'll take it on faith that this is what the locale_charset() should look
like.
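
To make the lookup order in the quoted function easier to check, here is
a minimal standalone sketch of the same logic. It is not the patch
itself: `charset_from_env()` is a hypothetical helper that takes the
three values as parameters instead of calling `getenv()`, so the cases
can be exercised without touching the process environment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the quoted locale_charset(): the
 * parameters stand in for getenv("LC_ALL"), getenv("LC_CTYPE") and
 * getenv("LANG"); NULL means "unset".
 */
static const char *charset_from_env(const char *lc_all,
				    const char *lc_ctype,
				    const char *lang)
{
	const char *env = lc_all, *dot;

	/* An unset or empty variable falls through to the next one. */
	if (!env || !*env)
		env = lc_ctype;
	if (!env || !*env)
		env = lang;

	/* Nothing set at all: default to UTF-8. */
	if (!env)
		return "UTF-8";

	/*
	 * "de_DE.ISO-8859-1" yields "ISO-8859-1"; a value without a
	 * dot, such as "C", is returned unchanged.
	 */
	dot = strchr(env, '.');
	return !dot ? env : dot + 1;
}
```

With this, `charset_from_env("C", NULL, NULL)` gives `"C"`, which is how
`LC_ALL=C` keeps working, while all three values unset gives `"UTF-8"`.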

I wonder if it wouldn't be better to always compile this function, and
just have init_gettext_charset() switch between the two. We've moved
more towards that sort of thing (e.g. with pthreads). I.e. prefer
redundant compilation to ifdefing platform-only code (which then only
gets compiled there). See "HAVE_THREADS" in the code.
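
The HAVE_THREADS-style approach could look roughly like the sketch
below: the platform check becomes a constant that an ordinary `if`
tests, so both branches are parsed and type-checked on every platform.
`USE_ENV_CHARSET`, `env_locale_charset()`, `system_locale_charset()` and
`pick_charset()` are names invented for this illustration, and
`system_locale_charset()` is only a stub standing in for the libcharset
or `nl_langinfo()` lookup, which would itself still need a header that
exists on the platform being compiled.

```c
#include <stdlib.h>
#include <string.h>

/* Turn the platform macro into a 0/1 constant usable in a plain if. */
#ifdef GIT_WINDOWS_NATIVE
#define USE_ENV_CHARSET 1
#else
#define USE_ENV_CHARSET 0
#endif

/* Environment-based lookup, compiled everywhere (same logic as the patch). */
static const char *env_locale_charset(void)
{
	const char *env = getenv("LC_ALL"), *dot;

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");
	if (!env)
		return "UTF-8";

	dot = strchr(env, '.');
	return !dot ? env : dot + 1;
}

/* Stub only: a placeholder for the libcharset / nl_langinfo() path. */
static const char *system_locale_charset(void)
{
	return "US-ASCII";
}

/* Hypothetical selector: the compiler drops the dead branch. */
static const char *pick_charset(void)
{
	if (USE_ENV_CHARSET)
		return env_locale_charset();
	return system_locale_charset();
}
```

The trade-off is that the rarely used branch can no longer bit-rot
unnoticed, at the cost of compiling code that one platform never runs.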

It looks to me that with this patch the HAVE_LIBCHARSET_H docs in
"Makefile" become wrong. Shouldn't those be updated too?

We also still pass -DHAVE_LIBCHARSET_H to every file we compile, only to
never use it under GIT_WINDOWS_NATIVE. Perhaps fixing that isn't
possible with GIT_WINDOWS_NATIVE being a macro, and perhaps I've again
gotten the "native" vs. "mingw" etc. relationship wrong in my head and
the HAVE_LIBCHARSET_H docs are fine.

It just seems wrong that we have both the configure script &
config.mak.uname look for / declare that we have libcharset.h, only to
at this late point not use libcharset.h at all. Couldn't we just know if
GIT_WINDOWS_NATIVE will be true earlier & move that check up, so it &
HAVE_LIBCHARSET_H can be mutually exclusive (with accompanying #error if
we have both)?
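
A minimal sketch of the mutual-exclusion idea, assuming the build could
arrange for at most one of the two macros to be defined.
`charset_source()` is a hypothetical function added only so the selected
branch is observable, and the `#error` text is likewise made up.

```c
/* Refuse to build if both ways of obtaining the charset are configured. */
#if defined(GIT_WINDOWS_NATIVE) && defined(HAVE_LIBCHARSET_H)
#error "HAVE_LIBCHARSET_H must not be set together with GIT_WINDOWS_NATIVE"
#endif

/* Hypothetical helper: reports which branch the preprocessor picked. */
static const char *charset_source(void)
{
#if defined(GIT_WINDOWS_NATIVE)
	return "environment override";
#elif defined(HAVE_LIBCHARSET_H)
	return "libcharset";
#else
	return "langinfo";
#endif
}
```

With such a guard in place, a misconfigured build fails at compile time
instead of silently ignoring one of the two settings.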

Thread overview: 9+ messages
2019-06-27  8:44 [PATCH 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
2019-06-27  8:44 ` [PATCH 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
2019-07-03 11:26   ` Johannes Schindelin
2019-07-03 18:31     ` Junio C Hamano
2019-07-03 20:46 ` [PATCH v2 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
2019-07-03 20:46   ` [PATCH v2 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason [this message]
2019-07-08 12:57       ` Johannes Schindelin
2019-07-08 18:30       ` Junio C Hamano
