* [PATCH 0/2] Let's use the Win32 API more precisely @ 2019-06-27 9:37 Johannes Schindelin via GitGitGadget 2019-06-27 9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget 2019-06-27 9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget 0 siblings, 2 replies; 8+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-06-27 9:37 UTC (permalink / raw) To: git; +Cc: Junio C Hamano For many Win32 functions, there actually exist two variants: one that takes const char * ("ANSI", meaning the current code page) and wchar_t * ("Unicode", i.e. UTF-16, at least for all practical matters). These functions have "A" and "W" suffixes, respectively, e.g. GetFileAttributesW(). The symbols without this suffix are #defined to the *W() versions if the constant UNICODE is defined before including the Windows headers, and to *A() otherwise. Let's not rely on this constant, but explicitly say what we want: we want the Unicode versions, as they seem to be used by the ANSI flavor anyway. Johannes Schindelin (2): mingw: get pw_name in UTF-8 format mingw: use Unicode functions explicitly compat/mingw.c | 12 +++++++++--- compat/poll/poll.c | 2 +- compat/winansi.c | 10 ++++++---- 3 files changed, 16 insertions(+), 8 deletions(-) base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-147%2Fdscho%2Fansi-unicode-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-147/dscho/ansi-unicode-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/147 -- gitgitgadget ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] mingw: get pw_name in UTF-8 format 2019-06-27 9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget @ 2019-06-27 9:37 ` Johannes Schindelin via GitGitGadget 2019-07-04 20:13 ` Beat Bolli 2019-06-27 9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget 1 sibling, 1 reply; 8+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-06-27 9:37 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Johannes Schindelin From: Johannes Schindelin <johannes.schindelin@gmx.de> Previously, we would have obtained the user name encoded in whatever the current code page is. Note: the "user name" here does not denote the full name but instead the short logon name. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- compat/mingw.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index 9b6d2400e1..8526876262 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid) static unsigned initialized; static char user_name[100]; static struct passwd *p; + wchar_t buf[100]; DWORD len; if (initialized) return p; - len = sizeof(user_name); - if (!GetUserName(user_name, &len)) { + len = sizeof(buf); + if (!GetUserNameW(buf, &len)) { + initialized = 1; + return NULL; + } + + if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) { initialized = 1; return NULL; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format 2019-06-27 9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget @ 2019-07-04 20:13 ` Beat Bolli 2019-07-04 21:52 ` Johannes Schindelin 0 siblings, 1 reply; 8+ messages in thread From: Beat Bolli @ 2019-07-04 20:13 UTC (permalink / raw) To: Johannes Schindelin via GitGitGadget, git Cc: Junio C Hamano, Johannes Schindelin On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > Previously, we would have obtained the user name encoded in whatever the > current code page is. > > Note: the "user name" here does not denote the full name but instead the > short logon name. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > compat/mingw.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/compat/mingw.c b/compat/mingw.c > index 9b6d2400e1..8526876262 100644 > --- a/compat/mingw.c > +++ b/compat/mingw.c > @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid) > static unsigned initialized; > static char user_name[100]; > static struct passwd *p; > + wchar_t buf[100]; > DWORD len; > > if (initialized) > return p; > > - len = sizeof(user_name); > - if (!GetUserName(user_name, &len)) { > + len = sizeof(buf); I think this should be "len = sizeof(buf) / sizeof(buf[0])". GetUserNameW() takes the number of characters, not bytes. > + if (!GetUserNameW(buf, &len)) { > + initialized = 1; > + return NULL; > + } > + > + if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) { > initialized = 1; > return NULL; > } > Cheers, Beat ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format 2019-07-04 20:13 ` Beat Bolli @ 2019-07-04 21:52 ` Johannes Schindelin 2019-07-04 21:54 ` Beat Bolli 0 siblings, 1 reply; 8+ messages in thread From: Johannes Schindelin @ 2019-07-04 21:52 UTC (permalink / raw) To: Beat Bolli; +Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano Hi Beat, On Thu, 4 Jul 2019, Beat Bolli wrote: > On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote: > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > Previously, we would have obtained the user name encoded in whatever the > > current code page is. > > > > Note: the "user name" here does not denote the full name but instead the > > short logon name. > > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > --- > > compat/mingw.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/compat/mingw.c b/compat/mingw.c > > index 9b6d2400e1..8526876262 100644 > > --- a/compat/mingw.c > > +++ b/compat/mingw.c > > @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid) > > static unsigned initialized; > > static char user_name[100]; > > static struct passwd *p; > > + wchar_t buf[100]; > > DWORD len; > > > > if (initialized) > > return p; > > > > - len = sizeof(user_name); > > - if (!GetUserName(user_name, &len)) { > > + len = sizeof(buf); > > I think this should be "len = sizeof(buf) / sizeof(buf[0])". > > GetUserNameW() takes the number of characters, not bytes. Good catch. How about I use `ARRAY_SIZE(buf)`? Since this is already in `next`, I will prepare a follow-up patch. Ciao, Dscho > > > + if (!GetUserNameW(buf, &len)) { > > + initialized = 1; > > + return NULL; > > + } > > + > > + if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) { > > initialized = 1; > > return NULL; > > } > > > > Cheers, > Beat > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mingw: get pw_name in UTF-8 format 2019-07-04 21:52 ` Johannes Schindelin @ 2019-07-04 21:54 ` Beat Bolli 0 siblings, 0 replies; 8+ messages in thread From: Beat Bolli @ 2019-07-04 21:54 UTC (permalink / raw) To: Johannes Schindelin Cc: Johannes Schindelin via GitGitGadget, git, Junio C Hamano Hi Dscho On 04.07.19 23:52, Johannes Schindelin wrote: > Hi Beat, > > On Thu, 4 Jul 2019, Beat Bolli wrote: > >> On 27.06.19 11:37, Johannes Schindelin via GitGitGadget wrote: >>> From: Johannes Schindelin <johannes.schindelin@gmx.de> >>> >>> Previously, we would have obtained the user name encoded in whatever the >>> current code page is. >>> >>> Note: the "user name" here does not denote the full name but instead the >>> short logon name. >>> >>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> >>> --- >>> compat/mingw.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/compat/mingw.c b/compat/mingw.c >>> index 9b6d2400e1..8526876262 100644 >>> --- a/compat/mingw.c >>> +++ b/compat/mingw.c >>> @@ -1946,13 +1946,19 @@ struct passwd *getpwuid(int uid) >>> static unsigned initialized; >>> static char user_name[100]; >>> static struct passwd *p; >>> + wchar_t buf[100]; >>> DWORD len; >>> >>> if (initialized) >>> return p; >>> >>> - len = sizeof(user_name); >>> - if (!GetUserName(user_name, &len)) { >>> + len = sizeof(buf); >> >> I think this should be "len = sizeof(buf) / sizeof(buf[0])". >> >> GetUserNameW() takes the number of characters, not bytes. > > Good catch. > > How about I use `ARRAY_SIZE(buf)`? Since this is already in `next`, I will > prepare a follow-up patch. Of course, ARRAY_SIZE() is even better. I didn't remember it existed. Cheers, Beat > > Ciao, > Dscho > >> >>> + if (!GetUserNameW(buf, &len)) { >>> + initialized = 1; >>> + return NULL; >>> + } >>> + >>> + if (xwcstoutf(user_name, buf, sizeof(user_name)) < 0) { >>> initialized = 1; >>> return NULL; >>> } >>> >> >> Cheers, >> Beat >> ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] mingw: use Unicode functions explicitly 2019-06-27 9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget 2019-06-27 9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget @ 2019-06-27 9:37 ` Johannes Schindelin via GitGitGadget 2019-06-27 17:26 ` Eric Sunshine 1 sibling, 1 reply; 8+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-06-27 9:37 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Johannes Schindelin From: Johannes Schindelin <johannes.schindelin@gmx.de> Many Win32 API functions actually exist in two variants: one with the `A` suffix that takes ANSI parameters (`char *` or `const char *`) and one with the `W` suffix that takes Unicode parameters (`wchar_t *` or `const wchar_t *`). The ANSI variant assumes that the strings are encoded according to whatever is the current locale. This is not what Git wants to use on Windows: we assume that `char *` variables point to strings encoded in UTF-8. There is a pseudo UTF-8 locale on Windows, but it does not work as one might expect. In addition, if we overrode the user's locale, that would modify the behavior of programs spawned by Git (such as editors, difftools, etc), therefore we cannot use that pseudo locale. Further, it is actually highly encouraged to use the Unicode versions instead of the ANSI versions, so let's do precisely that. Note: when calling the Win32 API functions _without_ any suffix, it depends whether the `UNICODE` constant is defined before the relevant headers are #include'd. Without that constant, the ANSI variants are used. Let's be explicit and avoid that ambiguity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- compat/mingw.c | 2 +- compat/poll/poll.c | 2 +- compat/winansi.c | 10 ++++++---- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index 8526876262..b8a62bf914 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -1407,7 +1407,7 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen do_unset_environment_variables(); /* Determine whether or not we are associated to a console */ - cons = CreateFile("CONOUT$", GENERIC_WRITE, + cons = CreateFileW(L"CONOUT$", GENERIC_WRITE, FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (cons == INVALID_HANDLE_VALUE) { diff --git a/compat/poll/poll.c b/compat/poll/poll.c index 4459408c7d..8f24b80252 100644 --- a/compat/poll/poll.c +++ b/compat/poll/poll.c @@ -150,7 +150,7 @@ win32_compute_revents (HANDLE h, int *p_sought) if (!once_only) { NtQueryInformationFile = (PNtQueryInformationFile) - GetProcAddress (GetModuleHandle ("ntdll.dll"), + GetProcAddress (GetModuleHandleW (L"ntdll.dll"), "NtQueryInformationFile"); once_only = TRUE; } diff --git a/compat/winansi.c b/compat/winansi.c index f4f08237f9..cd947e048e 100644 --- a/compat/winansi.c +++ b/compat/winansi.c @@ -599,7 +599,7 @@ int winansi_isatty(int fd) void winansi_init(void) { int con1, con2; - char name[32]; + wchar_t name[32]; /* check if either stdout or stderr is a console output screen buffer */ con1 = is_console(1); @@ -619,13 +619,15 @@ void winansi_init(void) } /* create a named pipe to communicate with the console thread */ - xsnprintf(name, sizeof(name), "\\\\.\\pipe\\winansi%lu", GetCurrentProcessId()); - hwrite = CreateNamedPipe(name, PIPE_ACCESS_OUTBOUND, + if (swprintf(name, ARRAY_SIZE(name) - 1, L"\\\\.\\pipe\\winansi%lu", + GetCurrentProcessId()) < 0) + die("Could not initialize winansi pipe name"); + hwrite = CreateNamedPipeW(name, PIPE_ACCESS_OUTBOUND, PIPE_TYPE_BYTE | PIPE_WAIT, 1, BUFFER_SIZE, 0, 0, NULL); if (hwrite == INVALID_HANDLE_VALUE) die_lasterr("CreateNamedPipe failed"); - hread = CreateFile(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL); + hread = CreateFileW(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL); if (hread == INVALID_HANDLE_VALUE) die_lasterr("CreateFile for named pipe failed"); -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mingw: use Unicode functions explicitly 2019-06-27 9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget @ 2019-06-27 17:26 ` Eric Sunshine 2019-06-27 18:54 ` Johannes Schindelin 0 siblings, 1 reply; 8+ messages in thread From: Eric Sunshine @ 2019-06-27 17:26 UTC (permalink / raw) To: Johannes Schindelin via GitGitGadget Cc: Git List, Junio C Hamano, Johannes Schindelin On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com> wrote: > Many Win32 API functions actually exist in two variants: one with > the `A` suffix that takes ANSI parameters (`char *` or `const char *`) > and one with the `W` suffix that takes Unicode parameters (`wchar_t *` > or `const wchar_t *`). > > The ANSI variant assumes that the strings are encoded according to > whatever is the current locale. This is not what Git wants to use on > Windows: we assume that `char *` variables point to strings encoded in > UTF-8. > > There is a pseudo UTF-8 locale on Windows, but it does not work > as one might expect. What does "does not work as one might expect" mean? The reader is left hanging, not knowing why or how the UTF-8 locale on Windows is undesirable. > In addition, if we overrode the user's locale, that > would modify the behavior of programs spawned by Git (such as editors, > difftools, etc), therefore we cannot use that pseudo locale. > > Further, it is actually highly encouraged to use the Unicode versions > instead of the ANSI versions, so let's do precisely that. > > Note: when calling the Win32 API functions _without_ any suffix, it > depends whether the `UNICODE` constant is defined before the relevant > headers are #include'd. Without that constant, the ANSI variants are > used. Let's be explicit and avoid that ambiguity. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] mingw: use Unicode functions explicitly 2019-06-27 17:26 ` Eric Sunshine @ 2019-06-27 18:54 ` Johannes Schindelin 0 siblings, 0 replies; 8+ messages in thread From: Johannes Schindelin @ 2019-06-27 18:54 UTC (permalink / raw) To: Eric Sunshine Cc: Johannes Schindelin via GitGitGadget, Git List, Junio C Hamano Hi Eric, On Thu, 27 Jun 2019, Eric Sunshine wrote: > On Thu, Jun 27, 2019 at 5:37 AM Johannes Schindelin via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > Many Win32 API functions actually exist in two variants: one with > > the `A` suffix that takes ANSI parameters (`char *` or `const char *`) > > and one with the `W` suffix that takes Unicode parameters (`wchar_t *` > > or `const wchar_t *`). > > > > The ANSI variant assumes that the strings are encoded according to > > whatever is the current locale. This is not what Git wants to use on > > Windows: we assume that `char *` variables point to strings encoded in > > UTF-8. > > > > There is a pseudo UTF-8 locale on Windows, but it does not work > > as one might expect. > > What does "does not work as one might expect" mean? The reader is left > hanging, not knowing why or how the UTF-8 locale on Windows is > undesirable. Should I really bore the reader with half-details? At least one of those behaviors was reported as an unbootable Windows 7, but some others could not reproduce. I really did not want to write a novel here that is 1) largely irrelevant and 2) would take way too long to write because I forgot most of the details and would have to look them up again. Hopefully we can do without the full story here, as even one report that it does not work as expected is enough to make this an unfeasible option for us, and that's that as far as I am concerned. Ciao, Dscho > > > In addition, if we overrode the user's locale, that > > would modify the behavior of programs spawned by Git (such as editors, > > difftools, etc), therefore we cannot use that pseudo locale. > > > > Further, it is actually highly encouraged to use the Unicode versions > > instead of the ANSI versions, so let's do precisely that. > > > > Note: when calling the Win32 API functions _without_ any suffix, it > > depends whether the `UNICODE` constant is defined before the relevant > > headers are #include'd. Without that constant, the ANSI variants are > > used. Let's be explicit and avoid that ambiguity. > > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-07-04 21:54 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-06-27 9:37 [PATCH 0/2] Let's use the Win32 API more precisely Johannes Schindelin via GitGitGadget 2019-06-27 9:37 ` [PATCH 1/2] mingw: get pw_name in UTF-8 format Johannes Schindelin via GitGitGadget 2019-07-04 20:13 ` Beat Bolli 2019-07-04 21:52 ` Johannes Schindelin 2019-07-04 21:54 ` Beat Bolli 2019-06-27 9:37 ` [PATCH 2/2] mingw: use Unicode functions explicitly Johannes Schindelin via GitGitGadget 2019-06-27 17:26 ` Eric Sunshine 2019-06-27 18:54 ` Johannes Schindelin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).