git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] gitk: Fixing file name encoding issues.
@ 2025-02-18 22:42 Kazuhiro Kato via GitGitGadget
  2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw)
  To: git; +Cc: Kazuhiro Kato

fix: file name encoding issues. fix: when resolving merge conflicts,
japanese file names become garbled.

Kazuhiro Kato (2):
  Fixing file name encoding issues.
  fix: when resolving merge conflicts, japanese file names become
    garbled.

 gitk-git/gitk | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


base-commit: bc204b742735ae06f65bb20291c95985c9633b7f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1886%2Fkkato233%2Ffix_filename_encoding-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1886/kkato233/fix_filename_encoding-v1
Pull-Request: https://github.com/git/git/pull/1886
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] Fixing file name encoding issues.
  2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget
@ 2025-02-18 22:42 ` Kazuhiro Kato via GitGitGadget
  2025-02-19 17:30   ` Konstantin Khomoutov
  2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget
  2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano
  2 siblings, 1 reply; 7+ messages in thread
From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw)
  To: git; +Cc: Kazuhiro Kato, Kazuhiro Kato

From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
---
 gitk-git/gitk | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gitk-git/gitk b/gitk-git/gitk
index 47a7c1d29c4..88951ed2384 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -12379,6 +12379,7 @@ catch {
 if {$gitencoding == ""} {
     set gitencoding "utf-8"
 }
+encoding system utf-8
 set tclencoding [tcl_encoding $gitencoding]
 if {$tclencoding == {}} {
     puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled.
  2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget
  2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget
@ 2025-02-18 22:42 ` Kazuhiro Kato via GitGitGadget
  2025-02-19 18:11   ` Junio C Hamano
  2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano
  2 siblings, 1 reply; 7+ messages in thread
From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw)
  To: git; +Cc: Kazuhiro Kato, Kazuhiro Kato

From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
---
 gitk-git/gitk | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gitk-git/gitk b/gitk-git/gitk
index 88951ed2384..f4f8dbd5fad 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -8205,12 +8205,13 @@ proc parseblobdiffline {ids line} {
 
         if {$type eq "--cc"} {
             # start of a new file in a merge diff
-            set fname [string range $line 10 end]
+            set fname_raw [string range $line 10 end]
+            set fname [encoding convertfrom $fname_raw]
             if {[lsearch -exact $treediffs($ids) $fname] < 0} {
                 lappend treediffs($ids) $fname
                 add_flist [list $fname]
             }
-
+            set fname $fname_raw
         } else {
             set line [string range $line 11 end]
             # If the name hasn't changed the length will be odd,
@@ -8310,6 +8311,7 @@ proc parseblobdiffline {ids line} {
             set diffinhdr 0
             return
         }
+        set line [encoding convertfrom $line]
         $ctext insert end "$line\n" filesep
 
     } else {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] gitk: Fixing file name encoding issues.
  2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget
  2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget
  2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget
@ 2025-02-18 22:52 ` Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2025-02-18 22:52 UTC (permalink / raw)
  To: Kazuhiro Kato via GitGitGadget; +Cc: git, Kazuhiro Kato, Johannes Sixt

"Kazuhiro Kato via GitGitGadget" <gitgitgadget@gmail.com> writes:

> fix: file name encoding issues. fix: when resolving merge conflicts,
> japanese file names become garbled.
>
> Kazuhiro Kato (2):
>   Fixing file name encoding issues.
>   fix: when resolving merge conflicts, japanese file names become
>     garbled.
>
>  gitk-git/gitk | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Please

 - base your changes to j6t's gitk repository

   https://github.com/j6t/gitk

   where the file your patch would touch should appear at the top
   level of the working tree.  This "independent" history is then
   merged into my tree with "git pull -Xsubtree=gitk-git".

 - have the e-mail address of the gitk maintainer on the Cc: line,
   i.e. "Cc: Johannes Sixt <j6t@kdbg.org>".

Thanks.


cf.
 https://lore.kernel.org/git/b2038430-62dc-41fa-86c2-c0a14bd25e0f@kdbg.org/
 https://lore.kernel.org/git/7b826bba-11cf-4f45-8292-937522dbaf29@kdbg.org/
 https://lore.kernel.org/git/5ccc1943-c2a3-4896-a858-aa5fd6cdd426@kdbg.org/


>
>
> base-commit: bc204b742735ae06f65bb20291c95985c9633b7f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1886%2Fkkato233%2Ffix_filename_encoding-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1886/kkato233/fix_filename_encoding-v1
> Pull-Request: https://github.com/git/git/pull/1886

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Fixing file name encoding issues.
  2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget
@ 2025-02-19 17:30   ` Konstantin Khomoutov
  2025-02-19 18:02     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Khomoutov @ 2025-02-19 17:30 UTC (permalink / raw)
  To: git; +Cc: Kazuhiro Kato

On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:

[...]
> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 47a7c1d29c4..88951ed2384 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -12379,6 +12379,7 @@ catch {
>  if {$gitencoding == ""} {
>      set gitencoding "utf-8"
>  }
> +encoding system utf-8
>  set tclencoding [tcl_encoding $gitencoding]
>  if {$tclencoding == {}} {
>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"

I'm not sure one should sensibly do this - except to implement some
well-understood and well-documented kludge, simply because the "system
encoding" is supposed to be set by the Tcl runtime.

Let's demonstrate (on a Linux-based system with UTF-8 locale):

  tmp$ touch 'привет мир.txt'
  tmp$ tclsh
  % encoding system
  utf-8
  % glob привет*.txt
  {привет мир.txt}
  % encoding system cp1251
  % encoding system
  cp1251
  % glob привет*.txt
  no files matched glob pattern "привет*.txt"
  %

Here, CP1251 is a Windows "code page" for Cyrillic; it's what
[encoding system] reports on Windows systems.

Note that I create a file whose name is two words in Cyrillic script encoded
in UTF-8, and Tcl is fine finding and dispaying this file (via its "glob"
command). But as soon as I change the Tcl's system encoding to another
8-bit Cyrillic encoding, globbing stops working.

Moreover, if I'd do [glob *.txt] so that my file would be matched anyway,
its named would not be readable since Tcl would re-encode it from CP1251
to Unicode, but the name is encoded in UTF-8, not CP1251.

In other words, your patch (supposedly) works on UTF-8-based systems
which is common to Linux-based OSes and MacOS, but I'm afraid it won't work on
Windows.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Fixing file name encoding issues.
  2025-02-19 17:30   ` Konstantin Khomoutov
@ 2025-02-19 18:02     ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2025-02-19 18:02 UTC (permalink / raw)
  To: Konstantin Khomoutov; +Cc: git, Kazuhiro Kato

Konstantin Khomoutov <kostix@bswap.ru> writes:

> On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:
>
> [...]
>> diff --git a/gitk-git/gitk b/gitk-git/gitk
>> index 47a7c1d29c4..88951ed2384 100755
>> --- a/gitk-git/gitk
>> +++ b/gitk-git/gitk
>> @@ -12379,6 +12379,7 @@ catch {
>>  if {$gitencoding == ""} {
>>      set gitencoding "utf-8"
>>  }
>> +encoding system utf-8
>>  set tclencoding [tcl_encoding $gitencoding]
>>  if {$tclencoding == {}} {
>>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
>
> I'm not sure one should sensibly do this - except to implement some
> well-understood and well-documented kludge, simply because the "system
> encoding" is supposed to be set by the Tcl runtime.
> ...
> In other words, your patch (supposedly) works on UTF-8-based systems
> which is common to Linux-based OSes and MacOS, but I'm afraid it won't work on
> Windows.

In other words, things should work without "encoding system blah"
forcing a particular encoding that Tcl may not agree with?  Would
this mean perhaps in a "curious" repository with paths encoded in
something Tcl does not expect to be used (e.g., on a UTF-8 system
somehow EUC-jp is used for paths containing Japanese characters), it
needs to be possible to specify a "curious" encoding either with an
end-user on-demand action (e.g., menu items) or with an repository
configuration (e.g., gitk.pathencoding = euc-jp)?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled.
  2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget
@ 2025-02-19 18:11   ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2025-02-19 18:11 UTC (permalink / raw)
  To: Kazuhiro Kato via GitGitGadget; +Cc: git, Kazuhiro Kato

"Kazuhiro Kato via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Here is a place to give a bit more context.  In what way the current
code is wrong, what end-user visible symptoms are brought due to
that wrongness, what is the correct way to implement it, etc.

> Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
> ---
>  gitk-git/gitk | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 88951ed2384..f4f8dbd5fad 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -8205,12 +8205,13 @@ proc parseblobdiffline {ids line} {
>  
>          if {$type eq "--cc"} {
>              # start of a new file in a merge diff
> -            set fname [string range $line 10 end]
> +            set fname_raw [string range $line 10 end]
> +            set fname [encoding convertfrom $fname_raw]

Is this "the Tcl read from git things as sequence of bytes, not
characters, so somebody needs to pass the bytes to "encoding"
function to turn them into a sequence of characters?  Unless
everything is US-ASCII, that is.

If that is the case, presumably the $line has a sequence of bytes,
so it may be wrong to chop it at 10th position (presumably that's
10th byte, not 10th character) when we are trying to teach the code
to deal with non-ASCII data, no?  

I am reasonably sure that [string length "diff --git"] is where 10
comes from, and that prefix will always be in ASCII, but it feels
safer and kosher if we converted the whole line first and then
chopped off the prefix.

The patch title says Japanese, but I would imagine this applies to
anything non-ASCII, so it would be better to retitle the patch to
say "non-ASCII" instead to signal that the issue the patch fixes
applies more widely.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-02-19 18:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget
2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget
2025-02-19 17:30   ` Konstantin Khomoutov
2025-02-19 18:02     ` Junio C Hamano
2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget
2025-02-19 18:11   ` Junio C Hamano
2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).