* [PATCH 0/2] gitk: Fixing file name encoding issues. @ 2025-02-18 22:42 Kazuhiro Kato via GitGitGadget 2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw) To: git; +Cc: Kazuhiro Kato fix: file name encoding issues. fix: when resolving merge conflicts, japanese file names become garbled. Kazuhiro Kato (2): Fixing file name encoding issues. fix: when resolving merge conflicts, japanese file names become garbled. gitk-git/gitk | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) base-commit: bc204b742735ae06f65bb20291c95985c9633b7f Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1886%2Fkkato233%2Ffix_filename_encoding-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1886/kkato233/fix_filename_encoding-v1 Pull-Request: https://github.com/git/git/pull/1886 -- gitgitgadget ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] Fixing file name encoding issues. 2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 ` Kazuhiro Kato via GitGitGadget 2025-02-19 17:30 ` Konstantin Khomoutov 2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget 2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano 2 siblings, 1 reply; 7+ messages in thread From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw) To: git; +Cc: Kazuhiro Kato, Kazuhiro Kato From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> --- gitk-git/gitk | 1 + 1 file changed, 1 insertion(+) diff --git a/gitk-git/gitk b/gitk-git/gitk index 47a7c1d29c4..88951ed2384 100755 --- a/gitk-git/gitk +++ b/gitk-git/gitk @@ -12379,6 +12379,7 @@ catch { if {$gitencoding == ""} { set gitencoding "utf-8" } +encoding system utf-8 set tclencoding [tcl_encoding $gitencoding] if {$tclencoding == {}} { puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" -- gitgitgadget ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Fixing file name encoding issues. 2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget @ 2025-02-19 17:30 ` Konstantin Khomoutov 2025-02-19 18:02 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Konstantin Khomoutov @ 2025-02-19 17:30 UTC (permalink / raw) To: git; +Cc: Kazuhiro Kato On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote: [...] > diff --git a/gitk-git/gitk b/gitk-git/gitk > index 47a7c1d29c4..88951ed2384 100755 > --- a/gitk-git/gitk > +++ b/gitk-git/gitk > @@ -12379,6 +12379,7 @@ catch { > if {$gitencoding == ""} { > set gitencoding "utf-8" > } > +encoding system utf-8 > set tclencoding [tcl_encoding $gitencoding] > if {$tclencoding == {}} { > puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" I'm not sure one should sensibly do this - except to implement some well-understood and well-documented kludge, simply because the "system encoding" is supposed to be set by the Tcl runtime. Let's demonstrate (on a Linux-based system with UTF-8 locale): tmp$ touch 'привет мир.txt' tmp$ tclsh % encoding system utf-8 % glob привет*.txt {привет мир.txt} % encoding system cp1251 % encoding system cp1251 % glob привет*.txt no files matched glob pattern "привет*.txt" % Here, CP1251 is a Windows "code page" for Cyrillic; it's what [encoding system] reports on Windows systems. Note that I create a file whose name is two words in Cyrillic script encoded in UTF-8, and Tcl is fine finding and dispaying this file (via its "glob" command). But as soon as I change the Tcl's system encoding to another 8-bit Cyrillic encoding, globbing stops working. Moreover, if I'd do [glob *.txt] so that my file would be matched anyway, its named would not be readable since Tcl would re-encode it from CP1251 to Unicode, but the name is encoded in UTF-8, not CP1251. In other words, your patch (supposedly) works on UTF-8-based systems which is common to Linux-based OSes and MacOS, but I'm afraid it won't work on Windows. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Fixing file name encoding issues. 2025-02-19 17:30 ` Konstantin Khomoutov @ 2025-02-19 18:02 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2025-02-19 18:02 UTC (permalink / raw) To: Konstantin Khomoutov; +Cc: git, Kazuhiro Kato Konstantin Khomoutov <kostix@bswap.ru> writes: > On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote: > > [...] >> diff --git a/gitk-git/gitk b/gitk-git/gitk >> index 47a7c1d29c4..88951ed2384 100755 >> --- a/gitk-git/gitk >> +++ b/gitk-git/gitk >> @@ -12379,6 +12379,7 @@ catch { >> if {$gitencoding == ""} { >> set gitencoding "utf-8" >> } >> +encoding system utf-8 >> set tclencoding [tcl_encoding $gitencoding] >> if {$tclencoding == {}} { >> puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" > > I'm not sure one should sensibly do this - except to implement some > well-understood and well-documented kludge, simply because the "system > encoding" is supposed to be set by the Tcl runtime. > ... > In other words, your patch (supposedly) works on UTF-8-based systems > which is common to Linux-based OSes and MacOS, but I'm afraid it won't work on > Windows. In other words, things should work without "encoding system blah" forcing a particular encoding that Tcl may not agree with? Would this mean perhaps in a "curious" repository with paths encoded in something Tcl does not expect to be used (e.g., on a UTF-8 system somehow EUC-jp is used for paths containing Japanese characters), it needs to be possible to specify a "curious" encoding either with an end-user on-demand action (e.g., menu items) or with an repository configuration (e.g., gitk.pathencoding = euc-jp)? Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled. 2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget 2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 ` Kazuhiro Kato via GitGitGadget 2025-02-19 18:11 ` Junio C Hamano 2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano 2 siblings, 1 reply; 7+ messages in thread From: Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:42 UTC (permalink / raw) To: git; +Cc: Kazuhiro Kato, Kazuhiro Kato From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> --- gitk-git/gitk | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gitk-git/gitk b/gitk-git/gitk index 88951ed2384..f4f8dbd5fad 100755 --- a/gitk-git/gitk +++ b/gitk-git/gitk @@ -8205,12 +8205,13 @@ proc parseblobdiffline {ids line} { if {$type eq "--cc"} { # start of a new file in a merge diff - set fname [string range $line 10 end] + set fname_raw [string range $line 10 end] + set fname [encoding convertfrom $fname_raw] if {[lsearch -exact $treediffs($ids) $fname] < 0} { lappend treediffs($ids) $fname add_flist [list $fname] } - + set fname $fname_raw } else { set line [string range $line 11 end] # If the name hasn't changed the length will be odd, @@ -8310,6 +8311,7 @@ proc parseblobdiffline {ids line} { set diffinhdr 0 return } + set line [encoding convertfrom $line] $ctext insert end "$line\n" filesep } else { -- gitgitgadget ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled. 2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget @ 2025-02-19 18:11 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2025-02-19 18:11 UTC (permalink / raw) To: Kazuhiro Kato via GitGitGadget; +Cc: git, Kazuhiro Kato "Kazuhiro Kato via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> Here is a place to give a bit more context. In what way the current code is wrong, what end-user visible symptoms are brought due to that wrongness, what is the correct way to implement it, etc. > Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp> > --- > gitk-git/gitk | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gitk-git/gitk b/gitk-git/gitk > index 88951ed2384..f4f8dbd5fad 100755 > --- a/gitk-git/gitk > +++ b/gitk-git/gitk > @@ -8205,12 +8205,13 @@ proc parseblobdiffline {ids line} { > > if {$type eq "--cc"} { > # start of a new file in a merge diff > - set fname [string range $line 10 end] > + set fname_raw [string range $line 10 end] > + set fname [encoding convertfrom $fname_raw] Is this "the Tcl read from git things as sequence of bytes, not characters, so somebody needs to pass the bytes to "encoding" function to turn them into a sequence of characters? Unless everything is US-ASCII, that is. If that is the case, presumably the $line has a sequence of bytes, so it may be wrong to chop it at 10th position (presumably that's 10th byte, not 10th character) when we are trying to teach the code to deal with non-ASCII data, no? I am reasonably sure that [string length "diff --git"] is where 10 comes from, and that prefix will always be in ASCII, but it feels safer and kosher if we converted the whole line first and then chopped off the prefix. The patch title says Japanese, but I would imagine this applies to anything non-ASCII, so it would be better to retitle the patch to say "non-ASCII" instead to signal that the issue the patch fixes applies more widely. Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] gitk: Fixing file name encoding issues. 2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget 2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget 2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget @ 2025-02-18 22:52 ` Junio C Hamano 2 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2025-02-18 22:52 UTC (permalink / raw) To: Kazuhiro Kato via GitGitGadget; +Cc: git, Kazuhiro Kato, Johannes Sixt "Kazuhiro Kato via GitGitGadget" <gitgitgadget@gmail.com> writes: > fix: file name encoding issues. fix: when resolving merge conflicts, > japanese file names become garbled. > > Kazuhiro Kato (2): > Fixing file name encoding issues. > fix: when resolving merge conflicts, japanese file names become > garbled. > > gitk-git/gitk | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) Please - base your changes to j6t's gitk repository https://github.com/j6t/gitk where the file your patch would touch should appear at the top level of the working tree. This "independent" history is then merged into my tree with "git pull -Xsubtree=gitk-git". - have the e-mail address of the gitk maintainer on the Cc: line, i.e. "Cc: Johannes Sixt <j6t@kdbg.org>". Thanks. cf. https://lore.kernel.org/git/b2038430-62dc-41fa-86c2-c0a14bd25e0f@kdbg.org/ https://lore.kernel.org/git/7b826bba-11cf-4f45-8292-937522dbaf29@kdbg.org/ https://lore.kernel.org/git/5ccc1943-c2a3-4896-a858-aa5fd6cdd426@kdbg.org/ > > > base-commit: bc204b742735ae06f65bb20291c95985c9633b7f > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1886%2Fkkato233%2Ffix_filename_encoding-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1886/kkato233/fix_filename_encoding-v1 > Pull-Request: https://github.com/git/git/pull/1886 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-02-19 18:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-18 22:42 [PATCH 0/2] gitk: Fixing file name encoding issues Kazuhiro Kato via GitGitGadget 2025-02-18 22:42 ` [PATCH 1/2] " Kazuhiro Kato via GitGitGadget 2025-02-19 17:30 ` Konstantin Khomoutov 2025-02-19 18:02 ` Junio C Hamano 2025-02-18 22:42 ` [PATCH 2/2] fix: when resolving merge conflicts, japanese file names become garbled Kazuhiro Kato via GitGitGadget 2025-02-19 18:11 ` Junio C Hamano 2025-02-18 22:52 ` [PATCH 0/2] gitk: Fixing file name encoding issues Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).