* [PATCH (GITK) v3 0/4] Enhance encoding support.
@ 2008-10-13 8:12 Alexander Gavrilov
2008-10-13 8:12 ` [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Alexander Gavrilov @ 2008-10-13 8:12 UTC (permalink / raw)
To: git; +Cc: Paul Mackerras, Johannes Sixt
Currently GUI tools don't provide enough support for
viewing files that contain non-ASCII characters. This set of
patches addresses this issue. The git-gui part of the series
is based on patches that were in the 'pu' branch of the
git-gui repository.
UPDATE: the git-gui part of the series is in master.
File encoding can be specified in the following ways:
1) It defaults to the current locale encoding.
2) It can be overridden by setting the gui.encoding option.
3) It can be further set on per-file basis by specifying
the 'encoding' attribute in gitattributes.
4) Finally, git-gui allows directly selecting encodings
through a popup menu.
Since git apparently cannot work with filenames in non-locale
encodings anyway, I did not try to do anything about it apart
from fixing some obvious bugs.
CHANGES v2:
- Wrote better comments for the first three patches.
- Added a patch to speed up loading of very large diffs.
CHANGES v3:
- Killed get_cached_encoding, replacing it with get_path_encoding.
- Squashed file name and content patches into one.
- Added the tcl_encoding reimplementation patch. I should
say that without the second cache this really makes a
difference with thousand-file diffs.
GITK:
gitk: Port new encoding logic from git-gui.
---
gitk | 39 ++++++++++++++++++++++++++++++++++++---
1 files changed, 36 insertions(+), 3 deletions(-)
gitk: Enhance file encoding support.
---
gitk | 35 +++++++++++++++++++++++++++--------
1 files changed, 27 insertions(+), 8 deletions(-)
gitk: Implement batch lookup and caching of encoding attrs.
---
gitk | 35 ++++++++++++++++++++++++++++++++++-
1 files changed, 34 insertions(+), 1 deletions(-)
gitk: Optimize encoding name resolution using a lookup table.
---
gitk | 84 ++++++++++++++++++++++++++++++++++++++++++-----------------------
1 files changed, 54 insertions(+), 30 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui. 2008-10-13 8:12 [PATCH (GITK) v3 0/4] Enhance encoding support Alexander Gavrilov @ 2008-10-13 8:12 ` Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 2/4] gitk: Enhance file encoding support Alexander Gavrilov 2008-10-15 12:32 ` [PATCH (GITK) v3 0/4] Enhance encoding support Paul Mackerras 2008-10-15 12:38 ` Paul Mackerras 2 siblings, 1 reply; 8+ messages in thread From: Alexander Gavrilov @ 2008-10-13 8:12 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt Recently handling of file encodings in GIT-GUI has been greatly enhanced. It now follows these rules: - File encoding defaults to the system encoding. - It can be overridden by setting the gui.encoding option. - Finally, the 'encoding' attribute is checked on per-file basis; it has the last word. This patch inserts copies of git-gui functions necessary for supporting this into gitk. The following patch will connect them to the rest of the code. Note: Since git-check-attr does not provide support for reading attributes from trees, attribute lookup is done using files from the working directory. It also extends the range of supported encoding names, to match changes in git-gui. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 39 ++++++++++++++++++++++++++++++++++++--- 1 files changed, 36 insertions(+), 3 deletions(-) diff --git a/gitk b/gitk index dce17e9..8682838 100755 --- a/gitk +++ b/gitk @@ -9727,7 +9727,7 @@ set encoding_aliases { { ISO-8859-16 iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 } { GBK CP936 MS936 windows-936 } { JIS_Encoding csJISEncoding } - { Shift_JIS MS_Kanji csShiftJIS } + { Shift_JIS MS_Kanji csShiftJIS ShiftJIS Shift-JIS } { Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese EUC-JP } { Extended_UNIX_Code_Fixed_Width_for_Japanese csEUCFixWidJapanese } @@ -9769,7 +9769,7 @@ proc tcl_encoding {enc} { set i [lsearch -exact $lcnames $enc] if {$i < 0} { # look for "isonnn" instead of "iso-nnn" or "iso_nnn" - if {[regsub {^iso[-_]} $enc iso encx]} { + if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} { set i [lsearch -exact $lcnames $encx] } } @@ -9781,7 +9781,7 @@ proc tcl_encoding {enc} { foreach e $ll { set i [lsearch -exact $lcnames $e] if {$i < 0} { - if {[regsub {^iso[-_]} $e iso ex]} { + if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} { set i [lsearch -exact $lcnames $ex] } } @@ -9796,6 +9796,34 @@ proc tcl_encoding {enc} { return {} } +proc gitattr {path attr default} { + if {[catch {set r [exec git check-attr $attr -- $path]}]} { + set r unspecified + } else { + set r [join [lrange [split $r :] 2 end] :] + regsub {^ } $r {} r + } + if {$r eq {unspecified}} { + return $default + } + return $r +} + +proc get_path_encoding {path} { + global gui_encoding + set tcl_enc [tcl_encoding $gui_encoding] + if {$tcl_enc eq {}} { + set tcl_enc [encoding system] + } + if {$path ne {}} { + set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]] + if {$enc2 ne {}} { + set tcl_enc $enc2 + } + } + return $tcl_enc +} + # First check that Tcl/Tk is recent enough if {[catch {package require Tk 8.4} err]} { show_error {} . [mc "Sorry, gitk cannot run with this version of Tcl/Tk.\n\ @@ -9818,6 +9846,11 @@ if {$tclencoding == {}} { puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" } +set gui_encoding [encoding system] +catch { + set gui_encoding [exec git config --get gui.encoding] +} + set mainfont {Helvetica 9} set textfont {Courier 9} set uifont {Helvetica 9 bold} -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH (GITK) v3 2/4] gitk: Enhance file encoding support. 2008-10-13 8:12 ` [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov @ 2008-10-13 8:12 ` Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 3/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov 0 siblings, 1 reply; 8+ messages in thread From: Alexander Gavrilov @ 2008-10-13 8:12 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt The previous patch has added functions that implement per-file contents encoding selection logic. This one modifies core gitk code to use them. Since diffs usually contain multiple files, which may have different values of the encoding attribute, they are initially read as binary, and encoding conversion is applied to individual lines, based on the active set of diff headers. As a side job, this patch also fixes some bugs in handling of non-ASCII filenames. Core git apparently supports only locale-encoded filenames, so processing is done using the system encoding. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 35 +++++++++++++++++++++++++++-------- 1 files changed, 27 insertions(+), 8 deletions(-) diff --git a/gitk b/gitk index 8682838..5f35f61 100755 --- a/gitk +++ b/gitk @@ -6229,7 +6229,7 @@ proc gettree {id} { set treepending $id set treefilelist($id) {} set treeidlist($id) {} - fconfigure $gtf -blocking 0 + fconfigure $gtf -blocking 0 -encoding binary filerun $gtf [list gettreeline $gtf $id] } } else { @@ -6251,11 +6251,12 @@ proc gettreeline {gtf id} { set line [string range $line 0 [expr {$i-1}]] if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue set sha1 [lindex $line 2] - if {[string index $fname 0] eq "\""} { - set fname [lindex $fname 0] - } lappend treeidlist($id) $sha1 } + if {[string index $fname 0] eq "\""} { + set fname [lindex $fname 0] + } + set fname [encoding convertfrom $fname] lappend treefilelist($id) $fname } if {![eof $gtf]} { @@ -6296,7 +6297,7 @@ proc showfile {f} { return } } - fconfigure $bf -blocking 0 + fconfigure $bf -blocking 0 -encoding [get_path_encoding $f] filerun $bf [list getblobline $bf $diffids] $ctext config -state normal clear_ctext $commentend @@ -6334,6 +6335,7 @@ proc mergediff {id} { global diffids global parents global diffcontext + global diffencoding global limitdiffs vfilelimit curview set diffmergeid $id @@ -6347,9 +6349,10 @@ proc mergediff {id} { error_popup "[mc "Error getting merge diffs:"] $err" return } - fconfigure $mdf -blocking 0 + fconfigure $mdf -blocking 0 -encoding binary set mdifffd($id) $mdf set np [llength $parents($curview,$id)] + set diffencoding [get_path_encoding {}] settabs $np filerun $mdf [list getmergediffline $mdf $id $np] } @@ -6357,6 +6360,7 @@ proc mergediff {id} { proc getmergediffline {mdf id np} { global diffmergeid ctext cflist mergemax global difffilestart mdifffd + global diffencoding $ctext conf -state normal set nr 0 @@ -6368,18 +6372,22 @@ proc getmergediffline {mdf id np} { } if {[regexp {^diff --cc (.*)} $line match fname]} { # start of a new file + set fname [encoding convertfrom $fname] $ctext insert end "\n" set here [$ctext index "end - 1c"] lappend difffilestart $here add_flist [list $fname] + set diffencoding [get_path_encoding $fname] set l [expr {(78 - [string length $fname]) / 2}] set pad [string range "----------------------------------------" 1 $l] $ctext insert end "$pad $fname $pad\n" filesep } elseif {[regexp {^@@} $line]} { + set line [encoding convertfrom $diffencoding $line] $ctext insert end "$line\n" hunksep } elseif {[regexp {^[0-9a-f]{40}$} $line] || [regexp {^index} $line]} { # do nothing } else { + set line [encoding convertfrom $diffencoding $line] # parse the prefix - one ' ', '-' or '+' for each parent set spaces {} set minuses {} @@ -6514,7 +6522,7 @@ proc gettreediffs {ids} { set treepending $ids set treediff {} - fconfigure $gdtf -blocking 0 + fconfigure $gdtf -blocking 0 -encoding binary filerun $gdtf [list gettreediffline $gdtf $ids] } @@ -6530,6 +6538,7 @@ proc gettreediffline {gdtf ids} { if {[string index $file 0] eq "\""} { set file [lindex $file 0] } + set file [encoding convertfrom $file] lappend treediff $file } } @@ -6587,6 +6596,7 @@ proc getblobdiffs {ids} { global diffcontext global ignorespace global limitdiffs vfilelimit curview + global diffencoding set cmd [diffcmd $ids "-p -C --no-commit-id -U$diffcontext"] if {$ignorespace} { @@ -6600,7 +6610,8 @@ proc getblobdiffs {ids} { return } set diffinhdr 0 - fconfigure $bdf -blocking 0 + set diffencoding [get_path_encoding {}] + fconfigure $bdf -blocking 0 -encoding binary set blobdifffd($ids) $bdf filerun $bdf [list getblobdiffline $bdf $diffids] } @@ -6634,6 +6645,7 @@ proc getblobdiffline {bdf ids} { global diffids blobdifffd ctext curdiffstart global diffnexthead diffnextnote difffilestart global diffinhdr treediffs + global diffencoding set nr 0 $ctext conf -state normal @@ -6671,10 +6683,13 @@ proc getblobdiffline {bdf ids} { } else { set fname [string range $line 2 [expr {$i - 1}]] } + set fname [encoding convertfrom $fname] + set diffencoding [get_path_encoding $fname] makediffhdr $fname $ids } elseif {[regexp {^@@ -([0-9]+)(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@(.*)} \ $line match f1l f1c f2l f2c rest]} { + set line [encoding convertfrom $diffencoding $line] $ctext insert end "$line\n" hunksep set diffinhdr 0 @@ -6684,6 +6699,7 @@ proc getblobdiffline {bdf ids} { if {[string index $fname 0] eq "\""} { set fname [lindex $fname 0] } + set fname [encoding convertfrom $fname] set i [lsearch -exact $treediffs($ids) $fname] if {$i >= 0} { setinlist difffilestart $i $curdiffstart @@ -6694,6 +6710,8 @@ proc getblobdiffline {bdf ids} { if {[string index $fname 0] eq "\""} { set fname [lindex $fname 0] } + set fname [encoding convertfrom $fname] + set diffencoding [get_path_encoding $fname] makediffhdr $fname $ids } elseif {[string compare -length 3 $line "---"] == 0} { # do nothing @@ -6705,6 +6723,7 @@ proc getblobdiffline {bdf ids} { $ctext insert end "$line\n" filesep } else { + set line [encoding convertfrom $diffencoding $line] set x [string range $line 0 0] if {$x == "-" || $x == "+"} { set tag [expr {$x == "+"}] -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH (GITK) v3 3/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-10-13 8:12 ` [PATCH (GITK) v3 2/4] gitk: Enhance file encoding support Alexander Gavrilov @ 2008-10-13 8:12 ` Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 4/4] gitk: Optimize encoding name resolution using a lookup table Alexander Gavrilov 0 siblings, 1 reply; 8+ messages in thread From: Alexander Gavrilov @ 2008-10-13 8:12 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt When the diff contains thousands of files, calling git-check-attr once per file is very slow. With this patch gitk does attribute lookup in batches of 30 files while reading the diff file list, which leads to a very noticeable speedup. It may be possible to reimplement this even more efficiently, if git-check-attr is modified to support a --stdin-paths option. Additionally, it should quote the ':' character in file paths, or provide a more robust way of column separation. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 35 ++++++++++++++++++++++++++++++++++- 1 files changed, 34 insertions(+), 1 deletions(-) diff --git a/gitk b/gitk index 5f35f61..8cd3171 100755 --- a/gitk +++ b/gitk @@ -6531,6 +6531,7 @@ proc gettreediffline {gdtf ids} { global cmitmode vfilelimit curview limitdiffs set nr 0 + set sublist {} while {[incr nr] <= 1000 && [gets $gdtf line] >= 0} { set i [string first "\t" $line] if {$i >= 0} { @@ -6540,8 +6541,10 @@ proc gettreediffline {gdtf ids} { } set file [encoding convertfrom $file] lappend treediff $file + lappend sublist $file } } + cache_gitattr encoding $sublist if {![eof $gdtf]} { return [expr {$nr >= 1000? 2: 1}] } @@ -9816,18 +9819,48 @@ proc tcl_encoding {enc} { } proc gitattr {path attr default} { - if {[catch {set r [exec git check-attr $attr -- $path]}]} { + global path_attr_cache + if {[info exists path_attr_cache($attr,$path)]} { + set r $path_attr_cache($attr,$path) + } elseif {[catch {set r [exec git check-attr $attr -- $path]}]} { set r unspecified } else { set r [join [lrange [split $r :] 2 end] :] regsub {^ } $r {} r } + set path_attr_cache($attr,$path) $r if {$r eq {unspecified}} { return $default } return $r } +proc cache_gitattr {attr pathlist} { + global path_attr_cache + set newlist {} + foreach path $pathlist { + if {[info exists path_attr_cache($attr,$path)]} continue + lappend newlist $path + } + while {$newlist ne {}} { + set head [lrange $newlist 0 29] + set newlist [lrange $newlist 30 end] + if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} { + foreach row [split $rlist "\n"] { + set cols [split $row :] + set path [lindex $cols 0] + set value [join [lrange $cols 2 end] :] + if {[string index $path 0] eq "\""} { + set path [encoding convertfrom [lindex $path 0]] + } + regsub {^ } $value {} value + set path_attr_cache($attr,$path) $value + } + } + update + } +} + proc get_path_encoding {path} { global gui_encoding set tcl_enc [tcl_encoding $gui_encoding] -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH (GITK) v3 4/4] gitk: Optimize encoding name resolution using a lookup table. 2008-10-13 8:12 ` [PATCH (GITK) v3 3/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov @ 2008-10-13 8:12 ` Alexander Gavrilov 0 siblings, 0 replies; 8+ messages in thread From: Alexander Gavrilov @ 2008-10-13 8:12 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt Current implementation of tcl_encoding uses linear search, and is rather slow when called hundreds of times. This commit reimplements it using a lookup table, which is efficiently calculated on the first run. After that, resolution costs two calls to [info exists], and one actual fetch from the table. This is a port of git-gui commit a1c3feb7fac7. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> --- gitk | 84 ++++++++++++++++++++++++++++++++++++++++++----------------------- 1 files changed, 54 insertions(+), 30 deletions(-) diff --git a/gitk b/gitk index 8cd3171..134df00 100755 --- a/gitk +++ b/gitk @@ -9783,39 +9783,63 @@ set encoding_aliases { { Big5 csBig5 } } -proc tcl_encoding {enc} { - global encoding_aliases - set names [encoding names] - set lcnames [string tolower $names] - set enc [string tolower $enc] - set i [lsearch -exact $lcnames $enc] - if {$i < 0} { - # look for "isonnn" instead of "iso-nnn" or "iso_nnn" - if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} { - set i [lsearch -exact $lcnames $encx] +proc build_encoding_table {} { + global encoding_aliases encoding_lookup_table + + # Prepare the lookup list; cannot use lsort -nocase because + # of compatibility issues with older Tcl (e.g. in msysgit) + set names [list] + foreach item [encoding names] { + lappend names [list [string tolower $item] $item] + } + set names [lsort -ascii -index 0 $names] + # neither can we use lsearch -index + set lnames [list] + foreach item $names { + lappend lnames [lindex $item 0] + } + + foreach grp $encoding_aliases { + set target {} + foreach item $grp { + set i [lsearch -sorted -ascii $lnames \ + [string tolower $item]] + if {$i >= 0} { + set target [lindex $names $i 1] + break + } + } + if {$target eq {}} continue + foreach item $grp { + set encoding_lookup_table([string tolower $item]) $target + } } - } - if {$i < 0} { - foreach l $encoding_aliases { - set ll [string tolower $l] - if {[lsearch -exact $ll $enc] < 0} continue - # look through the aliases for one that tcl knows about - foreach e $ll { - set i [lsearch -exact $lcnames $e] - if {$i < 0} { - if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} { - set i [lsearch -exact $lcnames $ex] - } + + foreach item $names { + set encoding_lookup_table([lindex $item 0]) [lindex $item 1] + } +} + +proc tcl_encoding {enc} { + global encoding_lookup_table + if {$enc eq {}} { + return {} + } + if {![info exists encoding_lookup_table]} { + build_encoding_table + } + set enc [string tolower $enc] + if {![info exists encoding_lookup_table($enc)]} { + # look for "isonnn" instead of "iso-nnn" or "iso_nnn" + if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} { + set enc $encx } - if {$i >= 0} break - } - break } - } - if {$i >= 0} { - return [lindex $names $i] - } - return {} + if {[info exists encoding_lookup_table($enc)]} { + return $encoding_lookup_table($enc) + } else { + return {} + } } proc gitattr {path attr default} { -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH (GITK) v3 0/4] Enhance encoding support. 2008-10-13 8:12 [PATCH (GITK) v3 0/4] Enhance encoding support Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov @ 2008-10-15 12:32 ` Paul Mackerras 2008-10-15 12:38 ` Paul Mackerras 2 siblings, 0 replies; 8+ messages in thread From: Paul Mackerras @ 2008-10-15 12:32 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > Currently GUI tools don't provide enough support for > viewing files that contain non-ASCII characters. This set of > patches addresses this issue. The git-gui part of the series > is based on patches that were in the 'pu' branch of the > git-gui repository. Thanks. I applied 1/4 and 2/4 as one commit and 3/4 as another. However, it seems invoking git check-attr is quite slow, or at least, there is a large startup cost which is not really amortized by batching up only 30 paths per call. As a test I tried displaying the diff in the kernel tree from 2.6.27 to Linus' master as of a day or so ago (3fa8749e) which has 5277 files different. With the 30-way batching this takes over 6 seconds to get the file encodings (all of which are "unspecified" since I don't have any .gitattributes files), which we need before we display the list of changed files, which before only took 0.12 seconds -- so we're 50x slower at showing the list of changed files as a result of adding the per-file encoding support. Also, each call to cache_gitattr was taking 1.2 seconds, which is too long since file handlers are supposed to limit themselves to less than 100ms (preferably less than 50ms) of processing on any one call, and the cache_gitattr call was making gettreediffline take way too long. So I have added another commit which makes the per-file encoding support optional and default to off for now. It also increases the amount of batching in cache_gitattr (except under windows) and reduces the number of lines that gettreediffline does when per-file encoding support is on. Along the way, it also converts the new code to the gitk coding style, cleans up a few things and adds a simple cache to tclencoding. If git check-attr gets the --stdin-paths flag added, that will be good, except then getblobdiffline will need to do something clever if it needs the encoding for a file but the result hasn't come back from git-check-attr yet. That could get quite tricky in fact... Paul. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH (GITK) v3 0/4] Enhance encoding support. 2008-10-13 8:12 [PATCH (GITK) v3 0/4] Enhance encoding support Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov 2008-10-15 12:32 ` [PATCH (GITK) v3 0/4] Enhance encoding support Paul Mackerras @ 2008-10-15 12:38 ` Paul Mackerras 2008-10-15 13:09 ` Alexander Gavrilov 2 siblings, 1 reply; 8+ messages in thread From: Paul Mackerras @ 2008-10-15 12:38 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > Since git apparently cannot work with filenames in non-locale > encodings anyway, I did not try to do anything about it apart > from fixing some obvious bugs. What we did before was read filenames and convert them from the system encoding (done implicitly by gets) before unquoting filenames that were quoted. What we do now with your patch 1/2 is that we read the filenames in binary and unquote any quoted filenames before converting from the system encoding. So I don't think your patch would have made as much difference as it might appear. If there is a reason for unquoting before converting from the system encoding rather than after, it seems pretty subtle to me and wasn't explained in the patch description. An explanation, preferably with examples, would be useful. Also, you didn't say whether you found the "obvious bugs" by inspection or by encountering their effects in actual running (and if so, what those effects were). That information is also good to have in the patch description. Paul. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH (GITK) v3 0/4] Enhance encoding support. 2008-10-15 12:38 ` Paul Mackerras @ 2008-10-15 13:09 ` Alexander Gavrilov 0 siblings, 0 replies; 8+ messages in thread From: Alexander Gavrilov @ 2008-10-15 13:09 UTC (permalink / raw) To: Paul Mackerras; +Cc: git, Johannes Sixt On Wed, Oct 15, 2008 at 4:38 PM, Paul Mackerras <paulus@samba.org> wrote: > Alexander Gavrilov writes: > >> Since git apparently cannot work with filenames in non-locale >> encodings anyway, I did not try to do anything about it apart >> from fixing some obvious bugs. > > What we did before was read filenames and convert them from the system > encoding (done implicitly by gets) before unquoting filenames that > were quoted. What we do now with your patch 1/2 is that we read the > filenames in binary and unquote any quoted filenames before converting > from the system encoding. So I don't think your patch would have made > as much difference as it might appear. If there is a reason for > unquoting before converting from the system encoding rather than > after, it seems pretty subtle to me and wasn't explained in the patch > description. An explanation, preferably with examples, would be > useful. The reason is that non-ASCII characters may be quoted too, so the string that we read looks like "\204\206\204y\204s\204\200\204r\204y\204~\204p.txt". There is no point decoding it before unquoting. > Also, you didn't say whether you found the "obvious bugs" by > inspection or by encountering their effects in actual running (and if > so, what those effects were). That information is also good to have > in the patch description. I actually created a test repository with non-ASCII filenames. If I remember it correctly, the bugs manifested as strings in the tree view appearing as if they were decoded using ISO-8859-1 (the result of decoding before unquoting), or unstaged files being listed quoted as above. Now the remaining encoding issues are: 1) Commit messages that are loaded through readcommit are decoded using the system encoding. It is rare, but it happens. This is a bug. proc readcommit {id} { if {[catch {set contents [exec git cat-file commit $id]}]} return parsecommit $id $contents 0 } 2) Gitk cannot process commits stored in multiple different encodings: they all are decoded using the current value of i18n.commitencoding. This seems to be low priority, because most GUI users are better off using utf-8 for their commits anyway. Alexander ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-10-15 13:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-13 8:12 [PATCH (GITK) v3 0/4] Enhance encoding support Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 2/4] gitk: Enhance file encoding support Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 3/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov 2008-10-13 8:12 ` [PATCH (GITK) v3 4/4] gitk: Optimize encoding name resolution using a lookup table Alexander Gavrilov 2008-10-15 12:32 ` [PATCH (GITK) v3 0/4] Enhance encoding support Paul Mackerras 2008-10-15 12:38 ` Paul Mackerras 2008-10-15 13:09 ` Alexander Gavrilov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox