* [PATCH (GITK) v2 0/4] Encoding support in GUI @ 2008-09-30 11:00 Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov 0 siblings, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt Currently GUI tools don't provide enough support for viewing files that contain non-ASCII characters. This set of patches addresses this issue. The git-gui part of the series is based on patches that were in the 'pu' branch of the git-gui repository. UPDATE: the git-gui part of the series is in master. File encoding can be specified in the following ways: 1) It defaults to the current locale encoding. 2) It can be overridden by setting the gui.encoding option. 3) It can be further set on per-file basis by specifying the 'encoding' attribute in gitattributes. 4) Finally, git-gui allows directly selecting encodings through a popup menu. Since git apparently cannot work with filenames in non-locale encodings anyway, I did not try to do anything about it apart from fixing some obvious bugs. CHANGES: - Wrote better comments for the first three patches. - Added a patch to speed up loading of very large diffs. GITK: gitk: Port new encoding logic from git-gui. --- gitk | 50 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 47 insertions(+), 3 deletions(-) gitk: Implement file contents encoding support. --- gitk | 19 ++++++++++++++++--- 1 files changed, 16 insertions(+), 3 deletions(-) gitk: Support filenames in the locale encoding. --- gitk | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-) gitk: Implement batch lookup and caching of encoding attrs. --- gitk | 35 ++++++++++++++++++++++++++++++++++- 1 files changed, 34 insertions(+), 1 deletions(-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui. 2008-09-30 11:00 [PATCH (GITK) v2 0/4] Encoding support in GUI Alexander Gavrilov @ 2008-09-30 11:00 ` Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov 0 siblings, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt Recently handling of file encodings in GIT-GUI has been greatly enhanced. It now follows these rules: - File encoding defaults to the system encoding. - It can be overridden by setting the gui.encoding option. - Finally, the 'encoding' attribute is checked on per-file basis; it has the last word. This patch inserts copies of git-gui functions necessary for supporting this into gitk. The following patch will connect them to the rest of the code. Note: Since git-check-attr does not provide support for reading attributes from trees, attribute lookup is done using files from the working directory. It also extends the range of supported encoding names, to match changes in git-gui. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 50 +++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 47 insertions(+), 3 deletions(-) diff --git a/gitk b/gitk index 546627f..b210f79 100755 --- a/gitk +++ b/gitk @@ -9726,7 +9726,7 @@ set encoding_aliases { { ISO-8859-16 iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 } { GBK CP936 MS936 windows-936 } { JIS_Encoding csJISEncoding } - { Shift_JIS MS_Kanji csShiftJIS } + { Shift_JIS MS_Kanji csShiftJIS ShiftJIS Shift-JIS } { Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese EUC-JP } { Extended_UNIX_Code_Fixed_Width_for_Japanese csEUCFixWidJapanese } @@ -9768,7 +9768,7 @@ proc tcl_encoding {enc} { set i [lsearch -exact $lcnames $enc] if {$i < 0} { # look for "isonnn" instead of "iso-nnn" or "iso_nnn" - if {[regsub {^iso[-_]} $enc iso encx]} { + if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} { set i [lsearch -exact $lcnames $encx] } } @@ -9780,7 +9780,7 @@ proc tcl_encoding {enc} { foreach e $ll { set i [lsearch -exact $lcnames $e] if {$i < 0} { - if {[regsub {^iso[-_]} $e iso ex]} { + if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} { set i [lsearch -exact $lcnames $ex] } } @@ -9795,6 +9795,45 @@ proc tcl_encoding {enc} { return {} } +proc gitattr {path attr default} { + if {[catch {set r [exec git check-attr $attr -- $path]}]} { + set r unspecified + } else { + set r [join [lrange [split $r :] 2 end] :] + regsub {^ } $r {} r + } + if {$r eq {unspecified}} { + return $default + } + return $r +} + +proc get_path_encoding {path} { + global gui_encoding + set tcl_enc [tcl_encoding $gui_encoding] + if {$tcl_enc eq {}} { + set tcl_enc [encoding system] + } + if {$path ne {}} { + set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]] + if {$enc2 ne {}} { + set tcl_enc $enc2 + } + } + return $tcl_enc +} + +proc get_cached_encoding {path} { + global path_encoding_cache + if {[info exists path_encoding_cache($path)]} { + return $path_encoding_cache($path) + } else { + set enc [get_path_encoding $path] + set path_encoding_cache($path) $enc + return $enc + } +} + # First check that Tcl/Tk is recent enough if {[catch {package require Tk 8.4} err]} { show_error {} . [mc "Sorry, gitk cannot run with this version of Tcl/Tk.\n\ @@ -9817,6 +9856,11 @@ if {$tclencoding == {}} { puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk" } +set gui_encoding [encoding system] +catch { + set gui_encoding [exec git config --get gui.encoding] +} + set mainfont {Helvetica 9} set textfont {Courier 9} set uifont {Helvetica 9 bold} -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support. 2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov @ 2008-09-30 11:00 ` Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov 0 siblings, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt The previous patch has added functions that implement per-file encoding selection logic. This one modifies core gitk code to use them. Since diffs usually contain multiple files, which may have different values of the encoding attribute, they are initially read as binary, and encoding conversion is applied to individual lines, based on the active set of diff headers. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 19 ++++++++++++++++--- 1 files changed, 16 insertions(+), 3 deletions(-) diff --git a/gitk b/gitk index b210f79..98923e7 100755 --- a/gitk +++ b/gitk @@ -6295,7 +6295,7 @@ proc showfile {f} { return } } - fconfigure $bf -blocking 0 + fconfigure $bf -blocking 0 -encoding [get_cached_encoding $f] filerun $bf [list getblobline $bf $diffids] $ctext config -state normal clear_ctext $commentend @@ -6333,6 +6333,7 @@ proc mergediff {id} { global diffids global parents global diffcontext + global diffencoding global limitdiffs vfilelimit curview set diffmergeid $id @@ -6346,9 +6347,10 @@ proc mergediff {id} { error_popup "[mc "Error getting merge diffs:"] $err" return } - fconfigure $mdf -blocking 0 + fconfigure $mdf -blocking 0 -encoding binary set mdifffd($id) $mdf set np [llength $parents($curview,$id)] + set diffencoding [get_cached_encoding {}] settabs $np filerun $mdf [list getmergediffline $mdf $id $np] } @@ -6356,6 +6358,7 @@ proc mergediff {id} { proc getmergediffline {mdf id np} { global diffmergeid ctext cflist mergemax global difffilestart mdifffd + global diffencoding $ctext conf -state normal set nr 0 @@ -6371,14 +6374,17 @@ proc getmergediffline {mdf id np} { set here [$ctext index "end - 1c"] lappend difffilestart $here add_flist [list $fname] + set diffencoding [get_cached_encoding $fname] set l [expr {(78 - [string length $fname]) / 2}] set pad [string range "----------------------------------------" 1 $l] $ctext insert end "$pad $fname $pad\n" filesep } elseif {[regexp {^@@} $line]} { + set line [encoding convertfrom $diffencoding $line] $ctext insert end "$line\n" hunksep } elseif {[regexp {^[0-9a-f]{40}$} $line] || [regexp {^index} $line]} { # do nothing } else { + set line [encoding convertfrom $diffencoding $line] # parse the prefix - one ' ', '-' or '+' for each parent set spaces {} set minuses {} @@ -6586,6 +6592,7 @@ proc getblobdiffs {ids} { global diffcontext global ignorespace global limitdiffs vfilelimit curview + global diffencoding set cmd [diffcmd $ids "-p -C --no-commit-id -U$diffcontext"] if {$ignorespace} { @@ -6599,7 +6606,8 @@ proc getblobdiffs {ids} { return } set diffinhdr 0 - fconfigure $bdf -blocking 0 + set diffencoding [get_cached_encoding {}] + fconfigure $bdf -blocking 0 -encoding binary set blobdifffd($ids) $bdf filerun $bdf [list getblobdiffline $bdf $diffids] } @@ -6633,6 +6641,7 @@ proc getblobdiffline {bdf ids} { global diffids blobdifffd ctext curdiffstart global diffnexthead diffnextnote difffilestart global diffinhdr treediffs + global diffencoding set nr 0 $ctext conf -state normal @@ -6670,10 +6679,12 @@ proc getblobdiffline {bdf ids} { } else { set fname [string range $line 2 [expr {$i - 1}]] } + set diffencoding [get_cached_encoding $fname] makediffhdr $fname $ids } elseif {[regexp {^@@ -([0-9]+)(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@(.*)} \ $line match f1l f1c f2l f2c rest]} { + set line [encoding convertfrom $diffencoding $line] $ctext insert end "$line\n" hunksep set diffinhdr 0 @@ -6693,6 +6704,7 @@ proc getblobdiffline {bdf ids} { if {[string index $fname 0] eq "\""} { set fname [lindex $fname 0] } + set diffencoding [get_cached_encoding $fname] makediffhdr $fname $ids } elseif {[string compare -length 3 $line "---"] == 0} { # do nothing @@ -6704,6 +6716,7 @@ proc getblobdiffline {bdf ids} { $ctext insert end "$line\n" filesep } else { + set line [encoding convertfrom $diffencoding $line] set x [string range $line 0 0] if {$x == "-" || $x == "+"} { set tag [expr {$x == "+"}] -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding. 2008-09-30 11:00 ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov @ 2008-09-30 11:00 ` Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov 2008-10-10 11:21 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras 0 siblings, 2 replies; 12+ messages in thread From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt The previous patch changed the encoding used for reading diffs to binary, which broke non-ASCII filename support. This one fixes the breakage, together with some existing bugs related to filename encoding. Since core git apparently assumes that filenames are specified in the locale encoding, this is what we should use for them in gitk as well. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/gitk b/gitk index 98923e7..254faa1 100755 --- a/gitk +++ b/gitk @@ -6228,7 +6228,7 @@ proc gettree {id} { set treepending $id set treefilelist($id) {} set treeidlist($id) {} - fconfigure $gtf -blocking 0 + fconfigure $gtf -blocking 0 -encoding binary filerun $gtf [list gettreeline $gtf $id] } } else { @@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} { set line [string range $line 0 [expr {$i-1}]] if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue set sha1 [lindex $line 2] - if {[string index $fname 0] eq "\""} { - set fname [lindex $fname 0] - } lappend treeidlist($id) $sha1 } + if {[string index $fname 0] eq "\""} { + set fname [lindex $fname 0] + } + set fname [encoding convertfrom $fname] lappend treefilelist($id) $fname } if {![eof $gtf]} { @@ -6370,6 +6371,7 @@ proc getmergediffline {mdf id np} { } if {[regexp {^diff --cc (.*)} $line match fname]} { # start of a new file + set fname [encoding convertfrom $fname] $ctext insert end "\n" set here [$ctext index "end - 1c"] lappend difffilestart $here @@ -6519,7 +6521,7 @@ proc gettreediffs {ids} { set treepending $ids set treediff {} - fconfigure $gdtf -blocking 0 + fconfigure $gdtf -blocking 0 -encoding binary filerun $gdtf [list gettreediffline $gdtf $ids] } @@ -6535,6 +6537,7 @@ proc gettreediffline {gdtf ids} { if {[string index $file 0] eq "\""} { set file [lindex $file 0] } + set file [encoding convertfrom $file] lappend treediff $file } } @@ -6679,6 +6682,7 @@ proc getblobdiffline {bdf ids} { } else { set fname [string range $line 2 [expr {$i - 1}]] } + set fname [encoding convertfrom $fname] set diffencoding [get_cached_encoding $fname] makediffhdr $fname $ids @@ -6694,6 +6698,7 @@ proc getblobdiffline {bdf ids} { if {[string index $fname 0] eq "\""} { set fname [lindex $fname 0] } + set fname [encoding convertfrom $fname] set i [lsearch -exact $treediffs($ids) $fname] if {$i >= 0} { setinlist difffilestart $i $curdiffstart @@ -6704,6 +6709,7 @@ proc getblobdiffline {bdf ids} { if {[string index $fname 0] eq "\""} { set fname [lindex $fname 0] } + set fname [encoding convertfrom $fname] set diffencoding [get_cached_encoding $fname] makediffhdr $fname $ids } elseif {[string compare -length 3 $line "---"] == 0} { -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-09-30 11:00 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov @ 2008-09-30 11:00 ` Alexander Gavrilov 2008-10-10 11:48 ` Paul Mackerras 2008-10-10 11:21 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras 1 sibling, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw) To: git; +Cc: Paul Mackerras, Johannes Sixt When the diff contains thousands of files, calling git-check-attr once per file is very slow. With this patch gitk does attribute lookup in batches of 30 files while reading the diff file list, which leads to a very noticeable speedup. It may be possible to reimplement this even more efficiently, if git-check-attr is modified to support a --stdin-paths option. Additionally, it should quote the ':' character in file paths, or provide a more robust way of column separation. Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com> Tested-by: Johannes Sixt <johannes.sixt@telecom.at> --- gitk | 35 ++++++++++++++++++++++++++++++++++- 1 files changed, 34 insertions(+), 1 deletions(-) diff --git a/gitk b/gitk index 1355aa2..cf557c3 100755 --- a/gitk +++ b/gitk @@ -6530,6 +6530,7 @@ proc gettreediffline {gdtf ids} { global cmitmode vfilelimit curview limitdiffs set nr 0 + set sublist {} while {[incr nr] <= 1000 && [gets $gdtf line] >= 0} { set i [string first "\t" $line] if {$i >= 0} { @@ -6539,8 +6540,10 @@ proc gettreediffline {gdtf ids} { } set file [encoding convertfrom $file] lappend treediff $file + lappend sublist $file } } + cache_gitattr encoding $sublist if {![eof $gdtf]} { return [expr {$nr >= 1000? 2: 1}] } @@ -9839,18 +9842,48 @@ proc tcl_encoding {enc} { } proc gitattr {path attr default} { - if {[catch {set r [exec git check-attr $attr -- $path]}]} { + global path_attr_cache + if {[info exists path_attr_cache($attr,$path)]} { + set r $path_attr_cache($attr,$path) + } elseif {[catch {set r [exec git check-attr $attr -- $path]}]} { set r unspecified } else { set r [join [lrange [split $r :] 2 end] :] regsub {^ } $r {} r } + set path_attr_cache($attr,$path) $r if {$r eq {unspecified}} { return $default } return $r } +proc cache_gitattr {attr pathlist} { + global path_attr_cache + set newlist {} + foreach path $pathlist { + if {[info exists path_attr_cache($attr,$path)]} continue + lappend newlist $path + } + while {$newlist ne {}} { + set head [lrange $newlist 0 29] + set newlist [lrange $newlist 30 end] + if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} { + foreach row [split $rlist "\n"] { + set cols [split $row :] + set path [lindex $cols 0] + set value [join [lrange $cols 2 end] :] + if {[string index $path 0] eq "\""} { + set path [encoding convertfrom [lindex $path 0]] + } + regsub {^ } $value {} value + set path_attr_cache($attr,$path) $value + } + } + update + } +} + proc get_path_encoding {path} { global gui_encoding set tcl_enc [tcl_encoding $gui_encoding] -- 1.6.0.20.g6148bc ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-09-30 11:00 ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov @ 2008-10-10 11:48 ` Paul Mackerras 2008-10-10 12:22 ` Alexander Gavrilov 0 siblings, 1 reply; 12+ messages in thread From: Paul Mackerras @ 2008-10-10 11:48 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > When the diff contains thousands of files, calling git-check-attr > once per file is very slow. With this patch gitk does attribute > lookup in batches of 30 files while reading the diff file list, > which leads to a very noticeable speedup. Why only 30 at a time? The logic would be simpler if cache_gitattr just did all the paths in one call to git check-attr, and it should be able to cope with 1000 paths in one call, I would think, which is the most that gettreediffline will give it. Also, I wonder why we now have two levels of caching of the encoding attribute. Your patch 1/4 introduced path_encoding_cache, which was fine, but now we have path_attr_cache as well, which seems to me to serve exactly the same function since the encoding is the only attribute we ever ask about. Surely we don't need both caches? Even with this batching, I am a bit concerned that adding the encoding support might make things noticeably slower for people who don't need any encoding support (which would be the majority, I think). We may end up needing an option to turn off the checking of the encoding attribute. Paul. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-10-10 11:48 ` Paul Mackerras @ 2008-10-10 12:22 ` Alexander Gavrilov 2008-10-11 0:31 ` Paul Mackerras 0 siblings, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-10-10 12:22 UTC (permalink / raw) To: Paul Mackerras; +Cc: git, Johannes Sixt On Fri, Oct 10, 2008 at 3:48 PM, Paul Mackerras <paulus@samba.org> wrote: > Alexander Gavrilov writes: > >> When the diff contains thousands of files, calling git-check-attr >> once per file is very slow. With this patch gitk does attribute >> lookup in batches of 30 files while reading the diff file list, >> which leads to a very noticeable speedup. > > Why only 30 at a time? The logic would be simpler if cache_gitattr > just did all the paths in one call to git check-attr, and it should be > able to cope with 1000 paths in one call, I would think, which is the > most that gettreediffline will give it. OS-enforced command-line size limit on Windows is 32K. Cramming in 1000 paths would leave only 32 characters for each path. > Also, I wonder why we now have two levels of caching of the encoding > attribute. Your patch 1/4 introduced path_encoding_cache, which was > fine, but now we have path_attr_cache as well, which seems to me to > serve exactly the same function since the encoding is the only > attribute we ever ask about. Surely we don't need both caches? If the (git-gui) patch that reimplements the tcl_encoding procedure is applied, we may drop the path_encoding_cache. Current implementation is too slow for batch lookup, especially if the encoding is actually not supported, and without the cache the lookup would be done on every loading of a diff. > Even with this batching, I am a bit concerned that adding the encoding > support might make things noticeably slower for people who don't need > any encoding support (which would be the majority, I think). We may > end up needing an option to turn off the checking of the encoding > attribute. I hope that most diffs don't contain thousands of files at once. And actual huge diffs are likely to be relatively slow to load anyway. Alexander ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-10-10 12:22 ` Alexander Gavrilov @ 2008-10-11 0:31 ` Paul Mackerras 2008-10-11 9:28 ` Alexander Gavrilov 0 siblings, 1 reply; 12+ messages in thread From: Paul Mackerras @ 2008-10-11 0:31 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > OS-enforced command-line size limit on Windows is 32K. Cramming in > 1000 paths would leave only 32 characters for each path. Eeek. OK. > > Also, I wonder why we now have two levels of caching of the encoding > > attribute. Your patch 1/4 introduced path_encoding_cache, which was > > fine, but now we have path_attr_cache as well, which seems to me to > > serve exactly the same function since the encoding is the only > > attribute we ever ask about. Surely we don't need both caches? > > If the (git-gui) patch that reimplements the tcl_encoding procedure is > applied, we may drop the path_encoding_cache. Current implementation > is too slow for batch lookup, especially if the encoding is actually > not supported, and without the cache the lookup would be done on every > loading of a diff. I was thinking more in terms of dropping the path_attr_cache actually. Actually, if [tcl_encoding] is slow, then why is $gui_encoding the untranslated version, so that we do [tcl_encoding $gui_encoding] on each call to get_path_encoding? Why don't we do the tcl_encoding call once and have $gui_encoding be the result of that? In fact $gui_encoding should be the result of this code (from get_path_encoding): set tcl_enc [tcl_encoding $gui_encoding] if {$tcl_enc eq {}} { set tcl_enc [encoding system] } And if [tcl_encoding] is slow, then it should have a cache. There's only likely to be at most 2 or 3 values it gets called for, and it's a constant function. At this point, what I think I might do is apply your set of patches (but with 2/4 and 3/4 folded into a single patch) and then go through and do another commit that addresses the concerns I've raised. OK? Paul. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-10-11 0:31 ` Paul Mackerras @ 2008-10-11 9:28 ` Alexander Gavrilov 2008-10-11 12:03 ` Paul Mackerras 0 siblings, 1 reply; 12+ messages in thread From: Alexander Gavrilov @ 2008-10-11 9:28 UTC (permalink / raw) To: Paul Mackerras; +Cc: git, Johannes Sixt On Saturday 11 October 2008 04:31:13 Paul Mackerras wrote: > > > Also, I wonder why we now have two levels of caching of the encoding > > > attribute. Your patch 1/4 introduced path_encoding_cache, which was > > > fine, but now we have path_attr_cache as well, which seems to me to > > > serve exactly the same function since the encoding is the only > > > attribute we ever ask about. Surely we don't need both caches? > > > > If the (git-gui) patch that reimplements the tcl_encoding procedure is > > applied, we may drop the path_encoding_cache. Current implementation > > is too slow for batch lookup, especially if the encoding is actually > > not supported, and without the cache the lookup would be done on every > > loading of a diff. > > I was thinking more in terms of dropping the path_attr_cache actually. Since gitattr is a general-purpose function that can read any attribute, I decided that it should use its own cache. Basically, all this double-caching issue is fallout from my failure to anticipate the need of batch attribute lookup from the beginning... > Actually, if [tcl_encoding] is slow, then why is $gui_encoding the > untranslated version, so that we do [tcl_encoding $gui_encoding] on > each call to get_path_encoding? Why don't we do the tcl_encoding call > once and have $gui_encoding be the result of that? In fact > $gui_encoding should be the result of this code (from > get_path_encoding): > > set tcl_enc [tcl_encoding $gui_encoding] > if {$tcl_enc eq {}} { > set tcl_enc [encoding system] > } Well, that code was copied from git-gui, where it looks like this: set tcl_enc [tcl_encoding [get_config gui.encoding]] I.e. there is no "$gui_encoding" variable, although get_config does use a cache. > And if [tcl_encoding] is slow, then it should have a cache. There's > only likely to be at most 2 or 3 values it gets called for, and it's > a constant function. In git-gui the slowdown appeared during the construction of the menu listing all available encodings, so a simple cache would not have helped. I reimplemented it using a lookup table to resolve aliases (constructed on the first run). But it can be thought of as a precalculated cache. > At this point, what I think I might do is apply your set of patches > (but with 2/4 and 3/4 folded into a single patch) and then go through > and do another commit that addresses the concerns I've raised. OK? Maybe I should resend the patches, scrapping path_encoding_cache, and adding the optimized version of tcl_encoding? Alexander ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs. 2008-10-11 9:28 ` Alexander Gavrilov @ 2008-10-11 12:03 ` Paul Mackerras 0 siblings, 0 replies; 12+ messages in thread From: Paul Mackerras @ 2008-10-11 12:03 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > > And if [tcl_encoding] is slow, then it should have a cache. There's > > only likely to be at most 2 or 3 values it gets called for, and it's > > a constant function. > > In git-gui the slowdown appeared during the construction of the menu > listing all available encodings, so a simple cache would not have helped. > I reimplemented it using a lookup table to resolve aliases (constructed > on the first run). But it can be thought of as a precalculated cache. Hmmm, one that uses more time and memory than it needs to for gitk's use... I guess it's not a lot, but it still seems unnecessary, unless you can see a need for a menu of encodings in gitk. > > At this point, what I think I might do is apply your set of patches > > (but with 2/4 and 3/4 folded into a single patch) and then go through > > and do another commit that addresses the concerns I've raised. OK? > > Maybe I should resend the patches, scrapping path_encoding_cache, > and adding the optimized version of tcl_encoding? OK. Paul. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding. 2008-09-30 11:00 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov @ 2008-10-10 11:21 ` Paul Mackerras 2008-10-10 11:23 ` Paul Mackerras 1 sibling, 1 reply; 12+ messages in thread From: Paul Mackerras @ 2008-10-10 11:21 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt Alexander Gavrilov writes: > The previous patch changed the encoding used for > reading diffs to binary, which broke non-ASCII filename > support. This one fixes the breakage, together with some > existing bugs related to filename encoding. Sounds like this patch should be combined with the previous one. > @@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} { > set line [string range $line 0 [expr {$i-1}]] > if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue > set sha1 [lindex $line 2] > - if {[string index $fname 0] eq "\""} { > - set fname [lindex $fname 0] > - } > lappend treeidlist($id) $sha1 > } > + if {[string index $fname 0] eq "\""} { > + set fname [lindex $fname 0] > + } Concerning this part of the change, do we know whether git ls-files will quote filenames containing special characters or not? Paul. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding. 2008-10-10 11:21 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras @ 2008-10-10 11:23 ` Paul Mackerras 0 siblings, 0 replies; 12+ messages in thread From: Paul Mackerras @ 2008-10-10 11:23 UTC (permalink / raw) To: Alexander Gavrilov; +Cc: git, Johannes Sixt I wrote: > > @@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} { > > set line [string range $line 0 [expr {$i-1}]] > > if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue > > set sha1 [lindex $line 2] > > - if {[string index $fname 0] eq "\""} { > > - set fname [lindex $fname 0] > > - } > > lappend treeidlist($id) $sha1 > > } > > + if {[string index $fname 0] eq "\""} { > > + set fname [lindex $fname 0] > > + } > > Concerning this part of the change, do we know whether git ls-files > will quote filenames containing special characters or not? A little experiment has convinced me that it does. So this part looks OK. Paul. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-10-11 12:04 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-09-30 11:00 [PATCH (GITK) v2 0/4] Encoding support in GUI Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov 2008-09-30 11:00 ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov 2008-10-10 11:48 ` Paul Mackerras 2008-10-10 12:22 ` Alexander Gavrilov 2008-10-11 0:31 ` Paul Mackerras 2008-10-11 9:28 ` Alexander Gavrilov 2008-10-11 12:03 ` Paul Mackerras 2008-10-10 11:21 ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras 2008-10-10 11:23 ` Paul Mackerras
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).