git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH (GITK) v2 0/4] Encoding support in GUI
@ 2008-09-30 11:00 Alexander Gavrilov
  2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras, Johannes Sixt

Currently GUI tools don't provide enough support for
viewing files that contain non-ASCII characters. This set of
patches addresses this issue. The git-gui part of the series
is based on patches that were in the 'pu' branch of the
git-gui repository.

UPDATE: the git-gui part of the series is in master.

File encoding can be specified in the following ways:

1) It defaults to the current locale encoding.
2) It can be overridden by setting the gui.encoding option.
3) It can be further set on per-file basis by specifying
   the 'encoding' attribute in gitattributes.
4) Finally, git-gui allows directly selecting encodings
   through a popup menu.

Since git apparently cannot work with filenames in non-locale
encodings anyway, I did not try to do anything about it apart
from fixing some obvious bugs.


CHANGES:

- Wrote better comments for the first three patches.
- Added a patch to speed up loading of very large diffs.


GITK:
	gitk: Port new encoding logic from git-gui.
	---
	gitk |   50 +++++++++++++++++++++++++++++++++++++++++++++++---
	1 files changed, 47 insertions(+), 3 deletions(-)

	gitk: Implement file contents encoding support.
	---
	gitk |   19 ++++++++++++++++---
	1 files changed, 16 insertions(+), 3 deletions(-)

	gitk: Support filenames in the locale encoding.
	---
	gitk |   16 +++++++++++-----
	1 files changed, 11 insertions(+), 5 deletions(-)

	gitk: Implement batch lookup and caching of encoding attrs.
	---
	gitk |   35 ++++++++++++++++++++++++++++++++++-
	1 files changed, 34 insertions(+), 1 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui.
  2008-09-30 11:00 [PATCH (GITK) v2 0/4] Encoding support in GUI Alexander Gavrilov
@ 2008-09-30 11:00 ` Alexander Gavrilov
  2008-09-30 11:00   ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras, Johannes Sixt

Recently handling of file encodings in GIT-GUI has
been greatly enhanced. It now follows these rules:

- File encoding defaults to the system encoding.
- It can be overridden by setting the gui.encoding option.
- Finally, the 'encoding' attribute is checked on
  per-file basis; it has the last word.

This patch inserts copies of git-gui functions necessary
for supporting this into gitk. The following patch will
connect them to the rest of the code.

Note: Since git-check-attr does not provide support for
reading attributes from trees, attribute lookup is done
using files from the working directory.

It also extends the range of supported encoding
names, to match changes in git-gui.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Tested-by: Johannes Sixt <johannes.sixt@telecom.at>
---
 gitk |   50 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/gitk b/gitk
index 546627f..b210f79 100755
--- a/gitk
+++ b/gitk
@@ -9726,7 +9726,7 @@ set encoding_aliases {
     { ISO-8859-16 iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 }
     { GBK CP936 MS936 windows-936 }
     { JIS_Encoding csJISEncoding }
-    { Shift_JIS MS_Kanji csShiftJIS }
+    { Shift_JIS MS_Kanji csShiftJIS ShiftJIS Shift-JIS }
     { Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese
       EUC-JP }
     { Extended_UNIX_Code_Fixed_Width_for_Japanese csEUCFixWidJapanese }
@@ -9768,7 +9768,7 @@ proc tcl_encoding {enc} {
     set i [lsearch -exact $lcnames $enc]
     if {$i < 0} {
 	# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
-	if {[regsub {^iso[-_]} $enc iso encx]} {
+	if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
 	    set i [lsearch -exact $lcnames $encx]
 	}
     }
@@ -9780,7 +9780,7 @@ proc tcl_encoding {enc} {
 	    foreach e $ll {
 		set i [lsearch -exact $lcnames $e]
 		if {$i < 0} {
-		    if {[regsub {^iso[-_]} $e iso ex]} {
+		    if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} {
 			set i [lsearch -exact $lcnames $ex]
 		    }
 		}
@@ -9795,6 +9795,45 @@ proc tcl_encoding {enc} {
     return {}
 }
 
+proc gitattr {path attr default} {
+	if {[catch {set r [exec git check-attr $attr -- $path]}]} {
+		set r unspecified
+	} else {
+		set r [join [lrange [split $r :] 2 end] :]
+		regsub {^ } $r {} r
+	}
+	if {$r eq {unspecified}} {
+		return $default
+	}
+	return $r
+}
+
+proc get_path_encoding {path} {
+	global gui_encoding
+	set tcl_enc [tcl_encoding $gui_encoding]
+	if {$tcl_enc eq {}} {
+		set tcl_enc [encoding system]
+	}
+	if {$path ne {}} {
+		set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]]
+		if {$enc2 ne {}} {
+			set tcl_enc $enc2
+		}
+	}
+	return $tcl_enc
+}
+
+proc get_cached_encoding {path} {
+	global path_encoding_cache
+	if {[info exists path_encoding_cache($path)]} {
+		return $path_encoding_cache($path)
+	} else {
+		set enc [get_path_encoding $path]
+		set path_encoding_cache($path) $enc
+		return $enc
+	}
+}
+
 # First check that Tcl/Tk is recent enough
 if {[catch {package require Tk 8.4} err]} {
     show_error {} . [mc "Sorry, gitk cannot run with this version of Tcl/Tk.\n\
@@ -9817,6 +9856,11 @@ if {$tclencoding == {}} {
     puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
 }
 
+set gui_encoding [encoding system]
+catch {
+	set gui_encoding [exec git config --get gui.encoding]
+}
+
 set mainfont {Helvetica 9}
 set textfont {Courier 9}
 set uifont {Helvetica 9 bold}
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support.
  2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov
@ 2008-09-30 11:00   ` Alexander Gavrilov
  2008-09-30 11:00     ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras, Johannes Sixt

The previous patch has added functions that implement
per-file encoding selection logic. This one modifies
core gitk code to use them.

Since diffs usually contain multiple files, which may
have different values of the encoding attribute, they
are initially read as binary, and encoding conversion
is applied to individual lines, based on the active
set of diff headers.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Tested-by: Johannes Sixt <johannes.sixt@telecom.at>
---
 gitk |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gitk b/gitk
index b210f79..98923e7 100755
--- a/gitk
+++ b/gitk
@@ -6295,7 +6295,7 @@ proc showfile {f} {
 	    return
 	}
     }
-    fconfigure $bf -blocking 0
+    fconfigure $bf -blocking 0 -encoding [get_cached_encoding $f]
     filerun $bf [list getblobline $bf $diffids]
     $ctext config -state normal
     clear_ctext $commentend
@@ -6333,6 +6333,7 @@ proc mergediff {id} {
     global diffids
     global parents
     global diffcontext
+    global diffencoding
     global limitdiffs vfilelimit curview
 
     set diffmergeid $id
@@ -6346,9 +6347,10 @@ proc mergediff {id} {
 	error_popup "[mc "Error getting merge diffs:"] $err"
 	return
     }
-    fconfigure $mdf -blocking 0
+    fconfigure $mdf -blocking 0 -encoding binary
     set mdifffd($id) $mdf
     set np [llength $parents($curview,$id)]
+    set diffencoding [get_cached_encoding {}]
     settabs $np
     filerun $mdf [list getmergediffline $mdf $id $np]
 }
@@ -6356,6 +6358,7 @@ proc mergediff {id} {
 proc getmergediffline {mdf id np} {
     global diffmergeid ctext cflist mergemax
     global difffilestart mdifffd
+    global diffencoding
 
     $ctext conf -state normal
     set nr 0
@@ -6371,14 +6374,17 @@ proc getmergediffline {mdf id np} {
 	    set here [$ctext index "end - 1c"]
 	    lappend difffilestart $here
 	    add_flist [list $fname]
+	    set diffencoding [get_cached_encoding $fname]
 	    set l [expr {(78 - [string length $fname]) / 2}]
 	    set pad [string range "----------------------------------------" 1 $l]
 	    $ctext insert end "$pad $fname $pad\n" filesep
 	} elseif {[regexp {^@@} $line]} {
+	    set line [encoding convertfrom $diffencoding $line]
 	    $ctext insert end "$line\n" hunksep
 	} elseif {[regexp {^[0-9a-f]{40}$} $line] || [regexp {^index} $line]} {
 	    # do nothing
 	} else {
+	    set line [encoding convertfrom $diffencoding $line]
 	    # parse the prefix - one ' ', '-' or '+' for each parent
 	    set spaces {}
 	    set minuses {}
@@ -6586,6 +6592,7 @@ proc getblobdiffs {ids} {
     global diffcontext
     global ignorespace
     global limitdiffs vfilelimit curview
+    global diffencoding
 
     set cmd [diffcmd $ids "-p -C --no-commit-id -U$diffcontext"]
     if {$ignorespace} {
@@ -6599,7 +6606,8 @@ proc getblobdiffs {ids} {
 	return
     }
     set diffinhdr 0
-    fconfigure $bdf -blocking 0
+    set diffencoding [get_cached_encoding {}]
+    fconfigure $bdf -blocking 0 -encoding binary
     set blobdifffd($ids) $bdf
     filerun $bdf [list getblobdiffline $bdf $diffids]
 }
@@ -6633,6 +6641,7 @@ proc getblobdiffline {bdf ids} {
     global diffids blobdifffd ctext curdiffstart
     global diffnexthead diffnextnote difffilestart
     global diffinhdr treediffs
+    global diffencoding
 
     set nr 0
     $ctext conf -state normal
@@ -6670,10 +6679,12 @@ proc getblobdiffline {bdf ids} {
 	    } else {
 		set fname [string range $line 2 [expr {$i - 1}]]
 	    }
+	    set diffencoding [get_cached_encoding $fname]
 	    makediffhdr $fname $ids
 
 	} elseif {[regexp {^@@ -([0-9]+)(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@(.*)} \
 		       $line match f1l f1c f2l f2c rest]} {
+	    set line [encoding convertfrom $diffencoding $line]
 	    $ctext insert end "$line\n" hunksep
 	    set diffinhdr 0
 
@@ -6693,6 +6704,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set diffencoding [get_cached_encoding $fname]
 		makediffhdr $fname $ids
 	    } elseif {[string compare -length 3 $line "---"] == 0} {
 		# do nothing
@@ -6704,6 +6716,7 @@ proc getblobdiffline {bdf ids} {
 	    $ctext insert end "$line\n" filesep
 
 	} else {
+	    set line [encoding convertfrom $diffencoding $line]
 	    set x [string range $line 0 0]
 	    if {$x == "-" || $x == "+"} {
 		set tag [expr {$x == "+"}]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding.
  2008-09-30 11:00   ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov
@ 2008-09-30 11:00     ` Alexander Gavrilov
  2008-09-30 11:00       ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov
  2008-10-10 11:21       ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras
  0 siblings, 2 replies; 12+ messages in thread
From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras, Johannes Sixt

The previous patch changed the encoding used for
reading diffs to binary, which broke non-ASCII filename
support. This one fixes the breakage, together with some
existing bugs related to filename encoding.

Since core git apparently assumes that filenames are
specified in the locale encoding, this is what we should
use for them in gitk as well.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Tested-by: Johannes Sixt <johannes.sixt@telecom.at>
---
 gitk |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gitk b/gitk
index 98923e7..254faa1 100755
--- a/gitk
+++ b/gitk
@@ -6228,7 +6228,7 @@ proc gettree {id} {
 	    set treepending $id
 	    set treefilelist($id) {}
 	    set treeidlist($id) {}
-	    fconfigure $gtf -blocking 0
+	    fconfigure $gtf -blocking 0 -encoding binary
 	    filerun $gtf [list gettreeline $gtf $id]
 	}
     } else {
@@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} {
 	    set line [string range $line 0 [expr {$i-1}]]
 	    if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
 	    set sha1 [lindex $line 2]
-	    if {[string index $fname 0] eq "\""} {
-		set fname [lindex $fname 0]
-	    }
 	    lappend treeidlist($id) $sha1
 	}
+	if {[string index $fname 0] eq "\""} {
+	    set fname [lindex $fname 0]
+	}
+	set fname [encoding convertfrom $fname]
 	lappend treefilelist($id) $fname
     }
     if {![eof $gtf]} {
@@ -6370,6 +6371,7 @@ proc getmergediffline {mdf id np} {
 	}
 	if {[regexp {^diff --cc (.*)} $line match fname]} {
 	    # start of a new file
+	    set fname [encoding convertfrom $fname]
 	    $ctext insert end "\n"
 	    set here [$ctext index "end - 1c"]
 	    lappend difffilestart $here
@@ -6519,7 +6521,7 @@ proc gettreediffs {ids} {
 
     set treepending $ids
     set treediff {}
-    fconfigure $gdtf -blocking 0
+    fconfigure $gdtf -blocking 0 -encoding binary
     filerun $gdtf [list gettreediffline $gdtf $ids]
 }
 
@@ -6535,6 +6537,7 @@ proc gettreediffline {gdtf ids} {
 	    if {[string index $file 0] eq "\""} {
 		set file [lindex $file 0]
 	    }
+	    set file [encoding convertfrom $file]
 	    lappend treediff $file
 	}
     }
@@ -6679,6 +6682,7 @@ proc getblobdiffline {bdf ids} {
 	    } else {
 		set fname [string range $line 2 [expr {$i - 1}]]
 	    }
+	    set fname [encoding convertfrom $fname]
 	    set diffencoding [get_cached_encoding $fname]
 	    makediffhdr $fname $ids
 
@@ -6694,6 +6698,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set fname [encoding convertfrom $fname]
 		set i [lsearch -exact $treediffs($ids) $fname]
 		if {$i >= 0} {
 		    setinlist difffilestart $i $curdiffstart
@@ -6704,6 +6709,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set fname [encoding convertfrom $fname]
 		set diffencoding [get_cached_encoding $fname]
 		makediffhdr $fname $ids
 	    } elseif {[string compare -length 3 $line "---"] == 0} {
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-09-30 11:00     ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov
@ 2008-09-30 11:00       ` Alexander Gavrilov
  2008-10-10 11:48         ` Paul Mackerras
  2008-10-10 11:21       ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras
  1 sibling, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-09-30 11:00 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras, Johannes Sixt

When the diff contains thousands of files, calling git-check-attr
once per file is very slow. With this patch gitk does attribute
lookup in batches of 30 files while reading the diff file list,
which leads to a very noticeable speedup.

It may be possible to reimplement this even more efficiently,
if git-check-attr is modified to support a --stdin-paths option.
Additionally, it should quote the ':' character in file paths,
or provide a more robust way of column separation.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Tested-by: Johannes Sixt <johannes.sixt@telecom.at>
---
 gitk |   35 ++++++++++++++++++++++++++++++++++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/gitk b/gitk
index 1355aa2..cf557c3 100755
--- a/gitk
+++ b/gitk
@@ -6530,6 +6530,7 @@ proc gettreediffline {gdtf ids} {
     global cmitmode vfilelimit curview limitdiffs
 
     set nr 0
+    set sublist {}
     while {[incr nr] <= 1000 && [gets $gdtf line] >= 0} {
 	set i [string first "\t" $line]
 	if {$i >= 0} {
@@ -6539,8 +6540,10 @@ proc gettreediffline {gdtf ids} {
 	    }
 	    set file [encoding convertfrom $file]
 	    lappend treediff $file
+	    lappend sublist $file
 	}
     }
+    cache_gitattr encoding $sublist
     if {![eof $gdtf]} {
 	return [expr {$nr >= 1000? 2: 1}]
     }
@@ -9839,18 +9842,48 @@ proc tcl_encoding {enc} {
 }
 
 proc gitattr {path attr default} {
-	if {[catch {set r [exec git check-attr $attr -- $path]}]} {
+	global path_attr_cache
+	if {[info exists path_attr_cache($attr,$path)]} {
+		set r $path_attr_cache($attr,$path)
+	} elseif {[catch {set r [exec git check-attr $attr -- $path]}]} {
 		set r unspecified
 	} else {
 		set r [join [lrange [split $r :] 2 end] :]
 		regsub {^ } $r {} r
 	}
+	set path_attr_cache($attr,$path) $r
 	if {$r eq {unspecified}} {
 		return $default
 	}
 	return $r
 }
 
+proc cache_gitattr {attr pathlist} {
+	global path_attr_cache
+	set newlist {}
+	foreach path $pathlist {
+		if {[info exists path_attr_cache($attr,$path)]} continue
+		lappend newlist $path
+	}
+	while {$newlist ne {}} {
+		set head [lrange $newlist 0 29]
+		set newlist [lrange $newlist 30 end]
+		if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} {
+			foreach row [split $rlist "\n"] {
+				set cols [split $row :]
+				set path [lindex $cols 0]
+				set value [join [lrange $cols 2 end] :]
+				if {[string index $path 0] eq "\""} {
+					set path [encoding convertfrom [lindex $path 0]]
+				}
+				regsub {^ } $value {} value
+				set path_attr_cache($attr,$path) $value
+			}
+		}
+		update
+	}
+}
+
 proc get_path_encoding {path} {
 	global gui_encoding
 	set tcl_enc [tcl_encoding $gui_encoding]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding.
  2008-09-30 11:00     ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov
  2008-09-30 11:00       ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov
@ 2008-10-10 11:21       ` Paul Mackerras
  2008-10-10 11:23         ` Paul Mackerras
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Mackerras @ 2008-10-10 11:21 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Johannes Sixt

Alexander Gavrilov writes:

> The previous patch changed the encoding used for
> reading diffs to binary, which broke non-ASCII filename
> support. This one fixes the breakage, together with some
> existing bugs related to filename encoding.

Sounds like this patch should be combined with the previous one.

> @@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} {
>  	    set line [string range $line 0 [expr {$i-1}]]
>  	    if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
>  	    set sha1 [lindex $line 2]
> -	    if {[string index $fname 0] eq "\""} {
> -		set fname [lindex $fname 0]
> -	    }
>  	    lappend treeidlist($id) $sha1
>  	}
> +	if {[string index $fname 0] eq "\""} {
> +	    set fname [lindex $fname 0]
> +	}

Concerning this part of the change, do we know whether git ls-files
will quote filenames containing special characters or not?

Paul.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding.
  2008-10-10 11:21       ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras
@ 2008-10-10 11:23         ` Paul Mackerras
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2008-10-10 11:23 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Johannes Sixt

I wrote:

> > @@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} {
> >  	    set line [string range $line 0 [expr {$i-1}]]
> >  	    if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
> >  	    set sha1 [lindex $line 2]
> > -	    if {[string index $fname 0] eq "\""} {
> > -		set fname [lindex $fname 0]
> > -	    }
> >  	    lappend treeidlist($id) $sha1
> >  	}
> > +	if {[string index $fname 0] eq "\""} {
> > +	    set fname [lindex $fname 0]
> > +	}
> 
> Concerning this part of the change, do we know whether git ls-files
> will quote filenames containing special characters or not?

A little experiment has convinced me that it does.  So this part looks
OK.

Paul.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-09-30 11:00       ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov
@ 2008-10-10 11:48         ` Paul Mackerras
  2008-10-10 12:22           ` Alexander Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Mackerras @ 2008-10-10 11:48 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Johannes Sixt

Alexander Gavrilov writes:

> When the diff contains thousands of files, calling git-check-attr
> once per file is very slow. With this patch gitk does attribute
> lookup in batches of 30 files while reading the diff file list,
> which leads to a very noticeable speedup.

Why only 30 at a time?  The logic would be simpler if cache_gitattr
just did all the paths in one call to git check-attr, and it should be
able to cope with 1000 paths in one call, I would think, which is the
most that gettreediffline will give it.

Also, I wonder why we now have two levels of caching of the encoding
attribute.  Your patch 1/4 introduced path_encoding_cache, which was
fine, but now we have path_attr_cache as well, which seems to me to
serve exactly the same function since the encoding is the only
attribute we ever ask about.  Surely we don't need both caches?

Even with this batching, I am a bit concerned that adding the encoding
support might make things noticeably slower for people who don't need
any encoding support (which would be the majority, I think).  We may
end up needing an option to turn off the checking of the encoding
attribute.

Paul.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-10-10 11:48         ` Paul Mackerras
@ 2008-10-10 12:22           ` Alexander Gavrilov
  2008-10-11  0:31             ` Paul Mackerras
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-10-10 12:22 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git, Johannes Sixt

On Fri, Oct 10, 2008 at 3:48 PM, Paul Mackerras <paulus@samba.org> wrote:
> Alexander Gavrilov writes:
>
>> When the diff contains thousands of files, calling git-check-attr
>> once per file is very slow. With this patch gitk does attribute
>> lookup in batches of 30 files while reading the diff file list,
>> which leads to a very noticeable speedup.
>
> Why only 30 at a time?  The logic would be simpler if cache_gitattr
> just did all the paths in one call to git check-attr, and it should be
> able to cope with 1000 paths in one call, I would think, which is the
> most that gettreediffline will give it.

OS-enforced command-line size limit on Windows is 32K. Cramming in
1000 paths would leave only 32 characters for each path.

> Also, I wonder why we now have two levels of caching of the encoding
> attribute.  Your patch 1/4 introduced path_encoding_cache, which was
> fine, but now we have path_attr_cache as well, which seems to me to
> serve exactly the same function since the encoding is the only
> attribute we ever ask about.  Surely we don't need both caches?

If the (git-gui) patch that reimplements the tcl_encoding procedure is
applied, we may drop the path_encoding_cache. Current implementation
is too slow for batch lookup, especially if the encoding is actually
not supported, and without the cache the lookup would be done on every
loading of a diff.

> Even with this batching, I am a bit concerned that adding the encoding
> support might make things noticeably slower for people who don't need
> any encoding support (which would be the majority, I think).  We may
> end up needing an option to turn off the checking of the encoding
> attribute.

I hope that most diffs don't contain thousands of files at once. And
actual huge diffs are likely to be relatively slow to load anyway.

Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-10-10 12:22           ` Alexander Gavrilov
@ 2008-10-11  0:31             ` Paul Mackerras
  2008-10-11  9:28               ` Alexander Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Mackerras @ 2008-10-11  0:31 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Johannes Sixt

Alexander Gavrilov writes:

> OS-enforced command-line size limit on Windows is 32K. Cramming in
> 1000 paths would leave only 32 characters for each path.

Eeek.  OK.

> > Also, I wonder why we now have two levels of caching of the encoding
> > attribute.  Your patch 1/4 introduced path_encoding_cache, which was
> > fine, but now we have path_attr_cache as well, which seems to me to
> > serve exactly the same function since the encoding is the only
> > attribute we ever ask about.  Surely we don't need both caches?
> 
> If the (git-gui) patch that reimplements the tcl_encoding procedure is
> applied, we may drop the path_encoding_cache. Current implementation
> is too slow for batch lookup, especially if the encoding is actually
> not supported, and without the cache the lookup would be done on every
> loading of a diff.

I was thinking more in terms of dropping the path_attr_cache actually.

Actually, if [tcl_encoding] is slow, then why is $gui_encoding the
untranslated version, so that we do [tcl_encoding $gui_encoding] on
each call to get_path_encoding?  Why don't we do the tcl_encoding call
once and have $gui_encoding be the result of that?  In fact
$gui_encoding should be the result of this code (from
get_path_encoding):

	set tcl_enc [tcl_encoding $gui_encoding]
	if {$tcl_enc eq {}} {
		set tcl_enc [encoding system]
	}

And if [tcl_encoding] is slow, then it should have a cache.  There's
only likely to be at most 2 or 3 values it gets called for, and it's
a constant function.

At this point, what I think I might do is apply your set of patches
(but with 2/4 and 3/4 folded into a single patch) and then go through
and do another commit that addresses the concerns I've raised.  OK?

Paul.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-10-11  0:31             ` Paul Mackerras
@ 2008-10-11  9:28               ` Alexander Gavrilov
  2008-10-11 12:03                 ` Paul Mackerras
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Gavrilov @ 2008-10-11  9:28 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git, Johannes Sixt

On Saturday 11 October 2008 04:31:13 Paul Mackerras wrote:
> > > Also, I wonder why we now have two levels of caching of the encoding
> > > attribute.  Your patch 1/4 introduced path_encoding_cache, which was
> > > fine, but now we have path_attr_cache as well, which seems to me to
> > > serve exactly the same function since the encoding is the only
> > > attribute we ever ask about.  Surely we don't need both caches?
> > 
> > If the (git-gui) patch that reimplements the tcl_encoding procedure is
> > applied, we may drop the path_encoding_cache. Current implementation
> > is too slow for batch lookup, especially if the encoding is actually
> > not supported, and without the cache the lookup would be done on every
> > loading of a diff.
> 
> I was thinking more in terms of dropping the path_attr_cache actually.

Since gitattr is a general-purpose function that can read any attribute,
I decided that it should use its own cache.

Basically, all this double-caching issue is fallout from my failure to
anticipate the need of batch attribute lookup from the beginning...

> Actually, if [tcl_encoding] is slow, then why is $gui_encoding the
> untranslated version, so that we do [tcl_encoding $gui_encoding] on
> each call to get_path_encoding?  Why don't we do the tcl_encoding call
> once and have $gui_encoding be the result of that?  In fact
> $gui_encoding should be the result of this code (from
> get_path_encoding):
> 
> 	set tcl_enc [tcl_encoding $gui_encoding]
> 	if {$tcl_enc eq {}} {
> 		set tcl_enc [encoding system]
> 	}

Well, that code was copied from git-gui, where it looks like this:

set tcl_enc [tcl_encoding [get_config gui.encoding]]

I.e. there is no "$gui_encoding" variable, although get_config does use a cache.

> And if [tcl_encoding] is slow, then it should have a cache.  There's
> only likely to be at most 2 or 3 values it gets called for, and it's
> a constant function.

In git-gui the slowdown appeared during the construction of the menu
listing all available encodings, so a simple cache would not have helped. 
I  reimplemented it using a lookup table to resolve aliases (constructed
on the first run). But it can be thought of as a precalculated cache.

> At this point, what I think I might do is apply your set of patches
> (but with 2/4 and 3/4 folded into a single patch) and then go through
> and do another commit that addresses the concerns I've raised.  OK?

Maybe I should resend the patches, scrapping path_encoding_cache,
and adding the optimized version of tcl_encoding?

Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs.
  2008-10-11  9:28               ` Alexander Gavrilov
@ 2008-10-11 12:03                 ` Paul Mackerras
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Mackerras @ 2008-10-11 12:03 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Johannes Sixt

Alexander Gavrilov writes:

> > And if [tcl_encoding] is slow, then it should have a cache.  There's
> > only likely to be at most 2 or 3 values it gets called for, and it's
> > a constant function.
> 
> In git-gui the slowdown appeared during the construction of the menu
> listing all available encodings, so a simple cache would not have helped. 
> I  reimplemented it using a lookup table to resolve aliases (constructed
> on the first run). But it can be thought of as a precalculated cache.

Hmmm, one that uses more time and memory than it needs to for gitk's
use...  I guess it's not a lot, but it still seems unnecessary, unless
you can see a need for a menu of encodings in gitk.

> > At this point, what I think I might do is apply your set of patches
> > (but with 2/4 and 3/4 folded into a single patch) and then go through
> > and do another commit that addresses the concerns I've raised.  OK?
> 
> Maybe I should resend the patches, scrapping path_encoding_cache,
> and adding the optimized version of tcl_encoding?

OK.

Paul.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-10-11 12:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-30 11:00 [PATCH (GITK) v2 0/4] Encoding support in GUI Alexander Gavrilov
2008-09-30 11:00 ` [PATCH (GITK) v2 1/4] gitk: Port new encoding logic from git-gui Alexander Gavrilov
2008-09-30 11:00   ` [PATCH (GITK) v2 2/4] gitk: Implement file contents encoding support Alexander Gavrilov
2008-09-30 11:00     ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Alexander Gavrilov
2008-09-30 11:00       ` [PATCH (GITK) v2 4/4] gitk: Implement batch lookup and caching of encoding attrs Alexander Gavrilov
2008-10-10 11:48         ` Paul Mackerras
2008-10-10 12:22           ` Alexander Gavrilov
2008-10-11  0:31             ` Paul Mackerras
2008-10-11  9:28               ` Alexander Gavrilov
2008-10-11 12:03                 ` Paul Mackerras
2008-10-10 11:21       ` [PATCH (GITK) v2 3/4] gitk: Support filenames in the locale encoding Paul Mackerras
2008-10-10 11:23         ` Paul Mackerras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).