git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
@ 2008-09-17 21:07 Alexander Gavrilov
  2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras


Currently GUI tools don't provide any support for
viewing files that contain non-ASCII characters. This set of
patches addresses that issue. The git-gui part of the series
is based on patches that are currently in the 'pu' branch
of the git-gui repository.


File encoding can be specified in the following ways:

1) It defaults to the current locale encoding.
2) It can be overridden by setting the gui.encoding option.
3) It can be further set on per-file basis by specifying
   the 'encoding' attribute in gitattributes.
4) Finally, git-gui allows directly selecting encodings
   through a popup menu.


The last item is a necessity to compensate for the fact that
now staging/unstaging hunks would not work, if the selected
encoding is incompatible with the actual data:

While most single-byte character sets are mutually compatible,
all multibyte encodings, including utf-8, have the concept
of invalid byte sequences, and thus are not completely
reversible. Because of this, git-gui won't be able to apply
hunks that contain such sequences, and the user would have to
specify the encoding correctly, or fallback to ISO-8859-1.


Since git apparently cannot work with filenames in non-locale
encodings anyway, I did not try to do anything about it apart
from fixing some obvious bugs.

There are also some bugs in handling of commit encodings in gitk,
but they are out of the scope of this series.



GIT-GUI:
	git-gui: Cleanup handling of the default encoding.
	---
	git-gui.sh       |    1 +
	lib/blame.tcl    |    2 +-
	lib/diff.tcl     |   11 ++++++-----
	lib/encoding.tcl |   14 ++++++++++++++
	lib/option.tcl   |   24 ++++++++++++++++++++++++
	5 files changed, 46 insertions(+), 6 deletions(-)

	git-gui: Add a menu of available encodings.
	---
	lib/encoding.tcl |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
	lib/option.tcl   |   13 +++++-
	2 files changed, 142 insertions(+), 4 deletions(-)

	git-gui: Allow forcing display encoding for diffs using a submenu.
	---
	git-gui.sh       |    8 ++++++++
	lib/diff.tcl     |    9 +++++++++
	lib/encoding.tcl |   29 +++++++++++++++++++++++++++--
	3 files changed, 44 insertions(+), 2 deletions(-)

	git-gui: Optimize encoding name resolution using a lookup table.
	---
	lib/encoding.tcl |   82 +++++++++++++++++++++++++++++++++++-------------------
	1 files changed, 53 insertions(+), 29 deletions(-)

	git-gui: Support the encoding menu in gui blame.
	---
	lib/blame.tcl |   17 +++++++++++++++++
	1 files changed, 17 insertions(+), 0 deletions(-)


GITK:
	gitk: Port new encoding logic from git-gui.
	---
	gitk |   50 +++++++++++++++++++++++++++++++++++++++++++++++---
	1 files changed, 47 insertions(+), 3 deletions(-)

	gitk: Implement file contents encoding support.
	---
	gitk |   19 ++++++++++++++++---
	1 files changed, 16 insertions(+), 3 deletions(-)

	gitk: Support filenames in the locale encoding.
	---
	gitk |   16 +++++++++++-----
	1 files changed, 11 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-17 21:07 [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Alexander Gavrilov
@ 2008-09-17 21:07 ` Alexander Gavrilov
  2008-09-17 21:07   ` [PATCH (GIT-GUI,GITK) 2/8] git-gui: Add a menu of available encodings Alexander Gavrilov
  2008-09-18 15:02   ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Dmitry Potapov
  2008-09-17 21:45 ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Paul Mackerras
  2008-09-21 22:55 ` Paul Mackerras
  2 siblings, 2 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

- Make diffs and blame default to the system (locale)
  encoding instead of hard-coding UTF-8.
- Add a gui.encoding option to allow overriding it.
- gitattributes still have the final word.

The rationale for this is Windows support:

1) Windows people are accustomed to using legacy encodings
   for text files. For many of them defaulting to utf-8
   will be counter-intuitive.
2) Windows doesn't support utf-8 locales, and switching
   the system encoding is a real pain. Thus the option.

This patch also adds proper encoding conversion to Apply Hunk/Line.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 git-gui.sh       |    1 +
 lib/blame.tcl    |    2 +-
 lib/diff.tcl     |   11 ++++++-----
 lib/encoding.tcl |   14 ++++++++++++++
 lib/option.tcl   |   24 ++++++++++++++++++++++++
 5 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/git-gui.sh b/git-gui.sh
index 91457a2..444990b 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -678,6 +678,7 @@ set default_config(merge.verbosity) 2
 set default_config(user.name) {}
 set default_config(user.email) {}
 
+set default_config(gui.encoding) [encoding system]
 set default_config(gui.matchtrackingbranch) false
 set default_config(gui.pruneduringfetch) false
 set default_config(gui.trustmtime) false
diff --git a/lib/blame.tcl b/lib/blame.tcl
index 7535adb..84d55b5 100644
--- a/lib/blame.tcl
+++ b/lib/blame.tcl
@@ -402,7 +402,7 @@ method _load {jump} {
 	fconfigure $fd \
 		-blocking 0 \
 		-translation lf \
-		-encoding [tcl_encoding [gitattr $path encoding UTF-8]]
+		-encoding [get_path_encoding $path]
 	fileevent $fd readable [cb _read_file $fd $jump]
 	set current_fd $fd
 }
diff --git a/lib/diff.tcl b/lib/diff.tcl
index b0ecfbc..8fefc5d 100644
--- a/lib/diff.tcl
+++ b/lib/diff.tcl
@@ -164,11 +164,10 @@ proc show_other_diff {path w m scroll_pos} {
 					set sz [string length $content]
 				}
 				file {
-					set enc [gitattr $path encoding UTF-8]
 					set fd [open $path r]
 					fconfigure $fd \
 						-eofchar {} \
-						-encoding [tcl_encoding $enc]
+						-encoding [get_path_encoding $path]
 					set content [read $fd $max_sz]
 					close $fd
 					set sz [file size $path]
@@ -282,7 +281,7 @@ proc start_show_diff {scroll_pos {add_opts {}}} {
 	set ::current_diff_inheader 1
 	fconfigure $fd \
 		-blocking 0 \
-		-encoding [tcl_encoding [gitattr $path encoding UTF-8]] \
+		-encoding [get_path_encoding $path] \
 		-translation lf
 	fileevent $fd readable [list read_diff $fd $scroll_pos]
 }
@@ -435,8 +434,9 @@ proc apply_hunk {x y} {
 	}
 
 	if {[catch {
+		set enc [get_path_encoding $current_diff_path]
 		set p [eval git_write $apply_cmd]
-		fconfigure $p -translation binary -encoding binary
+		fconfigure $p -translation binary -encoding $enc
 		puts -nonewline $p $current_diff_header
 		puts -nonewline $p [$ui_diff get $s_lno $e_lno]
 		close $p} err]} {
@@ -604,8 +604,9 @@ proc apply_line {x y} {
 	set patch "@@ -$hln,$n +$hln,[eval expr $n $sign 1] @@\n$patch"
 
 	if {[catch {
+		set enc [get_path_encoding $current_diff_path]
 		set p [eval git_write $apply_cmd]
-		fconfigure $p -translation binary -encoding binary
+		fconfigure $p -translation binary -encoding $enc
 		puts -nonewline $p $current_diff_header
 		puts -nonewline $p $patch
 		close $p} err]} {
diff --git a/lib/encoding.tcl b/lib/encoding.tcl
index 7f06b0d..e186b0c 100644
--- a/lib/encoding.tcl
+++ b/lib/encoding.tcl
@@ -274,3 +274,17 @@ proc tcl_encoding {enc} {
     }
     return {}
 }
+
+proc get_path_encoding {path} {
+	set tcl_enc [tcl_encoding [get_config gui.encoding]]
+	if {$tcl_enc eq {}} {
+		set tcl_enc [encoding system]
+	}
+	if {$path ne {}} {
+		set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]]
+		if {$enc2 ne {}} {
+			set tcl_enc $enc2
+		}
+	}
+	return $tcl_enc
+}
diff --git a/lib/option.tcl b/lib/option.tcl
index 9b865f6..40af44e 100644
--- a/lib/option.tcl
+++ b/lib/option.tcl
@@ -1,6 +1,28 @@
 # git-gui options editor
 # Copyright (C) 2006, 2007 Shawn Pearce
 
+proc config_check_encodings {} {
+	global repo_config_new global_config_new
+
+	set enc $global_config_new(gui.encoding)
+	if {$enc eq {}} {
+		set global_config_new(gui.encoding) [encoding system]
+	} elseif {[tcl_encoding $enc] eq {}} {
+		error_popup [mc "Invalid global encoding '%s'" $enc]
+		return 0
+	}
+
+	set enc $repo_config_new(gui.encoding)
+	if {$enc eq {}} {
+		set repo_config_new(gui.encoding) [encoding system]
+	} elseif {[tcl_encoding $enc] eq {}} {
+		error_popup [mc "Invalid repo encoding '%s'" $enc]
+		return 0
+	}
+
+	return 1
+}
+
 proc save_config {} {
 	global default_config font_descs
 	global repo_config global_config
@@ -130,6 +152,7 @@ proc do_options {} {
 		{i-1..99 gui.diffcontext {mc "Number of Diff Context Lines"}}
 		{i-0..99 gui.commitmsgwidth {mc "Commit Message Text Width"}}
 		{t gui.newbranchtemplate {mc "New Branch Name Template"}}
+		{t gui.encoding {mc "Default File Contents Encoding"}}
 		} {
 		set type [lindex $option 0]
 		set name [lindex $option 1]
@@ -275,6 +298,7 @@ proc do_restore_defaults {} {
 }
 
 proc do_save_config {w} {
+	if {![config_check_encodings]} return
 	if {[catch {save_config} err]} {
 		error_popup [strcat [mc "Failed to completely save options:"] "\n\n$err"]
 	}
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 2/8] git-gui: Add a menu of available encodings.
  2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
@ 2008-09-17 21:07   ` Alexander Gavrilov
  2008-09-17 21:07     ` [PATCH (GIT-GUI,GITK) 3/8] git-gui: Allow forcing display encoding for diffs using a submenu Alexander Gavrilov
  2008-09-18 15:02   ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Dmitry Potapov
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

To make encoding selection easier, add a menu that
lists available encodings to the Options window.

Menu structure is borrowed from Firefox.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 lib/encoding.tcl |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/option.tcl   |   13 +++++-
 2 files changed, 142 insertions(+), 4 deletions(-)

diff --git a/lib/encoding.tcl b/lib/encoding.tcl
index e186b0c..2c1eda3 100644
--- a/lib/encoding.tcl
+++ b/lib/encoding.tcl
@@ -206,7 +206,7 @@ set encoding_aliases {
     { ISO-8859-16 iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 }
     { GBK CP936 MS936 windows-936 }
     { JIS_Encoding csJISEncoding }
-    { Shift_JIS MS_Kanji csShiftJIS }
+    { Shift_JIS MS_Kanji csShiftJIS ShiftJIS Shift-JIS }
     { Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese
       EUC-JP }
     { Extended_UNIX_Code_Fixed_Width_for_Japanese csEUCFixWidJapanese }
@@ -240,6 +240,52 @@ set encoding_aliases {
     { Big5 csBig5 }
 }
 
+set encoding_groups {
+    {"" ""
+	{"Unicode" UTF-8}
+	{"Western" ISO-8859-1}}
+    {we "West European"
+	{"Western" ISO-8859-15 CP-437 CP-850 MacRoman CP-1252 Windows-1252}
+	{"Celtic" ISO-8859-14}
+	{"Greek" ISO-8859-14 ISO-8859-7 CP-737 CP-869 MacGreek CP-1253 Windows-1253}
+	{"Icelandic" MacIceland MacIcelandic CP-861}
+	{"Nordic" ISO-8859-10 CP-865}
+	{"Portuguese" CP-860}
+	{"South European" ISO-8859-3}}
+    {ee "East European"
+	{"Baltic" CP-775 ISO-8859-4 ISO-8859-13 CP-1257 Windows-1257}
+	{"Central European" CP-852 ISO-8859-2 MacCE CP-1250 Windows-1250}
+	{"Croatian" MacCroatian}
+	{"Cyrillic" CP-855 ISO-8859-5 ISO-IR-111 KOI8-R MacCyrillic CP-1251 Windows-1251}
+	{"Russian" CP-866}
+	{"Ukrainian" KOI8-U MacUkraine MacUkrainian}
+	{"Romanian" ISO-8859-16 MacRomania MacRomanian}}
+    {ea "East Asian"
+	{"Generic" ISO-2022}
+	{"Chinese Simplified" GB2312 GB1988 GB12345 GB2312-RAW GBK EUC-CN GB18030 HZ ISO-2022-CN}
+	{"Chinese Traditional" Big5 Big5-HKSCS EUC-TW CP-950}
+	{"Japanese" EUC-JP ISO-2022-JP Shift-JIS JIS-0212 JIS-0208 JIS-0201 CP-932 MacJapan}
+	{"Korean" EUC-KR UHC JOHAB ISO-2022-KR CP-949 KSC5601}}
+    {sa "SE & SW Asian"
+	{"Armenian" ARMSCII-8}
+	{"Georgian" GEOSTD8}
+	{"Thai" TIS-620 ISO-8859-11 CP-874 Windows-874 MacThai}
+	{"Turkish" CP-857 CP857 ISO-8859-9 MacTurkish CP-1254 Windows-1254}
+	{"Vietnamese" TCVN VISCII VPS CP-1258 Windows-1258}
+	{"Hindi" MacDevanagari}
+	{"Gujarati" MacGujarati}
+	{"Gurmukhi" MacGurmukhi}}
+    {me "Middle Eastern"
+	{"Arabic" ISO-8859-6 Windows-1256 CP-1256 CP-864 MacArabic}
+	{"Farsi" MacFarsi}
+	{"Hebrew" ISO-8859-8-I Windows-1255 CP-1255 ISO-8859-8 CP-862 MacHebrew}}
+    {mi "Misc"
+	{"7-bit" ASCII}
+	{"16-bit" Unicode}
+	{"Legacy" CP-863 EBCDIC}
+	{"Symbol" Symbol Dingbats MacDingbats MacCentEuro}}
+}
+
 proc tcl_encoding {enc} {
     global encoding_aliases
     set names [encoding names]
@@ -248,7 +294,7 @@ proc tcl_encoding {enc} {
     set i [lsearch -exact $lcnames $enc]
     if {$i < 0} {
 	# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
-	if {[regsub {^iso[-_]} $enc iso encx]} {
+	if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
 	    set i [lsearch -exact $lcnames $encx]
 	}
     }
@@ -260,7 +306,7 @@ proc tcl_encoding {enc} {
 	    foreach e $ll {
 		set i [lsearch -exact $lcnames $e]
 		if {$i < 0} {
-		    if {[regsub {^iso[-_]} $e iso ex]} {
+		    if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} {
 			set i [lsearch -exact $lcnames $ex]
 		    }
 		}
@@ -288,3 +334,84 @@ proc get_path_encoding {path} {
 	}
 	return $tcl_enc
 }
+
+proc build_encoding_submenu {parent grp cmd} {
+	global used_encodings
+
+	set mid [lindex $grp 0]
+	set gname [mc [lindex $grp 1]]
+
+	set smenu {}
+	foreach subset [lrange $grp 2 end] {
+		set name [mc [lindex $subset 0]]
+
+		foreach enc [lrange $subset 1 end] {
+			set tcl_enc [tcl_encoding $enc]
+			if {$tcl_enc eq {}} continue
+
+			if {$smenu eq {}} {
+				if {$mid eq {}} {
+					set smenu $parent
+				} else {
+					set smenu "$parent.$mid"
+					menu $smenu
+					$parent add cascade \
+						-label $gname \
+						-menu $smenu
+				}
+			}
+
+			if {$name ne {}} {
+				set lbl "$name ($enc)"
+			} else {
+				set lbl $enc
+			}
+			$smenu add command \
+				-label $lbl \
+				-command [concat $cmd [list $tcl_enc]]
+
+			lappend used_encodings $tcl_enc
+		}
+	}
+}
+
+proc popup_btn_menu {m b} {
+	tk_popup $m [winfo pointerx $b] [winfo pointery $b]
+}
+
+proc build_encoding_menu {emenu cmd {nodef 0}} {
+	$emenu configure -postcommand \
+		[list do_build_encoding_menu $emenu $cmd $nodef]
+}
+
+proc do_build_encoding_menu {emenu cmd {nodef 0}} {
+	global used_encodings encoding_groups
+
+	$emenu configure -postcommand {}
+
+	if {!$nodef} {
+		$emenu add command \
+			-label [mc "Default"] \
+			-command [concat $cmd [list {}]]
+	}
+	set sysenc [encoding system]
+	$emenu add command \
+		-label [mc "System (%s)" $sysenc] \
+		-command [concat $cmd [list $sysenc]]
+
+	# Main encoding tree
+	set used_encodings [list identity]
+	$emenu add separator
+	foreach grp $encoding_groups {
+		build_encoding_submenu $emenu $grp $cmd
+	}
+
+	# Add unclassified encodings
+	set unused_grp [list [mc Other]]
+	foreach enc [encoding names] {
+		if {[lsearch -exact $used_encodings $enc] < 0} {
+			lappend unused_grp $enc
+		}
+	}
+	build_encoding_submenu $emenu [list other [mc Other] $unused_grp] $cmd
+}
diff --git a/lib/option.tcl b/lib/option.tcl
index 40af44e..c80c939 100644
--- a/lib/option.tcl
+++ b/lib/option.tcl
@@ -152,7 +152,7 @@ proc do_options {} {
 		{i-1..99 gui.diffcontext {mc "Number of Diff Context Lines"}}
 		{i-0..99 gui.commitmsgwidth {mc "Commit Message Text Width"}}
 		{t gui.newbranchtemplate {mc "New Branch Name Template"}}
-		{t gui.encoding {mc "Default File Contents Encoding"}}
+		{c gui.encoding {mc "Default File Contents Encoding"}}
 		} {
 		set type [lindex $option 0]
 		set name [lindex $option 1]
@@ -182,6 +182,7 @@ proc do_options {} {
 				pack $w.$f.$optid.v -side right -anchor e -padx 5
 				pack $w.$f.$optid -side top -anchor w -fill x
 			}
+			c -
 			t {
 				frame $w.$f.$optid
 				label $w.$f.$optid.l -text "$text:"
@@ -194,6 +195,16 @@ proc do_options {} {
 				pack $w.$f.$optid.v -side left -anchor w \
 					-fill x -expand 1 \
 					-padx 5
+				if {$type eq {c}} {
+					menu $w.$f.$optid.m
+					build_encoding_menu $w.$f.$optid.m \
+						[list set ${f}_config_new($name)] 1
+					button $w.$f.$optid.b \
+						-text [mc "Change"] \
+						-command [list popup_btn_menu \
+							$w.$f.$optid.m $w.$f.$optid.b]
+					pack $w.$f.$optid.b -side left -anchor w
+				}
 				pack $w.$f.$optid -side top -anchor w -fill x
 			}
 			}
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 3/8] git-gui: Allow forcing display encoding for diffs using a submenu.
  2008-09-17 21:07   ` [PATCH (GIT-GUI,GITK) 2/8] git-gui: Add a menu of available encodings Alexander Gavrilov
@ 2008-09-17 21:07     ` Alexander Gavrilov
  2008-09-17 21:07       ` [PATCH (GIT-GUI,GITK) 4/8] git-gui: Optimize encoding name resolution using a lookup table Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Add a submenu to allow dynamically changing the encoding to use
for diffs. Encoding settings are remembered while git-gui runs.
The rules are:

1) Encoding set for a specific file overrides gitattributes.
2) Last explicitly set value of the encoding overrides gui.encoding

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 git-gui.sh       |    8 ++++++++
 lib/diff.tcl     |    9 +++++++++
 lib/encoding.tcl |   29 +++++++++++++++++++++++++++--
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/git-gui.sh b/git-gui.sh
index 444990b..3bbb4f1 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -2937,6 +2937,14 @@ proc create_common_diff_popup {ctxm} {
 		-command {incr_font_size font_diff 1}
 	lappend diff_actions [list $ctxm entryconf [$ctxm index last] -state]
 	$ctxm add separator
+	set emenu $ctxm.enc
+	menu $emenu
+	build_encoding_menu $emenu [list force_diff_encoding]
+	$ctxm add cascade \
+		-label [mc "Encoding"] \
+		-menu $emenu
+	lappend diff_actions [list $ctxm entryconf [$ctxm index last] -state]
+	$ctxm add separator
 	$ctxm add command -label [mc "Options..."] \
 		-command do_options
 }
diff --git a/lib/diff.tcl b/lib/diff.tcl
index 8fefc5d..b616296 100644
--- a/lib/diff.tcl
+++ b/lib/diff.tcl
@@ -40,6 +40,15 @@ proc reshow_diff {} {
 	}
 }
 
+proc force_diff_encoding {enc} {
+	global current_diff_path
+	
+	if {$current_diff_path ne {}} {
+		force_path_encoding $current_diff_path $enc
+		reshow_diff
+	}
+}
+
 proc handle_empty_diff {} {
 	global current_diff_path file_states file_lists
 
diff --git a/lib/encoding.tcl b/lib/encoding.tcl
index 2c1eda3..b2ee38c 100644
--- a/lib/encoding.tcl
+++ b/lib/encoding.tcl
@@ -321,13 +321,38 @@ proc tcl_encoding {enc} {
     return {}
 }
 
+proc force_path_encoding {path enc} {
+	global path_encoding_overrides last_encoding_override
+
+	set enc [tcl_encoding $enc]
+	if {$enc eq {}} {
+		catch { unset last_encoding_override }
+		catch { unset path_encoding_overrides($path) }
+	} else {
+		set last_encoding_override $enc
+		if {$path ne {}} {
+			set path_encoding_overrides($path) $enc
+		}
+	}
+}
+
 proc get_path_encoding {path} {
-	set tcl_enc [tcl_encoding [get_config gui.encoding]]
+	global path_encoding_overrides last_encoding_override
+
+	if {[info exists last_encoding_override]} {
+		set tcl_enc $last_encoding_override
+	} else {
+		set tcl_enc [tcl_encoding [get_config gui.encoding]]
+	}
 	if {$tcl_enc eq {}} {
 		set tcl_enc [encoding system]
 	}
 	if {$path ne {}} {
-		set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]]
+		if {[info exists path_encoding_overrides($path)]} {
+			set enc2 $path_encoding_overrides($path)
+		} else {
+			set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]]
+		}
 		if {$enc2 ne {}} {
 			set tcl_enc $enc2
 		}
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 4/8] git-gui: Optimize encoding name resolution using a lookup table.
  2008-09-17 21:07     ` [PATCH (GIT-GUI,GITK) 3/8] git-gui: Allow forcing display encoding for diffs using a submenu Alexander Gavrilov
@ 2008-09-17 21:07       ` Alexander Gavrilov
  2008-09-17 21:07         ` [PATCH (GIT-GUI,GITK) 5/8] git-gui: Support the encoding menu in gui blame Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Encoding menu construction does almost a hundred of encoding
resolutions, which with the old implementation led to a
small but noticeable delay.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 lib/encoding.tcl |   82 +++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/lib/encoding.tcl b/lib/encoding.tcl
index b2ee38c..32668fc 100644
--- a/lib/encoding.tcl
+++ b/lib/encoding.tcl
@@ -286,39 +286,63 @@ set encoding_groups {
 	{"Symbol" Symbol Dingbats MacDingbats MacCentEuro}}
 }
 
+proc build_encoding_table {} {
+	global encoding_aliases encoding_lookup_table
+
+	# Prepare the lookup list; cannot use lsort -nocase because
+	# of compatibility issues with older Tcl (e.g. in msysgit)
+	set names [list]
+	foreach item [encoding names] {
+		lappend names [list [string tolower $item] $item]
+	}
+	set names [lsort -ascii -index 0 $names]
+	# neither can we use lsearch -index
+	set lnames [list]
+	foreach item $names {
+		lappend lnames [lindex $item 0]
+	}
+
+	foreach grp $encoding_aliases {
+		set target {}
+		foreach item $grp {
+			set i [lsearch -sorted -ascii $lnames \
+					[string tolower $item]]
+			if {$i >= 0} {
+				set target [lindex $names $i 1]
+				break
+			}
+		}
+		if {$target eq {}} continue
+		foreach item $grp {
+			set encoding_lookup_table([string tolower $item]) $target
+		}
+	}
+
+	foreach item $names {
+		set encoding_lookup_table([lindex $item 0]) [lindex $item 1]
+	}
+}
+
 proc tcl_encoding {enc} {
-    global encoding_aliases
-    set names [encoding names]
-    set lcnames [string tolower $names]
-    set enc [string tolower $enc]
-    set i [lsearch -exact $lcnames $enc]
-    if {$i < 0} {
-	# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
-	if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
-	    set i [lsearch -exact $lcnames $encx]
+	global encoding_lookup_table
+	if {$enc eq {}} {
+		return {}
+	}
+	if {![info exists encoding_lookup_table]} {
+		build_encoding_table
 	}
-    }
-    if {$i < 0} {
-	foreach l $encoding_aliases {
-	    set ll [string tolower $l]
-	    if {[lsearch -exact $ll $enc] < 0} continue
-	    # look through the aliases for one that tcl knows about
-	    foreach e $ll {
-		set i [lsearch -exact $lcnames $e]
-		if {$i < 0} {
-		    if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} {
-			set i [lsearch -exact $lcnames $ex]
-		    }
+	set enc [string tolower $enc]
+	if {![info exists encoding_lookup_table($enc)]} {
+		# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
+		if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
+			set enc $encx
 		}
-		if {$i >= 0} break
-	    }
-	    break
 	}
-    }
-    if {$i >= 0} {
-	return [lindex $names $i]
-    }
-    return {}
+	if {[info exists encoding_lookup_table($enc)]} {
+		return $encoding_lookup_table($enc)
+	} else {
+		return {}
+	}
 }
 
 proc force_path_encoding {path enc} {
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 5/8] git-gui: Support the encoding menu in gui blame.
  2008-09-17 21:07       ` [PATCH (GIT-GUI,GITK) 4/8] git-gui: Optimize encoding name resolution using a lookup table Alexander Gavrilov
@ 2008-09-17 21:07         ` Alexander Gavrilov
  2008-09-17 21:07           ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Allow dynamically changing the encoding from the blame
viewer as well.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 lib/blame.tcl |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/lib/blame.tcl b/lib/blame.tcl
index 84d55b5..eb61374 100644
--- a/lib/blame.tcl
+++ b/lib/blame.tcl
@@ -256,9 +256,16 @@ constructor new {i_commit i_path i_jump} {
 	$w.ctxm add command \
 		-label [mc "Copy Commit"] \
 		-command [cb _copycommit]
+	$w.ctxm add separator
+	menu $w.ctxm.enc
+	build_encoding_menu $w.ctxm.enc [cb _setencoding]
+	$w.ctxm add cascade \
+		-label [mc "Encoding"] \
+		-menu $w.ctxm.enc
 	$w.ctxm add command \
 		-label [mc "Do Full Copy Detection"] \
 		-command [cb _fullcopyblame]
+	$w.ctxm add separator
 	$w.ctxm add command \
 		-label [mc "Show History Context"] \
 		-command [cb _gitkcommit]
@@ -791,6 +798,16 @@ method _click {cur_w pos} {
 	_showcommit $this $cur_w $lno
 }
 
+method _setencoding {enc} {
+	force_path_encoding $path $enc
+	_load $this [list \
+		$highlight_column \
+		$highlight_line \
+		[lindex [$w_file xview] 0] \
+		[lindex [$w_file yview] 0] \
+		]
+}
+
 method _load_commit {cur_w cur_d pos} {
 	upvar #0 $cur_d line_data
 	set lno [lindex [split [$cur_w index $pos] .] 0]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-17 21:07         ` [PATCH (GIT-GUI,GITK) 5/8] git-gui: Support the encoding menu in gui blame Alexander Gavrilov
@ 2008-09-17 21:07           ` Alexander Gavrilov
  2008-09-17 21:07             ` [PATCH (GIT-GUI,GITK) 7/8] gitk: Implement file contents encoding support Alexander Gavrilov
  2008-09-19 12:10             ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Johannes Sixt
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Add functions that implement the same logic for file
contents encoding as git-gui uses:

- Defaults to the system encoding.
- Overridden by setting the gui.encoding option.
- Further overridden on per-file basis by gitattributes.

Also extends the range of supported encoding names.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   50 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/gitk b/gitk
index 546627f..b210f79 100755
--- a/gitk
+++ b/gitk
@@ -9726,7 +9726,7 @@ set encoding_aliases {
     { ISO-8859-16 iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 }
     { GBK CP936 MS936 windows-936 }
     { JIS_Encoding csJISEncoding }
-    { Shift_JIS MS_Kanji csShiftJIS }
+    { Shift_JIS MS_Kanji csShiftJIS ShiftJIS Shift-JIS }
     { Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese
       EUC-JP }
     { Extended_UNIX_Code_Fixed_Width_for_Japanese csEUCFixWidJapanese }
@@ -9768,7 +9768,7 @@ proc tcl_encoding {enc} {
     set i [lsearch -exact $lcnames $enc]
     if {$i < 0} {
 	# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
-	if {[regsub {^iso[-_]} $enc iso encx]} {
+	if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
 	    set i [lsearch -exact $lcnames $encx]
 	}
     }
@@ -9780,7 +9780,7 @@ proc tcl_encoding {enc} {
 	    foreach e $ll {
 		set i [lsearch -exact $lcnames $e]
 		if {$i < 0} {
-		    if {[regsub {^iso[-_]} $e iso ex]} {
+		    if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} {
 			set i [lsearch -exact $lcnames $ex]
 		    }
 		}
@@ -9795,6 +9795,45 @@ proc tcl_encoding {enc} {
     return {}
 }
 
+proc gitattr {path attr default} {
+	if {[catch {set r [exec git check-attr $attr -- $path]}]} {
+		set r unspecified
+	} else {
+		set r [join [lrange [split $r :] 2 end] :]
+		regsub {^ } $r {} r
+	}
+	if {$r eq {unspecified}} {
+		return $default
+	}
+	return $r
+}
+
+proc get_path_encoding {path} {
+	global gui_encoding
+	set tcl_enc [tcl_encoding $gui_encoding]
+	if {$tcl_enc eq {}} {
+		set tcl_enc [encoding system]
+	}
+	if {$path ne {}} {
+		set enc2 [tcl_encoding [gitattr $path encoding $tcl_enc]]
+		if {$enc2 ne {}} {
+			set tcl_enc $enc2
+		}
+	}
+	return $tcl_enc
+}
+
+proc get_cached_encoding {path} {
+	global path_encoding_cache
+	if {[info exists path_encoding_cache($path)]} {
+		return $path_encoding_cache($path)
+	} else {
+		set enc [get_path_encoding $path]
+		set path_encoding_cache($path) $enc
+		return $enc
+	}
+}
+
 # First check that Tcl/Tk is recent enough
 if {[catch {package require Tk 8.4} err]} {
     show_error {} . [mc "Sorry, gitk cannot run with this version of Tcl/Tk.\n\
@@ -9817,6 +9856,11 @@ if {$tclencoding == {}} {
     puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
 }
 
+set gui_encoding [encoding system]
+catch {
+	set gui_encoding [exec git config --get gui.encoding]
+}
+
 set mainfont {Helvetica 9}
 set textfont {Courier 9}
 set uifont {Helvetica 9 bold}
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 7/8] gitk: Implement file contents encoding support.
  2008-09-17 21:07           ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Alexander Gavrilov
@ 2008-09-17 21:07             ` Alexander Gavrilov
  2008-09-17 21:07               ` [PATCH (GIT-GUI,GITK) 8/8] gitk: Support filenames in the locale encoding Alexander Gavrilov
  2008-09-19 12:10             ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Johannes Sixt
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Use new functions to support using explicit encoding
configuration for file contents.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gitk b/gitk
index b210f79..98923e7 100755
--- a/gitk
+++ b/gitk
@@ -6295,7 +6295,7 @@ proc showfile {f} {
 	    return
 	}
     }
-    fconfigure $bf -blocking 0
+    fconfigure $bf -blocking 0 -encoding [get_cached_encoding $f]
     filerun $bf [list getblobline $bf $diffids]
     $ctext config -state normal
     clear_ctext $commentend
@@ -6333,6 +6333,7 @@ proc mergediff {id} {
     global diffids
     global parents
     global diffcontext
+    global diffencoding
     global limitdiffs vfilelimit curview
 
     set diffmergeid $id
@@ -6346,9 +6347,10 @@ proc mergediff {id} {
 	error_popup "[mc "Error getting merge diffs:"] $err"
 	return
     }
-    fconfigure $mdf -blocking 0
+    fconfigure $mdf -blocking 0 -encoding binary
     set mdifffd($id) $mdf
     set np [llength $parents($curview,$id)]
+    set diffencoding [get_cached_encoding {}]
     settabs $np
     filerun $mdf [list getmergediffline $mdf $id $np]
 }
@@ -6356,6 +6358,7 @@ proc mergediff {id} {
 proc getmergediffline {mdf id np} {
     global diffmergeid ctext cflist mergemax
     global difffilestart mdifffd
+    global diffencoding
 
     $ctext conf -state normal
     set nr 0
@@ -6371,14 +6374,17 @@ proc getmergediffline {mdf id np} {
 	    set here [$ctext index "end - 1c"]
 	    lappend difffilestart $here
 	    add_flist [list $fname]
+	    set diffencoding [get_cached_encoding $fname]
 	    set l [expr {(78 - [string length $fname]) / 2}]
 	    set pad [string range "----------------------------------------" 1 $l]
 	    $ctext insert end "$pad $fname $pad\n" filesep
 	} elseif {[regexp {^@@} $line]} {
+	    set line [encoding convertfrom $diffencoding $line]
 	    $ctext insert end "$line\n" hunksep
 	} elseif {[regexp {^[0-9a-f]{40}$} $line] || [regexp {^index} $line]} {
 	    # do nothing
 	} else {
+	    set line [encoding convertfrom $diffencoding $line]
 	    # parse the prefix - one ' ', '-' or '+' for each parent
 	    set spaces {}
 	    set minuses {}
@@ -6586,6 +6592,7 @@ proc getblobdiffs {ids} {
     global diffcontext
     global ignorespace
     global limitdiffs vfilelimit curview
+    global diffencoding
 
     set cmd [diffcmd $ids "-p -C --no-commit-id -U$diffcontext"]
     if {$ignorespace} {
@@ -6599,7 +6606,8 @@ proc getblobdiffs {ids} {
 	return
     }
     set diffinhdr 0
-    fconfigure $bdf -blocking 0
+    set diffencoding [get_cached_encoding {}]
+    fconfigure $bdf -blocking 0 -encoding binary
     set blobdifffd($ids) $bdf
     filerun $bdf [list getblobdiffline $bdf $diffids]
 }
@@ -6633,6 +6641,7 @@ proc getblobdiffline {bdf ids} {
     global diffids blobdifffd ctext curdiffstart
     global diffnexthead diffnextnote difffilestart
     global diffinhdr treediffs
+    global diffencoding
 
     set nr 0
     $ctext conf -state normal
@@ -6670,10 +6679,12 @@ proc getblobdiffline {bdf ids} {
 	    } else {
 		set fname [string range $line 2 [expr {$i - 1}]]
 	    }
+	    set diffencoding [get_cached_encoding $fname]
 	    makediffhdr $fname $ids
 
 	} elseif {[regexp {^@@ -([0-9]+)(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@(.*)} \
 		       $line match f1l f1c f2l f2c rest]} {
+	    set line [encoding convertfrom $diffencoding $line]
 	    $ctext insert end "$line\n" hunksep
 	    set diffinhdr 0
 
@@ -6693,6 +6704,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set diffencoding [get_cached_encoding $fname]
 		makediffhdr $fname $ids
 	    } elseif {[string compare -length 3 $line "---"] == 0} {
 		# do nothing
@@ -6704,6 +6716,7 @@ proc getblobdiffline {bdf ids} {
 	    $ctext insert end "$line\n" filesep
 
 	} else {
+	    set line [encoding convertfrom $diffencoding $line]
 	    set x [string range $line 0 0]
 	    if {$x == "-" || $x == "+"} {
 		set tag [expr {$x == "+"}]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH (GIT-GUI,GITK) 8/8] gitk: Support filenames in the locale encoding.
  2008-09-17 21:07             ` [PATCH (GIT-GUI,GITK) 7/8] gitk: Implement file contents encoding support Alexander Gavrilov
@ 2008-09-17 21:07               ` Alexander Gavrilov
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-17 21:07 UTC (permalink / raw)
  To: git; +Cc: Shawn O. Pearce, Paul Mackerras

Ensure that gitk works properly with non-ASCII names
in the system encoding. Apparently git does not support
using file names encoded differently from the current
locale.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gitk b/gitk
index 98923e7..254faa1 100755
--- a/gitk
+++ b/gitk
@@ -6228,7 +6228,7 @@ proc gettree {id} {
 	    set treepending $id
 	    set treefilelist($id) {}
 	    set treeidlist($id) {}
-	    fconfigure $gtf -blocking 0
+	    fconfigure $gtf -blocking 0 -encoding binary
 	    filerun $gtf [list gettreeline $gtf $id]
 	}
     } else {
@@ -6250,11 +6250,12 @@ proc gettreeline {gtf id} {
 	    set line [string range $line 0 [expr {$i-1}]]
 	    if {$diffids ne $nullid2 && [lindex $line 1] ne "blob"} continue
 	    set sha1 [lindex $line 2]
-	    if {[string index $fname 0] eq "\""} {
-		set fname [lindex $fname 0]
-	    }
 	    lappend treeidlist($id) $sha1
 	}
+	if {[string index $fname 0] eq "\""} {
+	    set fname [lindex $fname 0]
+	}
+	set fname [encoding convertfrom $fname]
 	lappend treefilelist($id) $fname
     }
     if {![eof $gtf]} {
@@ -6370,6 +6371,7 @@ proc getmergediffline {mdf id np} {
 	}
 	if {[regexp {^diff --cc (.*)} $line match fname]} {
 	    # start of a new file
+	    set fname [encoding convertfrom $fname]
 	    $ctext insert end "\n"
 	    set here [$ctext index "end - 1c"]
 	    lappend difffilestart $here
@@ -6519,7 +6521,7 @@ proc gettreediffs {ids} {
 
     set treepending $ids
     set treediff {}
-    fconfigure $gdtf -blocking 0
+    fconfigure $gdtf -blocking 0 -encoding binary
     filerun $gdtf [list gettreediffline $gdtf $ids]
 }
 
@@ -6535,6 +6537,7 @@ proc gettreediffline {gdtf ids} {
 	    if {[string index $file 0] eq "\""} {
 		set file [lindex $file 0]
 	    }
+	    set file [encoding convertfrom $file]
 	    lappend treediff $file
 	}
     }
@@ -6679,6 +6682,7 @@ proc getblobdiffline {bdf ids} {
 	    } else {
 		set fname [string range $line 2 [expr {$i - 1}]]
 	    }
+	    set fname [encoding convertfrom $fname]
 	    set diffencoding [get_cached_encoding $fname]
 	    makediffhdr $fname $ids
 
@@ -6694,6 +6698,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set fname [encoding convertfrom $fname]
 		set i [lsearch -exact $treediffs($ids) $fname]
 		if {$i >= 0} {
 		    setinlist difffilestart $i $curdiffstart
@@ -6704,6 +6709,7 @@ proc getblobdiffline {bdf ids} {
 		if {[string index $fname 0] eq "\""} {
 		    set fname [lindex $fname 0]
 		}
+		set fname [encoding convertfrom $fname]
 		set diffencoding [get_cached_encoding $fname]
 		makediffhdr $fname $ids
 	    } elseif {[string compare -length 3 $line "---"] == 0} {
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-09-17 21:07 [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Alexander Gavrilov
  2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
@ 2008-09-17 21:45 ` Paul Mackerras
  2008-09-18 11:12   ` Alexander Gavrilov
  2008-09-21 22:55 ` Paul Mackerras
  2 siblings, 1 reply; 44+ messages in thread
From: Paul Mackerras @ 2008-09-17 21:45 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce

Alexander Gavrilov writes:

> Currently GUI tools don't provide any support for
> viewing files that contain non-ASCII characters.

Well, that's just not true, at least as far as gitk is concerned.

If you feel there are deficiencies in how gitk handles encodings (and
I'm quite willing to believe there are, since ASCII is sufficient for
my needs), then please give us a detailed explanation of what you
would like it to do or specifically what is wrong with what it does at
the moment.  I'd like to see several paragraphs, not just the one or
two sentences you have put in the descriptions for patches 6-8.

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-09-17 21:45 ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Paul Mackerras
@ 2008-09-18 11:12   ` Alexander Gavrilov
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-18 11:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git, Shawn O. Pearce

On Thu, Sep 18, 2008 at 1:45 AM, Paul Mackerras <paulus@samba.org> wrote:
> Alexander Gavrilov writes:
>> Currently GUI tools don't provide any support for
>> viewing files that contain non-ASCII characters.
>
> Well, that's just not true, at least as far as gitk is concerned.

Somehow being able to show files in the system encoding is not good
enough for a tool that is supposed to be used for cross-platform
projects. It is only marginally better than always using ISO-8859-1,
as git-gui, in effect, did.

> If you feel there are deficiencies in how gitk handles encodings (and
> I'm quite willing to believe there are, since ASCII is sufficient for
> my needs), then please give us a detailed explanation of what you
> would like it to do or specifically what is wrong with what it does at
> the moment.  I'd like to see several paragraphs, not just the one or
> two sentences you have put in the descriptions for patches 6-8.

I did not combine this set of patches into a single group without a
reason. This is a policy decision that spans the boundary of
individual tools, although it is initially implemented and documented
on the git-gui side. The gitk commits simply bring back changes to
code originally copied from gitk, tie in new logic that supports
per-file encoding, and fix some obvious breakage (of course, I can
write longer descriptions for them). By the way, patch 4 will apply to
gitk, if you replace 'lib/encoding.tcl' with 'gitk', and specify -C2.

P.S. All changes are build on top of these two commits:
http://repo.or.cz/w/git-gui.git?a=log;h=refs/heads/pu

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
  2008-09-17 21:07   ` [PATCH (GIT-GUI,GITK) 2/8] git-gui: Add a menu of available encodings Alexander Gavrilov
@ 2008-09-18 15:02   ` Dmitry Potapov
  2008-09-18 15:14     ` Alexander Gavrilov
  2008-09-18 16:29     ` Johannes Sixt
  1 sibling, 2 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-09-18 15:02 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Thu, Sep 18, 2008 at 01:07:32AM +0400, Alexander Gavrilov wrote:
> - Make diffs and blame default to the system (locale)
>   encoding instead of hard-coding UTF-8.
> - Add a gui.encoding option to allow overriding it.
> - gitattributes still have the final word.

The subject line of this patch is a bit misleading. I would not expect
from "clean up" to change the existing behavior and existing default.

> The rationale for this is Windows support:
> 
> 1) Windows people are accustomed to using legacy encodings
>    for text files. For many of them defaulting to utf-8
>    will be counter-intuitive.
> 2) Windows doesn't support utf-8 locales, and switching
>    the system encoding is a real pain. Thus the option.

I don't care much what is the default for Windows, but I wonder whether
this rationale is good enough to change the default for other platforms.
If you have systems configured with utf-8 and others (usually old ones)
with legacy encoding, you will store files in utf-8 in your repo, thus
having utf-8 as the default makes sense for non-Windows platforms.

BTW, when you said the system encoding above, what exactly encoding do
you mean? AFAIK, Windows has two legacy encodings OEM-CP and ANSI-CP.
If I write a console program and compile it using MS-VC then it should
use OEM-CP.  However, if you write a GUI program or a console program
that is compiled using gcc from Cygwin, you have to use ANSI-CP. For
instance, if you use the Russian locale on Windows, ASNI-CP is 1251 and
OEM-CP is 866. So, my question is what exactly encoding do you call as
"system" above?

Dmitry

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-18 15:02   ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Dmitry Potapov
@ 2008-09-18 15:14     ` Alexander Gavrilov
  2008-09-18 16:29     ` Johannes Sixt
  1 sibling, 0 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-18 15:14 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Thu, Sep 18, 2008 at 7:02 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> The subject line of this patch is a bit misleading. I would not expect
> from "clean up" to change the existing behavior and existing default.

Comments can be easily changed.

>> The rationale for this is Windows support:
>>
>> 1) Windows people are accustomed to using legacy encodings
>>    for text files. For many of them defaulting to utf-8
>>    will be counter-intuitive.
>> 2) Windows doesn't support utf-8 locales, and switching
>>    the system encoding is a real pain. Thus the option.
>
> I don't care much what is the default for Windows, but I wonder whether
> this rationale is good enough to change the default for other platforms.
> If you have systems configured with utf-8 and others (usually old ones)
> with legacy encoding, you will store files in utf-8 in your repo, thus
> having utf-8 as the default makes sense for non-Windows platforms.

In fact, I think that the only reasonable default is the locale
encoding. If they want something different, they can do "git config
--global gui.encoding utf-8", that's what the option is there for.

> BTW, when you said the system encoding above, what exactly encoding do
> you mean? AFAIK, Windows has two legacy encodings OEM-CP and ANSI-CP.
> If I write a console program and compile it using MS-VC then it should
> use OEM-CP.  However, if you write a GUI program or a console program
> that is compiled using gcc from Cygwin, you have to use ANSI-CP. For
> instance, if you use the Russian locale on Windows, ASNI-CP is 1251 and
> OEM-CP is 866. So, my question is what exactly encoding do you call as
> "system" above?

Whatever Tcl thinks the system encoding is. In this case it is cp1251.
CP866 is for DOS.

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-18 15:02   ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Dmitry Potapov
  2008-09-18 15:14     ` Alexander Gavrilov
@ 2008-09-18 16:29     ` Johannes Sixt
  2008-09-18 16:50       ` Dmitry Potapov
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-09-18 16:29 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Alexander Gavrilov, git, Shawn O. Pearce, Paul Mackerras

Dmitry Potapov schrieb:
> On Thu, Sep 18, 2008 at 01:07:32AM +0400, Alexander Gavrilov wrote:
>> The rationale for this is Windows support:
>>
>> 1) Windows people are accustomed to using legacy encodings
>>    for text files. For many of them defaulting to utf-8
>>    will be counter-intuitive.
>> 2) Windows doesn't support utf-8 locales, and switching
>>    the system encoding is a real pain. Thus the option.
> 
> I don't care much what is the default for Windows, but I wonder whether
> this rationale is good enough to change the default for other platforms.

"The default" should not be hardcoded in the tool.

By setting the encoding to "system", "the default" is taken from whatever
the system's current locale is. If you are on modern Linux, your locale is
most likely set to UTF8, and everything is fine; you won't observe a
change in behavior.

But if you are on a system whose locale was not set to UTF8, then you very
likely did *not* produce UTF8 data, and the display in git-gui was screwed
because it assumed UTF8. With this change it uses the system's encoding,
and it is an improvement.

> If you have systems configured with utf-8 and others (usually old ones)
> with legacy encoding, you will store files in utf-8 in your repo, thus
> having utf-8 as the default makes sense for non-Windows platforms.

How can you know? For example, I've to work with systems that use "legacy
encodings", and I can't use UTF8 in my data. Hence, the default of UTF8
was not exactly useful. With this patch series there's now a mechanism
that allows me to state the encoding per file, and all platforms should be
able to show the data in the correct way.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-18 16:29     ` Johannes Sixt
@ 2008-09-18 16:50       ` Dmitry Potapov
  2008-09-18 17:00         ` Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Potapov @ 2008-09-18 16:50 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Alexander Gavrilov, git, Shawn O. Pearce, Paul Mackerras

On Thu, Sep 18, 2008 at 06:29:26PM +0200, Johannes Sixt wrote:
> 
> By setting the encoding to "system", "the default" is taken from whatever
> the system's current locale is. If you are on modern Linux, your locale is
> most likely set to UTF8, and everything is fine; you won't observe a
> change in behavior.

That's right.

> But if you are on a system whose locale was not set to UTF8, then you very
> likely did *not* produce UTF8 data, and the display in git-gui was screwed
> because it assumed UTF8. With this change it uses the system's encoding,
> and it is an improvement.

It is not about how data are stored locale but what is in repository.
Even if you still have some Linux box with legacy encoding on it, you
still want to see what in repository, which is mostly likely to be in
UTF-8. Even if you do not have UTF-8 locale, all decent editors are
capable to read and store files in UTF-8 (even if it is not your locale),
and it is really make sense to store files in UTF-8, which makes sense
because you are going then on a modern Linux, you want to have all data
in the repository to be in a single encoding, and UTF-8 is the best
choice for that.

> 
> > If you have systems configured with utf-8 and others (usually old ones)
> > with legacy encoding, you will store files in utf-8 in your repo, thus
> > having utf-8 as the default makes sense for non-Windows platforms.
> 
> How can you know? For example, I've to work with systems that use "legacy
> encodings", and I can't use UTF8 in my data. Hence, the default of UTF8
> was not exactly useful. With this patch series there's now a mechanism
> that allows me to state the encoding per file, and all platforms should be
> able to show the data in the correct way.

This patch is certainly a big improvement, as it allows to choose what
encoding you want to see, but I was not sure that changing the default
from UTF-8 to the system locale is really a good idea for anything but
Windows specific projects. Anyway, I have converted all computers that
I use regularly to UTF-8, so I don't really care...

Dmitry

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-18 16:50       ` Dmitry Potapov
@ 2008-09-18 17:00         ` Alexander Gavrilov
  2008-09-18 17:19           ` Dmitry Potapov
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-18 17:00 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Johannes Sixt, git, Shawn O. Pearce, Paul Mackerras

On Thu, Sep 18, 2008 at 8:50 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> It is not about how data are stored locale but what is in repository.
> Even if you still have some Linux box with legacy encoding on it, you
> still want to see what in repository, which is mostly likely to be in
> UTF-8. Even if you do not have UTF-8 locale, all decent editors are
> capable to read and store files in UTF-8 (even if it is not your locale),
> and it is really make sense to store files in UTF-8, which makes sense
> because you are going then on a modern Linux, you want to have all data
> in the repository to be in a single encoding, and UTF-8 is the best
> choice for that.

A new user would expect to see his files properly, and they are likely
to be in the locale encoding. And if you know about utf-8, you can
open the Options dialog, and select it explicitly from a menu. And if
you commit a .gitattributes file with encoding specifications to the
repository, it will be used automatically wherever you check it out.

> This patch is certainly a big improvement, as it allows to choose what
> encoding you want to see, but I was not sure that changing the default
> from UTF-8 to the system locale is really a good idea for anything but
> Windows specific projects. Anyway, I have converted all computers that
> I use regularly to UTF-8, so I don't really care...

You are here missing the fact, that the actual current default for
git-gui is not utf-8, but 'binary', essentially equivalent to
ISO-8859-1. UTF-8 was suggested by a patch that has been around in the
'pu' branch since January, and which I took as a base for my series.
Gitk on the other hand uses the locale encoding.

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding.
  2008-09-18 17:00         ` Alexander Gavrilov
@ 2008-09-18 17:19           ` Dmitry Potapov
  0 siblings, 0 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-09-18 17:19 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: Johannes Sixt, git, Shawn O. Pearce, Paul Mackerras

On Thu, Sep 18, 2008 at 09:00:07PM +0400, Alexander Gavrilov wrote:
> 
> You are here missing the fact, that the actual current default for
> git-gui is not utf-8, but 'binary', essentially equivalent to
> ISO-8859-1.

In this case, I was just confused by the comment to the patch, which was
saying "Make diffs and blame default to the system (locale) encoding
instead of hard-coding UTF-8."

> UTF-8 was suggested by a patch that has been around in the
> 'pu' branch since January, and which I took as a base for my series.
> Gitk on the other hand uses the locale encoding.

I see... Then your patch makes perfect sense regardless of Windows
support, which was presented as the rationale for the patch.

Thanks,
Dmitry

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-17 21:07           ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Alexander Gavrilov
  2008-09-17 21:07             ` [PATCH (GIT-GUI,GITK) 7/8] gitk: Implement file contents encoding support Alexander Gavrilov
@ 2008-09-19 12:10             ` Johannes Sixt
  2008-09-19 12:38               ` Alexander Gavrilov
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-09-19 12:10 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Alexander Gavrilov schrieb:
> Add functions that implement the same logic for file
> contents encoding as git-gui uses:
> 
> - Defaults to the system encoding.
> - Overridden by setting the gui.encoding option.
> - Further overridden on per-file basis by gitattributes.
> 
> Also extends the range of supported encoding names.

If I run

  $ LANG=C gitk 146ed90

with this series on the git-gui repository, then I hoped to see the text
in the patches in the right encoding. But I understand that I expected too
much - the patch text is just a stream of bytes that comes from different
files, and the best you can do is to apply the system encoding.

But if the view is switched to the tree view, and file contents are
inspected, then this patch should help. But it doesn't. If you look at
po/ja.po, it is appearent that the file was not read as UTF-8, which is
dictated by .gitattributes.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-19 12:10             ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Johannes Sixt
@ 2008-09-19 12:38               ` Alexander Gavrilov
  2008-09-19 13:04                 ` Johannes Sixt
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-19 12:38 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Fri, Sep 19, 2008 at 4:10 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> If I run
>
>  $ LANG=C gitk 146ed90
>
> with this series on the git-gui repository, then I hoped to see the text
> in the patches in the right encoding. But I understand that I expected too
> much - the patch text is just a stream of bytes that comes from different
> files, and the best you can do is to apply the system encoding.
>
> But if the view is switched to the tree view, and file contents are
> inspected, then this patch should help. But it doesn't. If you look at
> po/ja.po, it is appearent that the file was not read as UTF-8, which is
> dictated by .gitattributes.

On my system everything works. You must have made a mistake somewhere. Namely:

1) Did you apply ALL patches to gitk? The first one is absolutely
useless without the other two.
2) Did you install it? I.e. do you actually run the patched version?
3) Do you have .gitattributes checked out?

Also, it should work in the patch mode as well: it reads the patch as
binary, and decodes each line separately, based on the encoding
determined for the current file.

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-19 12:38               ` Alexander Gavrilov
@ 2008-09-19 13:04                 ` Johannes Sixt
  2008-09-21 18:52                   ` Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-09-19 13:04 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Alexander Gavrilov schrieb:
> On Fri, Sep 19, 2008 at 4:10 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> If I run
>>
>>  $ LANG=C gitk 146ed90
>>
>> with this series on the git-gui repository, then I hoped to see the text
>> in the patches in the right encoding. But I understand that I expected too
>> much - the patch text is just a stream of bytes that comes from different
>> files, and the best you can do is to apply the system encoding.
>>
>> But if the view is switched to the tree view, and file contents are
>> inspected, then this patch should help. But it doesn't. If you look at
>> po/ja.po, it is appearent that the file was not read as UTF-8, which is
>> dictated by .gitattributes.
> 
> On my system everything works. You must have made a mistake somewhere.

Indeed. I was running above command form a git.git clone. Now, looking at
anything that is in 146ed90 produces paths like

   po/de.po
   po/ja.po

Yet, .gitattributes says this:

   /po/*.po    encoding=UTF-8

which does not match any of the *.po files because the *.po are one level
deeper inside git-gui/po/. The change below made a difference.
Everything's fine now!

diff --git a/git-gui/.gitattributes b/git-gui/.gitattributes
index f96112d..8ad5766 100644
--- a/git-gui/.gitattributes
+++ b/git-gui/.gitattributes
@@ -1,3 +1,3 @@
 *           encoding=US-ASCII
 git-gui.sh  encoding=UTF-8
-/po/*.po    encoding=UTF-8
+*.po        encoding=UTF-8

> Also, it should work in the patch mode as well: it reads the patch as
> binary, and decodes each line separately, based on the encoding
> determined for the current file.

This is great. But we'll have to see how this works on a multi-file diff
on Windows. (I sense slowness.)

Thanks,
-- Hannes

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-19 13:04                 ` Johannes Sixt
@ 2008-09-21 18:52                   ` Alexander Gavrilov
  2008-09-22  7:25                     ` Johannes Sixt
  2008-09-22  9:01                     ` Dmitry Potapov
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-21 18:52 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Friday 19 September 2008 17:04:54 Johannes Sixt wrote:
> Alexander Gavrilov schrieb:
> > Also, it should work in the patch mode as well: it reads the patch as
> > binary, and decodes each line separately, based on the encoding
> > determined for the current file.
> 
> This is great. But we'll have to see how this works on a multi-file diff
> on Windows. (I sense slowness.)
> 

Yes, yesterday I noticed that when the patch is very big, like when the
index has got out of sync, and gitk claims local changes in all of the git-gui
repository (>1000 files), first view is slow even on Linux.

I made a patch to optimize attribute lookup. I'm afraid that further optimization
is impossible without interface changes in git-check-attr, or reimplementing
the attribute parser in Tcl.

--- >8 ---
From: Alexander Gavrilov <angavrilov@gmail.com>
Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.

When the diff contains thousands of files, calling git-check-attr
once per file is very slow. With this patch gitk does attribute
lookup in batches of 30 files while reading the diff file list,
which leads to a very noticeable speedup.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   34 +++++++++++++++++++++++++++++++++-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/gitk b/gitk
index 1355aa2..6fc1e90 100755
--- a/gitk
+++ b/gitk
@@ -6530,6 +6530,7 @@ proc gettreediffline {gdtf ids} {
     global cmitmode vfilelimit curview limitdiffs
 
     set nr 0
+    set sublist {}
     while {[incr nr] <= 1000 && [gets $gdtf line] >= 0} {
 	set i [string first "\t" $line]
 	if {$i >= 0} {
@@ -6539,8 +6540,10 @@ proc gettreediffline {gdtf ids} {
 	    }
 	    set file [encoding convertfrom $file]
 	    lappend treediff $file
+	    lappend sublist $file
 	}
     }
+    cache_gitattr encoding $sublist
     if {![eof $gdtf]} {
 	return [expr {$nr >= 1000? 2: 1}]
     }
@@ -9839,18 +9842,47 @@ proc tcl_encoding {enc} {
 }
 
 proc gitattr {path attr default} {
-	if {[catch {set r [exec git check-attr $attr -- $path]}]} {
+	global path_attr_cache
+	if {[info exists path_attr_cache($attr,$path)]} {
+		set r $path_attr_cache($attr,$path)
+	} elseif {[catch {set r [exec git check-attr $attr -- $path]}]} {
 		set r unspecified
 	} else {
 		set r [join [lrange [split $r :] 2 end] :]
 		regsub {^ } $r {} r
 	}
+	set path_attr_cache($attr,$path) $r
 	if {$r eq {unspecified}} {
 		return $default
 	}
 	return $r
 }
 
+proc cache_gitattr {attr pathlist} {
+	global path_attr_cache
+	set newlist {}
+	foreach path $pathlist {
+		if {[info exists path_attr_cache($attr,$path)]} continue
+		lappend newlist $path
+	}
+	while {$newlist ne {}} {
+		set head [lrange $newlist 0 29]
+		set newlist [lrange $newlist 30 end]
+		if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} {
+			foreach row [split $rlist "\n"] {
+				set cols [split $row :]
+				set path [lindex $cols 0]
+				set value [join [lrange $cols 2 end] :]
+				if {[string index $path 0] eq "\""} {
+					set path [encoding convertfrom [lindex $path 0]]
+				}
+				set path_attr_cache($attr,$path) $value
+			}
+		}
+		update
+	}
+}
+
 proc get_path_encoding {path} {
 	global gui_encoding
 	set tcl_enc [tcl_encoding $gui_encoding]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-09-17 21:07 [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Alexander Gavrilov
  2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
  2008-09-17 21:45 ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Paul Mackerras
@ 2008-09-21 22:55 ` Paul Mackerras
  2008-09-22 10:12   ` Alexander Gavrilov
  2008-10-01 11:35   ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Johannes Sixt
  2 siblings, 2 replies; 44+ messages in thread
From: Paul Mackerras @ 2008-09-21 22:55 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce

Alexander Gavrilov writes:

> File encoding can be specified in the following ways:
> 
> 1) It defaults to the current locale encoding.
> 2) It can be overridden by setting the gui.encoding option.

I'm happy with providing a way to say what the default encoding of
files in the repository is, but I wonder why it is seen as a property
of the GUI.  Is it just that there is an existing "gui" section that
is convenient to use, or does git-gui already use gui.encoding (before
this patch series), or is there some other reason?

> 3) It can be further set on per-file basis by specifying
>    the 'encoding' attribute in gitattributes.

I haven't used .gitattributes before, but I would expect that the
.gitattributes files would be stored in the repository along with
everything else.  If that's the case, then for gitk at least there is
the question of which version of a given .gitattributes file one
should use when viewing the tree for a commit which isn't the
currently checked-out commit - do you use the version from that tree,
or the version in the working directory?  We seem to be using the
latter at present, and caching the results.  Is there a philosophical
reason to do that, other than speed?  (Also it seems that we won't
notice if the user changes .gitattributes after we've looked at it, or
if they create one after we've looked for one and not found it.)

> Since git apparently cannot work with filenames in non-locale
> encodings anyway, I did not try to do anything about it apart
> from fixing some obvious bugs.

For Linux, filenames are sequences of non-null bytes, so using
binary encoding to read them in Tcl sounds about right.

> There are also some bugs in handling of commit encodings in gitk,
> but they are out of the scope of this series.

I'm interested to hear what they are.

Paul.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-21 18:52                   ` Alexander Gavrilov
@ 2008-09-22  7:25                     ` Johannes Sixt
  2008-09-22  7:46                       ` Johannes Sixt
  2008-09-22  8:01                       ` Alexander Gavrilov
  2008-09-22  9:01                     ` Dmitry Potapov
  1 sibling, 2 replies; 44+ messages in thread
From: Johannes Sixt @ 2008-09-22  7:25 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Alexander Gavrilov schrieb:
> Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.
> 
> When the diff contains thousands of files, calling git-check-attr
> once per file is very slow. With this patch gitk does attribute
> lookup in batches of 30 files while reading the diff file list,
> which leads to a very noticeable speedup.

This one does not work for me: The correct is not picked up anymore,
neither in Patch mode nor Tree mode. (It works as expected without this
patch.)

> +proc cache_gitattr {attr pathlist} {
> +	global path_attr_cache
> +	set newlist {}
> +	foreach path $pathlist {
> +		if {[info exists path_attr_cache($attr,$path)]} continue
> +		lappend newlist $path
> +	}
> +	while {$newlist ne {}} {
> +		set head [lrange $newlist 0 29]
> +		set newlist [lrange $newlist 30 end]
> +		if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} {
> +			foreach row [split $rlist "\n"] {
> +				set cols [split $row :]

This colon made me nervous (because of the drive-colon combination on
Windows), but as long as you feed relative paths into 'git check-attr',
this should not matter (in my case).

> +				set path [lindex $cols 0]
> +				set value [join [lrange $cols 2 end] :]
> +				if {[string index $path 0] eq "\""} {
> +					set path [encoding convertfrom [lindex $path 0]]
> +				}
> +				set path_attr_cache($attr,$path) $value
> +			}
> +		}
> +		update
> +	}
> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  7:25                     ` Johannes Sixt
@ 2008-09-22  7:46                       ` Johannes Sixt
  2008-09-22  8:01                       ` Alexander Gavrilov
  1 sibling, 0 replies; 44+ messages in thread
From: Johannes Sixt @ 2008-09-22  7:46 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Johannes Sixt schrieb:
> Alexander Gavrilov schrieb:
>> Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.
>>
>> When the diff contains thousands of files, calling git-check-attr
>> once per file is very slow. With this patch gitk does attribute
>> lookup in batches of 30 files while reading the diff file list,
>> which leads to a very noticeable speedup.
> 
> This one does not work for me: The correct is not picked up anymore,

The correct _encoding_ is not picked up anymore...

> neither in Patch mode nor Tree mode. (It works as expected without this
> patch.)
> 
>> +proc cache_gitattr {attr pathlist} {
>> +	global path_attr_cache
>> +	set newlist {}
>> +	foreach path $pathlist {
>> +		if {[info exists path_attr_cache($attr,$path)]} continue
>> +		lappend newlist $path
>> +	}
>> +	while {$newlist ne {}} {
>> +		set head [lrange $newlist 0 29]
>> +		set newlist [lrange $newlist 30 end]
>> +		if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} {
>> +			foreach row [split $rlist "\n"] {
>> +				set cols [split $row :]
> 
> This colon made me nervous (because of the drive-colon combination on
> Windows), but as long as you feed relative paths into 'git check-attr',
> this should not matter (in my case).

This comment does not imply that I debugged this code. I just noticed the
colon while reading the patch.

>> +				set path [lindex $cols 0]
>> +				set value [join [lrange $cols 2 end] :]
>> +				if {[string index $path 0] eq "\""} {
>> +					set path [encoding convertfrom [lindex $path 0]]
>> +				}
>> +				set path_attr_cache($attr,$path) $value
>> +			}
>> +		}
>> +		update
>> +	}
>> +}

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  7:25                     ` Johannes Sixt
  2008-09-22  7:46                       ` Johannes Sixt
@ 2008-09-22  8:01                       ` Alexander Gavrilov
  2008-09-22  8:20                         ` Johannes Sixt
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-22  8:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Monday 22 September 2008 11:25:43 Johannes Sixt wrote:
> Alexander Gavrilov schrieb:
> > Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.
> > 
> > When the diff contains thousands of files, calling git-check-attr
> > once per file is very slow. With this patch gitk does attribute
> > lookup in batches of 30 files while reading the diff file list,
> > which leads to a very noticeable speedup.
> 
> This one does not work for me: The correct is not picked up anymore,
> neither in Patch mode nor Tree mode. (It works as expected without this
> patch.)
> 

OOPS, I forgot to copy the line that removes leading whitespace:

@@ -9876,6 +9876,7 @@ proc cache_gitattr {attr pathlist} {
                                if {[string index $path 0] eq "\""} {
                                        set path [encoding convertfrom [lindex $path 0]]
                                }
+                               regsub {^ } $value {} value
                                set path_attr_cache($attr,$path) $value
                        }
                }

> > +				set cols [split $row :]
> > +				set path [lindex $cols 0]
> 
> This colon made me nervous (because of the drive-colon combination on
> Windows), but as long as you feed relative paths into 'git check-attr',
> this should not matter (in my case).

I'm afraid there is nothing to be done about it without changing git-check-attr.
For example, it could quote ':' in the path in octal -- current interface allows it.
But since the path is relative, it is not a big deal.


--- >8 ---
From: Alexander Gavrilov <angavrilov@gmail.com>
Subject: [PATCH v2] gitk: Implement batch lookup and caching of encoding attrs.

When the diff contains thousands of files, calling git-check-attr
once per file is very slow. With this patch gitk does attribute
lookup in batches of 30 files while reading the diff file list,
which leads to a very noticeable speedup.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   35 ++++++++++++++++++++++++++++++++++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/gitk b/gitk
index 1355aa2..cf557c3 100755
--- a/gitk
+++ b/gitk
@@ -6530,6 +6530,7 @@ proc gettreediffline {gdtf ids} {
     global cmitmode vfilelimit curview limitdiffs
 
     set nr 0
+    set sublist {}
     while {[incr nr] <= 1000 && [gets $gdtf line] >= 0} {
 	set i [string first "\t" $line]
 	if {$i >= 0} {
@@ -6539,8 +6540,10 @@ proc gettreediffline {gdtf ids} {
 	    }
 	    set file [encoding convertfrom $file]
 	    lappend treediff $file
+	    lappend sublist $file
 	}
     }
+    cache_gitattr encoding $sublist
     if {![eof $gdtf]} {
 	return [expr {$nr >= 1000? 2: 1}]
     }
@@ -9839,18 +9842,48 @@ proc tcl_encoding {enc} {
 }
 
 proc gitattr {path attr default} {
-	if {[catch {set r [exec git check-attr $attr -- $path]}]} {
+	global path_attr_cache
+	if {[info exists path_attr_cache($attr,$path)]} {
+		set r $path_attr_cache($attr,$path)
+	} elseif {[catch {set r [exec git check-attr $attr -- $path]}]} {
 		set r unspecified
 	} else {
 		set r [join [lrange [split $r :] 2 end] :]
 		regsub {^ } $r {} r
 	}
+	set path_attr_cache($attr,$path) $r
 	if {$r eq {unspecified}} {
 		return $default
 	}
 	return $r
 }
 
+proc cache_gitattr {attr pathlist} {
+	global path_attr_cache
+	set newlist {}
+	foreach path $pathlist {
+		if {[info exists path_attr_cache($attr,$path)]} continue
+		lappend newlist $path
+	}
+	while {$newlist ne {}} {
+		set head [lrange $newlist 0 29]
+		set newlist [lrange $newlist 30 end]
+		if {![catch {set rlist [eval exec git check-attr $attr -- $head]}]} {
+			foreach row [split $rlist "\n"] {
+				set cols [split $row :]
+				set path [lindex $cols 0]
+				set value [join [lrange $cols 2 end] :]
+				if {[string index $path 0] eq "\""} {
+					set path [encoding convertfrom [lindex $path 0]]
+				}
+				regsub {^ } $value {} value
+				set path_attr_cache($attr,$path) $value
+			}
+		}
+		update
+	}
+}
+
 proc get_path_encoding {path} {
 	global gui_encoding
 	set tcl_enc [tcl_encoding $gui_encoding]
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  8:01                       ` Alexander Gavrilov
@ 2008-09-22  8:20                         ` Johannes Sixt
  2008-09-22  9:02                           ` Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-09-22  8:20 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Alexander Gavrilov schrieb:
> On Monday 22 September 2008 11:25:43 Johannes Sixt wrote:
>> Alexander Gavrilov schrieb:
>>> Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.
>>>
>>> When the diff contains thousands of files, calling git-check-attr
>>> once per file is very slow. With this patch gitk does attribute
>>> lookup in batches of 30 files while reading the diff file list,
>>> which leads to a very noticeable speedup.
>> This one does not work for me: The correct is not picked up anymore,
>> neither in Patch mode nor Tree mode. (It works as expected without this
>> patch.)
>>
> 
> OOPS, I forgot to copy the line that removes leading whitespace:

Thanks, with this it works now. The delay for a 1000 file patch is
bearable (on Windows, but y'know, Windows types are masochists :-).

Feel free to add:

Tested-by: Johannes Sixt <johannes.sixt@telecom.at>

to the patches in the series when you resend it - and I hope you do resend
it after addressing Paul's concerns.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-21 18:52                   ` Alexander Gavrilov
  2008-09-22  7:25                     ` Johannes Sixt
@ 2008-09-22  9:01                     ` Dmitry Potapov
  1 sibling, 0 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-09-22  9:01 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: Johannes Sixt, git, Shawn O. Pearce, Paul Mackerras

On Sun, Sep 21, 2008 at 10:52:35PM +0400, Alexander Gavrilov wrote:
> 
> I made a patch to optimize attribute lookup. I'm afraid that further optimization
> is impossible without interface changes in git-check-attr, or reimplementing
> the attribute parser in Tcl.

I wonder would not make sense to add --stdin-paths option to git
check-attr in the same way as we have for git hash-object?

Dmitry

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  8:20                         ` Johannes Sixt
@ 2008-09-22  9:02                           ` Alexander Gavrilov
  2008-09-22  9:18                             ` Johannes Sixt
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-22  9:02 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Monday 22 September 2008 12:20:28 Johannes Sixt wrote:
> Alexander Gavrilov schrieb:
> > On Monday 22 September 2008 11:25:43 Johannes Sixt wrote:
> >> Alexander Gavrilov schrieb:
> >>> Subject: [PATCH] gitk: Implement batch lookup and caching of encoding attrs.
> >>>
> >>> When the diff contains thousands of files, calling git-check-attr
> >>> once per file is very slow. With this patch gitk does attribute
> >>> lookup in batches of 30 files while reading the diff file list,
> >>> which leads to a very noticeable speedup.
> >> This one does not work for me: The correct is not picked up anymore,
> >> neither in Patch mode nor Tree mode. (It works as expected without this
> >> patch.)
> >>
> > 
> > OOPS, I forgot to copy the line that removes leading whitespace:
> 
> Thanks, with this it works now. The delay for a 1000 file patch is
> bearable (on Windows, but y'know, Windows types are masochists :-).

You can also try applying this patch (originally made for git-gui). It may save
additional 0.3 sec, especially for obscure legacy encodings.

P.S. I do believe there is a place for a library shared between gitk & git-gui.
This code duplication is ugly and annoying; moreover, they have different
indentation conventions, which get messed up...

--- >8 --- 
From: Alexander Gavrilov <angavrilov@gmail.com>
Subject: [PATCH] git-gui: Optimize encoding name resolution using a lookup table.

Encoding menu construction does almost a hundred of encoding
resolutions, which with the old implementation led to a
small but noticeable delay.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   84 ++++++++++++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/gitk b/gitk
index 254faa1..1355aa2 100755
--- a/gitk
+++ b/gitk
@@ -9779,39 +9779,63 @@ set encoding_aliases {
     { Big5 csBig5 }
 }
 
-proc tcl_encoding {enc} {
-    global encoding_aliases
-    set names [encoding names]
-    set lcnames [string tolower $names]
-    set enc [string tolower $enc]
-    set i [lsearch -exact $lcnames $enc]
-    if {$i < 0} {
-	# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
-	if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
-	    set i [lsearch -exact $lcnames $encx]
+proc build_encoding_table {} {
+	global encoding_aliases encoding_lookup_table
+
+	# Prepare the lookup list; cannot use lsort -nocase because
+	# of compatibility issues with older Tcl (e.g. in msysgit)
+	set names [list]
+	foreach item [encoding names] {
+		lappend names [list [string tolower $item] $item]
+	}
+	set names [lsort -ascii -index 0 $names]
+	# neither can we use lsearch -index
+	set lnames [list]
+	foreach item $names {
+		lappend lnames [lindex $item 0]
+	}
+
+	foreach grp $encoding_aliases {
+		set target {}
+		foreach item $grp {
+			set i [lsearch -sorted -ascii $lnames \
+					[string tolower $item]]
+			if {$i >= 0} {
+				set target [lindex $names $i 1]
+				break
+			}
+		}
+		if {$target eq {}} continue
+		foreach item $grp {
+			set encoding_lookup_table([string tolower $item]) $target
+		}
 	}
-    }
-    if {$i < 0} {
-	foreach l $encoding_aliases {
-	    set ll [string tolower $l]
-	    if {[lsearch -exact $ll $enc] < 0} continue
-	    # look through the aliases for one that tcl knows about
-	    foreach e $ll {
-		set i [lsearch -exact $lcnames $e]
-		if {$i < 0} {
-		    if {[regsub {^(iso|cp|ibm|jis)[-_]} $e {\1} ex]} {
-			set i [lsearch -exact $lcnames $ex]
-		    }
+
+	foreach item $names {
+		set encoding_lookup_table([lindex $item 0]) [lindex $item 1]
+	}
+}
+
+proc tcl_encoding {enc} {
+	global encoding_lookup_table
+	if {$enc eq {}} {
+		return {}
+	}
+	if {![info exists encoding_lookup_table]} {
+		build_encoding_table
+	}
+	set enc [string tolower $enc]
+	if {![info exists encoding_lookup_table($enc)]} {
+		# look for "isonnn" instead of "iso-nnn" or "iso_nnn"
+		if {[regsub {^(iso|cp|ibm|jis)[-_]} $enc {\1} encx]} {
+			set enc $encx
 		}
-		if {$i >= 0} break
-	    }
-	    break
 	}
-    }
-    if {$i >= 0} {
-	return [lindex $names $i]
-    }
-    return {}
+	if {[info exists encoding_lookup_table($enc)]} {
+		return $encoding_lookup_table($enc)
+	} else {
+		return {}
+	}
 }
 
 proc gitattr {path attr default} {
-- 
1.6.0.20.g6148bc

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  9:02                           ` Alexander Gavrilov
@ 2008-09-22  9:18                             ` Johannes Sixt
  2008-09-22 10:18                               ` Alexander Gavrilov
  0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-09-22  9:18 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git, Shawn O. Pearce, Paul Mackerras

Alexander Gavrilov schrieb:
> You can also try applying this patch (originally made for git-gui). It may save
> additional 0.3 sec, especially for obscure legacy encodings.

Is this about startup time? Personally, I don't care about 0.3 sec startup
time. I close my primary gitk and git-gui Windows only once a week. ;-)

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-09-21 22:55 ` Paul Mackerras
@ 2008-09-22 10:12   ` Alexander Gavrilov
  2008-10-05  2:30     ` [PATCH 1/2] check-attr: add an internal check_attr() function Dmitry Potapov
  2008-10-01 11:35   ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Johannes Sixt
  1 sibling, 1 reply; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-22 10:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git, Shawn O. Pearce

On Mon, Sep 22, 2008 at 2:55 AM, Paul Mackerras <paulus@samba.org> wrote:
> I'm happy with providing a way to say what the default encoding of
> files in the repository is, but I wonder why it is seen as a property
> of the GUI.  Is it just that there is an existing "gui" section that
> is convenient to use, or does git-gui already use gui.encoding (before
> this patch series), or is there some other reason?

No particular reason, it can be easily renamed to i18n.encoding, or
something else. I only recall seeing the name 'gui.encoding' in a
discussion on this topic several months ago.

> I haven't used .gitattributes before, but I would expect that the
> .gitattributes files would be stored in the repository along with
> everything else.  If that's the case, then for gitk at least there is
> the question of which version of a given .gitattributes file one
> should use when viewing the tree for a commit which isn't the
> currently checked-out commit - do you use the version from that tree,
> or the version in the working directory?  We seem to be using the
> latter at present, and caching the results.  Is there a philosophical
> reason to do that, other than speed?  (Also it seems that we won't
> notice if the user changes .gitattributes after we've looked at it, or
> if they create one after we've looked for one and not found it.)

Core git does not provide any interface for reading attributes from
older commits, so they are loaded from the working copy. And caching
is necessary for performance when lookup involves calling an external
application at least once per 30 files. This may change if
git-check-attr is modified to support --stdin-paths.

>> There are also some bugs in handling of commit encodings in gitk,
>> but they are out of the scope of this series.
>
> I'm interested to hear what they are.

When I tested it, I noticed that:

1. It works correctly only if all commits use the encoding specified
in i18n.commitencoding
2. Even when they do, sometimes commits are loaded through cat-file,
and in this case they are processed using the locale encoding.

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui.
  2008-09-22  9:18                             ` Johannes Sixt
@ 2008-09-22 10:18                               ` Alexander Gavrilov
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Gavrilov @ 2008-09-22 10:18 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Shawn O. Pearce, Paul Mackerras

On Mon, Sep 22, 2008 at 1:18 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Alexander Gavrilov schrieb:
>> You can also try applying this patch (originally made for git-gui). It may save
>> additional 0.3 sec, especially for obscure legacy encodings.
>
> Is this about startup time? Personally, I don't care about 0.3 sec startup
> time. I close my primary gitk and git-gui Windows only once a week. ;-)

As encoding lookups are currently cached, all this discussion is about
the first viewing.

This patch addresses the time necessary to convert an arbitrary
encoding name to a name that is known to Tcl, or determine that it is
not supported. Without the patch it is done using a linear search
through a large table of aliases. In git-gui it caused a noticeable
delay before the Encoding submenu, which lists all available
encodings, was displayed.

Alexander

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-09-21 22:55 ` Paul Mackerras
  2008-09-22 10:12   ` Alexander Gavrilov
@ 2008-10-01 11:35   ` Johannes Sixt
  2008-10-10 10:46     ` Paul Mackerras
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-10-01 11:35 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Alexander Gavrilov, git, Shawn O. Pearce

Paul Mackerras schrieb:
> Alexander Gavrilov writes:
>> 3) It can be further set on per-file basis by specifying
>>    the 'encoding' attribute in gitattributes.
> 
> I haven't used .gitattributes before, but I would expect that the
> .gitattributes files would be stored in the repository along with
> everything else.  If that's the case, then for gitk at least there is
> the question of which version of a given .gitattributes file one
> should use when viewing the tree for a commit which isn't the
> currently checked-out commit - do you use the version from that tree,
> or the version in the working directory?  We seem to be using the
> latter at present, and caching the results.  Is there a philosophical
> reason to do that, other than speed?

I understand your concerns that an encoding may be picked from the "wrong"
.gitattributes file. But in practice it doesn't matter much, and picking
the attribute from a past commit's tree would even be counter-productive:

I'm about to add a .gitattributes file that specifies the encoding for
some of my files *today* because I was not clever enough to anticipate the
usefulness of an "encoding" attribute a year ago when those files were
added to the repository. When I browse history, I *do* want that *today's*
encoding is picked.

> (Also it seems that we won't
> notice if the user changes .gitattributes after we've looked at it, or
> if they create one after we've looked for one and not found it.)

This is not a show stopper, IMHO. The user will notice soon enough, and
can restart gitk. Nobody sane will change the encoding attributes every hour.

That said, a menu command to flush the attribute cache would be useful
every now and then.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/2] check-attr: add an internal check_attr() function
  2008-09-22 10:12   ` Alexander Gavrilov
@ 2008-10-05  2:30     ` Dmitry Potapov
  2008-10-05  2:30       ` [PATCH 2/2] check-attr: Add --stdin-paths option Dmitry Potapov
  0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-05  2:30 UTC (permalink / raw)
  To: Alexander Gavrilov, git; +Cc: Shawn O. Pearce, Paul Mackerras, Dmitry Potapov

This step is preparation to introducing --stdin-paths option

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
On Mon, Sep 22, 2008 at 02:12:33PM +0400, Alexander Gavrilov wrote:
> 
> Core git does not provide any interface for reading attributes from
> older commits, so they are loaded from the working copy. And caching
> is necessary for performance when lookup involves calling an external
> application at least once per 30 files. This may change if
> git-check-attr is modified to support --stdin-paths.

 builtin-check-attr.c |   41 +++++++++++++++++++++++------------------
 1 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index cb783fc..5a2e329 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -6,6 +6,27 @@
 static const char check_attr_usage[] =
 "git check-attr attr... [--] pathname...";
 
+static void check_attr(int cnt, struct git_attr_check *check,
+	const char** name, const char *file)
+{
+	int j;
+	if (git_checkattr(file, cnt, check))
+		die("git_checkattr died");
+	for (j = 0; j < cnt; j++) {
+		const char *value = check[j].value;
+
+		if (ATTR_TRUE(value))
+			value = "set";
+		else if (ATTR_FALSE(value))
+			value = "unset";
+		else if (ATTR_UNSET(value))
+			value = "unspecified";
+
+		quote_c_style(file, NULL, stdout, 0);
+		printf(": %s: %s\n", name[j], value);
+	}
+}
+
 int cmd_check_attr(int argc, const char **argv, const char *prefix)
 {
 	struct git_attr_check *check;
@@ -42,23 +63,7 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 		check[i].attr = a;
 	}
 
-	for (i = doubledash; i < argc; i++) {
-		int j;
-		if (git_checkattr(argv[i], cnt, check))
-			die("git_checkattr died");
-		for (j = 0; j < cnt; j++) {
-			const char *value = check[j].value;
-
-			if (ATTR_TRUE(value))
-				value = "set";
-			else if (ATTR_FALSE(value))
-				value = "unset";
-			else if (ATTR_UNSET(value))
-				value = "unspecified";
-
-			quote_c_style(argv[i], NULL, stdout, 0);
-			printf(": %s: %s\n", argv[j+1], value);
-		}
-	}
+	for (i = doubledash; i < argc; i++)
+		check_attr(cnt, check, argv+1, argv[i]);
 	return 0;
 }
-- 
1.6.0.2.447.g3befd

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/2] check-attr: Add --stdin-paths option
  2008-10-05  2:30     ` [PATCH 1/2] check-attr: add an internal check_attr() function Dmitry Potapov
@ 2008-10-05  2:30       ` Dmitry Potapov
  2008-10-06  7:09         ` Johannes Sixt
  0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-05  2:30 UTC (permalink / raw)
  To: Alexander Gavrilov, git; +Cc: Shawn O. Pearce, Paul Mackerras, Dmitry Potapov

This allows multiple paths to be specified on stdin.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
 Documentation/git-check-attr.txt |    4 ++
 builtin-check-attr.c             |   66 ++++++++++++++++++++++++++++++++------
 t/t0003-attributes.sh            |   17 ++++++++++
 3 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index 2b821f2..0839a57 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,6 +9,7 @@ git-check-attr - Display gitattributes information.
 SYNOPSIS
 --------
 'git check-attr' attr... [--] pathname...
+'git check-attr' --stdin-paths attr... < <list-of-paths
 
 DESCRIPTION
 -----------
@@ -17,6 +18,9 @@ For every pathname, this command will list if each attr is 'unspecified',
 
 OPTIONS
 -------
+--stdin-paths::
+	Read file names from stdin instead of from the command-line.
+
 \--::
 	Interpret all preceding arguments as attributes, and all following
 	arguments as path names. If not supplied, only the first argument will
diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index 5a2e329..821eb5e 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -2,9 +2,19 @@
 #include "cache.h"
 #include "attr.h"
 #include "quote.h"
+#include "parse-options.h"
 
-static const char check_attr_usage[] =
-"git check-attr attr... [--] pathname...";
+static int stdin_paths;
+static const char * const check_attr_usage[] = {
+"git check-attr attr... [--] pathname...",
+"git check-attr --stdin-paths attr... < <list-of-paths>",
+NULL
+};
+
+static const struct option check_attr_options[] = {
+	OPT_BOOLEAN(0 , "stdin-paths", &stdin_paths, "read file names from stdin"),
+	OPT_END()
+};
 
 static void check_attr(int cnt, struct git_attr_check *check,
 	const char** name, const char *file)
@@ -27,17 +37,43 @@ static void check_attr(int cnt, struct git_attr_check *check,
 	}
 }
 
+static void check_attr_stdin_paths(int cnt, struct git_attr_check *check,
+	const char** name)
+{
+	struct strbuf buf, nbuf;
+
+	strbuf_init(&buf, 0);
+	strbuf_init(&nbuf, 0);
+	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+		if (buf.buf[0] == '"') {
+			strbuf_reset(&nbuf);
+			if (unquote_c_style(&nbuf, buf.buf, NULL))
+				die("line is badly quoted");
+			strbuf_swap(&buf, &nbuf);
+		}
+		check_attr(cnt, check, name, buf.buf);
+	}
+	strbuf_release(&buf);
+	strbuf_release(&nbuf);
+}
+
 int cmd_check_attr(int argc, const char **argv, const char *prefix)
 {
 	struct git_attr_check *check;
 	int cnt, i, doubledash;
+	const char *errstr = NULL;
+
+	argc = parse_options(argc, argv, check_attr_options, check_attr_usage,
+		PARSE_OPT_KEEP_DASHDASH);
+	if (!argc)
+		usage_with_options(check_attr_usage, check_attr_options);
 
 	if (read_cache() < 0) {
 		die("invalid cache");
 	}
 
 	doubledash = -1;
-	for (i = 1; doubledash < 0 && i < argc; i++) {
+	for (i = 0; doubledash < 0 && i < argc; i++) {
 		if (!strcmp(argv[i], "--"))
 			doubledash = i;
 	}
@@ -45,25 +81,35 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	/* If there is no double dash, we handle only one attribute */
 	if (doubledash < 0) {
 		cnt = 1;
-		doubledash = 1;
+		doubledash = 0;
 	} else
-		cnt = doubledash - 1;
+		cnt = doubledash;
 	doubledash++;
 
-	if (cnt <= 0 || argc < doubledash)
-		usage(check_attr_usage);
+	if (cnt <= 0)
+		errstr = "No attribute specified";
+	else if (stdin_paths && doubledash < argc)
+		errstr = "Can't specify files with --stdin-paths";
+	if (errstr) {
+		error (errstr);
+		usage_with_options(check_attr_usage, check_attr_options);
+	}
+
 	check = xcalloc(cnt, sizeof(*check));
 	for (i = 0; i < cnt; i++) {
 		const char *name;
 		struct git_attr *a;
-		name = argv[i + 1];
+		name = argv[i];
 		a = git_attr(name, strlen(name));
 		if (!a)
 			return error("%s: not a valid attribute name", name);
 		check[i].attr = a;
 	}
 
-	for (i = doubledash; i < argc; i++)
-		check_attr(cnt, check, argv+1, argv[i]);
+	if (stdin_paths)
+		check_attr_stdin_paths(cnt, check, argv);
+	else
+		for (i = doubledash; i < argc; i++)
+			check_attr(cnt, check, argv, argv[i]);
 	return 0;
 }
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 3d8e06a..f6901b4 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -47,6 +47,23 @@ test_expect_success 'attribute test' '
 
 '
 
+test_expect_success 'attribute test: read paths from stdin' '
+
+	cat <<EOF > expect
+f: test: f
+a/f: test: f
+a/c/f: test: f
+a/g: test: a/g
+a/b/g: test: a/b/g
+b/g: test: unspecified
+a/b/h: test: a/b/h
+a/b/d/g: test: a/b/d/*
+EOF
+
+	sed -e "s/:.*//" < expect | git check-attr --stdin-paths test > actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'root subdir attribute test' '
 
 	attr_check a/i a/i &&
-- 
1.6.0.2.447.g3befd

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2] check-attr: Add --stdin-paths option
  2008-10-05  2:30       ` [PATCH 2/2] check-attr: Add --stdin-paths option Dmitry Potapov
@ 2008-10-06  7:09         ` Johannes Sixt
  2008-10-07  0:14           ` [PATCH 1/2 v2] check-attr: add an internal check_attr() function Dmitry Potapov
  2008-10-07  0:16           ` [PATCH 2/2 v2] check-attr: Add --stdin-paths option Dmitry Potapov
  0 siblings, 2 replies; 44+ messages in thread
From: Johannes Sixt @ 2008-10-06  7:09 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Alexander Gavrilov, git, Shawn O. Pearce, Paul Mackerras

Dmitry Potapov schrieb:
> +static void check_attr_stdin_paths(int cnt, struct git_attr_check *check,
> +	const char** name)
> +{
> +	struct strbuf buf, nbuf;
> +
> +	strbuf_init(&buf, 0);
> +	strbuf_init(&nbuf, 0);
> +	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
> +		if (buf.buf[0] == '"') {
> +			strbuf_reset(&nbuf);
> +			if (unquote_c_style(&nbuf, buf.buf, NULL))
> +				die("line is badly quoted");
> +			strbuf_swap(&buf, &nbuf);
> +		}
> +		check_attr(cnt, check, name, buf.buf);
> +	}
> +	strbuf_release(&buf);
> +	strbuf_release(&nbuf);
> +}
> +

We know that you will want to use this feature in gitk to reduce the
number of fork()s. But you've a problem: gitk will first write a path to
git-check-addr's stdin, and then wait for the result on its stdout. But
this is a classic pitfall: You are not guaranteed that something will be
returned from stdout right away due to buffering. The least that is needed
is fflush(stdout) in this loop (after each iteration!) so that gitk sees
some result and does not hang forever.

-- Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/2 v2] check-attr: add an internal check_attr() function
  2008-10-06  7:09         ` Johannes Sixt
@ 2008-10-07  0:14           ` Dmitry Potapov
  2008-10-07  0:16           ` [PATCH 2/2 v2] check-attr: Add --stdin-paths option Dmitry Potapov
  1 sibling, 0 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-07  0:14 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Alexander Gavrilov, git, Shawn O. Pearce, Paul Mackerras

This step is preparation to introducing --stdin-paths option.

I have also added maybe_flush_or_die() at the end of main() to ensure that
we exit with the zero code only when we flushed the output successfully.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
maybe_flush_or_die() is added in this version of patch.

 builtin-check-attr.c |   42 ++++++++++++++++++++++++------------------
 1 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index cb783fc..786256e 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -6,6 +6,27 @@
 static const char check_attr_usage[] =
 "git check-attr attr... [--] pathname...";
 
+static void check_attr(int cnt, struct git_attr_check *check,
+	const char** name, const char *file)
+{
+	int j;
+	if (git_checkattr(file, cnt, check))
+		die("git_checkattr died");
+	for (j = 0; j < cnt; j++) {
+		const char *value = check[j].value;
+
+		if (ATTR_TRUE(value))
+			value = "set";
+		else if (ATTR_FALSE(value))
+			value = "unset";
+		else if (ATTR_UNSET(value))
+			value = "unspecified";
+
+		quote_c_style(file, NULL, stdout, 0);
+		printf(": %s: %s\n", name[j], value);
+	}
+}
+
 int cmd_check_attr(int argc, const char **argv, const char *prefix)
 {
 	struct git_attr_check *check;
@@ -42,23 +63,8 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 		check[i].attr = a;
 	}
 
-	for (i = doubledash; i < argc; i++) {
-		int j;
-		if (git_checkattr(argv[i], cnt, check))
-			die("git_checkattr died");
-		for (j = 0; j < cnt; j++) {
-			const char *value = check[j].value;
-
-			if (ATTR_TRUE(value))
-				value = "set";
-			else if (ATTR_FALSE(value))
-				value = "unset";
-			else if (ATTR_UNSET(value))
-				value = "unspecified";
-
-			quote_c_style(argv[i], NULL, stdout, 0);
-			printf(": %s: %s\n", argv[j+1], value);
-		}
-	}
+	for (i = doubledash; i < argc; i++)
+		check_attr(cnt, check, argv+1, argv[i]);
+	maybe_flush_or_die(stdout, "attribute to stdout");
 	return 0;
 }
-- 
1.6.0.2.447.g3befd

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-06  7:09         ` Johannes Sixt
  2008-10-07  0:14           ` [PATCH 1/2 v2] check-attr: add an internal check_attr() function Dmitry Potapov
@ 2008-10-07  0:16           ` Dmitry Potapov
  2008-10-08 15:24             ` Shawn O. Pearce
  2008-10-10 22:39             ` Paul Mackerras
  1 sibling, 2 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-07  0:16 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Alexander Gavrilov, git, Shawn O. Pearce, Paul Mackerras

This allows multiple paths to be specified on stdin.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---
On Mon, Oct 06, 2008 at 09:09:11AM +0200, Johannes Sixt wrote:
> 
> The least that is needed is fflush(stdout) in this loop (after each
> iteration!)

Thanks. Somehow, I forgot about it though it is quite obvious.
I have added maybe_flush_or_die().

 Documentation/git-check-attr.txt |    4 ++
 builtin-check-attr.c             |   70 ++++++++++++++++++++++++++++++++------
 t/t0003-attributes.sh            |   17 +++++++++
 3 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index 2b821f2..0839a57 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,6 +9,7 @@ git-check-attr - Display gitattributes information.
 SYNOPSIS
 --------
 'git check-attr' attr... [--] pathname...
+'git check-attr' --stdin-paths attr... < <list-of-paths
 
 DESCRIPTION
 -----------
@@ -17,6 +18,9 @@ For every pathname, this command will list if each attr is 'unspecified',
 
 OPTIONS
 -------
+--stdin-paths::
+	Read file names from stdin instead of from the command-line.
+
 \--::
 	Interpret all preceding arguments as attributes, and all following
 	arguments as path names. If not supplied, only the first argument will
diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index 786256e..fa1e4d5 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -2,9 +2,19 @@
 #include "cache.h"
 #include "attr.h"
 #include "quote.h"
+#include "parse-options.h"
 
-static const char check_attr_usage[] =
-"git check-attr attr... [--] pathname...";
+static int stdin_paths;
+static const char * const check_attr_usage[] = {
+"git check-attr attr... [--] pathname...",
+"git check-attr --stdin-paths attr... < <list-of-paths>",
+NULL
+};
+
+static const struct option check_attr_options[] = {
+	OPT_BOOLEAN(0 , "stdin-paths", &stdin_paths, "read file names from stdin"),
+	OPT_END()
+};
 
 static void check_attr(int cnt, struct git_attr_check *check,
 	const char** name, const char *file)
@@ -27,17 +37,44 @@ static void check_attr(int cnt, struct git_attr_check *check,
 	}
 }
 
+static void check_attr_stdin_paths(int cnt, struct git_attr_check *check,
+	const char** name)
+{
+	struct strbuf buf, nbuf;
+
+	strbuf_init(&buf, 0);
+	strbuf_init(&nbuf, 0);
+	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+		if (buf.buf[0] == '"') {
+			strbuf_reset(&nbuf);
+			if (unquote_c_style(&nbuf, buf.buf, NULL))
+				die("line is badly quoted");
+			strbuf_swap(&buf, &nbuf);
+		}
+		check_attr(cnt, check, name, buf.buf);
+		maybe_flush_or_die(stdout, "attribute to stdout");
+	}
+	strbuf_release(&buf);
+	strbuf_release(&nbuf);
+}
+
 int cmd_check_attr(int argc, const char **argv, const char *prefix)
 {
 	struct git_attr_check *check;
 	int cnt, i, doubledash;
+	const char *errstr = NULL;
+
+	argc = parse_options(argc, argv, check_attr_options, check_attr_usage,
+		PARSE_OPT_KEEP_DASHDASH);
+	if (!argc)
+		usage_with_options(check_attr_usage, check_attr_options);
 
 	if (read_cache() < 0) {
 		die("invalid cache");
 	}
 
 	doubledash = -1;
-	for (i = 1; doubledash < 0 && i < argc; i++) {
+	for (i = 0; doubledash < 0 && i < argc; i++) {
 		if (!strcmp(argv[i], "--"))
 			doubledash = i;
 	}
@@ -45,26 +82,37 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	/* If there is no double dash, we handle only one attribute */
 	if (doubledash < 0) {
 		cnt = 1;
-		doubledash = 1;
+		doubledash = 0;
 	} else
-		cnt = doubledash - 1;
+		cnt = doubledash;
 	doubledash++;
 
-	if (cnt <= 0 || argc < doubledash)
-		usage(check_attr_usage);
+	if (cnt <= 0)
+		errstr = "No attribute specified";
+	else if (stdin_paths && doubledash < argc)
+		errstr = "Can't specify files with --stdin-paths";
+	if (errstr) {
+		error (errstr);
+		usage_with_options(check_attr_usage, check_attr_options);
+	}
+
 	check = xcalloc(cnt, sizeof(*check));
 	for (i = 0; i < cnt; i++) {
 		const char *name;
 		struct git_attr *a;
-		name = argv[i + 1];
+		name = argv[i];
 		a = git_attr(name, strlen(name));
 		if (!a)
 			return error("%s: not a valid attribute name", name);
 		check[i].attr = a;
 	}
 
-	for (i = doubledash; i < argc; i++)
-		check_attr(cnt, check, argv+1, argv[i]);
-	maybe_flush_or_die(stdout, "attribute to stdout");
+	if (stdin_paths)
+		check_attr_stdin_paths(cnt, check, argv);
+	else {
+		for (i = doubledash; i < argc; i++)
+			check_attr(cnt, check, argv, argv[i]);
+		maybe_flush_or_die(stdout, "attribute to stdout");
+	}
 	return 0;
 }
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 3d8e06a..f6901b4 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -47,6 +47,23 @@ test_expect_success 'attribute test' '
 
 '
 
+test_expect_success 'attribute test: read paths from stdin' '
+
+	cat <<EOF > expect
+f: test: f
+a/f: test: f
+a/c/f: test: f
+a/g: test: a/g
+a/b/g: test: a/b/g
+b/g: test: unspecified
+a/b/h: test: a/b/h
+a/b/d/g: test: a/b/d/*
+EOF
+
+	sed -e "s/:.*//" < expect | git check-attr --stdin-paths test > actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'root subdir attribute test' '
 
 	attr_check a/i a/i &&
-- 
1.6.0.2.447.g3befd

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-07  0:16           ` [PATCH 2/2 v2] check-attr: Add --stdin-paths option Dmitry Potapov
@ 2008-10-08 15:24             ` Shawn O. Pearce
  2008-10-12 14:19               ` Dmitry Potapov
  2008-10-10 22:39             ` Paul Mackerras
  1 sibling, 1 reply; 44+ messages in thread
From: Shawn O. Pearce @ 2008-10-08 15:24 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Johannes Sixt, Alexander Gavrilov, git, Paul Mackerras

Dmitry Potapov <dpotapov@gmail.com> wrote:
> This allows multiple paths to be specified on stdin.

> diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
> index 2b821f2..0839a57 100644
> --- a/Documentation/git-check-attr.txt
> +++ b/Documentation/git-check-attr.txt
> @@ -9,6 +9,7 @@ git-check-attr - Display gitattributes information.
>  SYNOPSIS
>  --------
>  'git check-attr' attr... [--] pathname...
> +'git check-attr' --stdin-paths attr... < <list-of-paths

I wonder if the option should just be "--stdin".  And since its being
used mostly by automated tools (gitk/git-gui) I wonder if a -z should
also be supported for input termination with NUL instead of LF.

> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index 3d8e06a..f6901b4 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -47,6 +47,23 @@ test_expect_success 'attribute test' '
>  
>  '
>  
> +test_expect_success 'attribute test: read paths from stdin' '

A test case for the quoting might also be good.

> +
> +	cat <<EOF > expect
> +f: test: f
> +a/f: test: f
> +a/c/f: test: f
> +a/g: test: a/g
> +a/b/g: test: a/b/g
> +b/g: test: unspecified
> +a/b/h: test: a/b/h
> +a/b/d/g: test: a/b/d/*
> +EOF
> +
> +	sed -e "s/:.*//" < expect | git check-attr --stdin-paths test > actual &&
> +	test_cmp expect actual

-- 
Shawn.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI
  2008-10-01 11:35   ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Johannes Sixt
@ 2008-10-10 10:46     ` Paul Mackerras
  0 siblings, 0 replies; 44+ messages in thread
From: Paul Mackerras @ 2008-10-10 10:46 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Alexander Gavrilov, git, Shawn O. Pearce

Johannes Sixt writes:

> I'm about to add a .gitattributes file that specifies the encoding for
> some of my files *today* because I was not clever enough to anticipate the
> usefulness of an "encoding" attribute a year ago when those files were
> added to the repository. When I browse history, I *do* want that *today's*
> encoding is picked.

Fair enough.

> > (Also it seems that we won't
> > notice if the user changes .gitattributes after we've looked at it, or
> > if they create one after we've looked for one and not found it.)
> 
> This is not a show stopper, IMHO. The user will notice soon enough, and
> can restart gitk. Nobody sane will change the encoding attributes every hour.

The scenario I'm thinking of is this: a user clicks on a file to
display it, sees that it isn't displayed using the encoding he wants,
says "duh" and creates a .gitattributes file.  User clicks on the file
again, sees that it *still* isn't displayed using the right encoding,
and says something worse than "duh". :)

If it was just one .gitattributes file, then it wouldn't be hard to
stat it each time we go to display some file, and throw away our cache
if it has changed.  But it looks like we would have to do N+1 stats
for a file N levels deep in the tree (since it looks like we also need
to stat $GIT_DIR/info/attributes).  Still, it may be worth it.  At
least on Linux, stat is pretty fast.

Paul.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-07  0:16           ` [PATCH 2/2 v2] check-attr: Add --stdin-paths option Dmitry Potapov
  2008-10-08 15:24             ` Shawn O. Pearce
@ 2008-10-10 22:39             ` Paul Mackerras
  2008-10-12 14:30               ` Dmitry Potapov
  1 sibling, 1 reply; 44+ messages in thread
From: Paul Mackerras @ 2008-10-10 22:39 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Johannes Sixt, Alexander Gavrilov, git, Shawn O. Pearce

Dmitry Potapov writes:

> This allows multiple paths to be specified on stdin.
> 
> Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
> ---
> On Mon, Oct 06, 2008 at 09:09:11AM +0200, Johannes Sixt wrote:
> > 
> > The least that is needed is fflush(stdout) in this loop (after each
> > iteration!)
> 
> Thanks. Somehow, I forgot about it though it is quite obvious.
> I have added maybe_flush_or_die().

Actually, what was done with git diff-tree --stdin was to have it do
fflush(stdout) when it sees a blank line in the input.  That gives the
calling program a way to get the output up to that point without
having to do a flush for every line of output.

Paul.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-08 15:24             ` Shawn O. Pearce
@ 2008-10-12 14:19               ` Dmitry Potapov
  2008-10-12 15:04                 ` Jakub Narebski
  0 siblings, 1 reply; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-12 14:19 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Sixt, Alexander Gavrilov, git, Paul Mackerras

On Wed, Oct 08, 2008 at 08:24:43AM -0700, Shawn O. Pearce wrote:
> Dmitry Potapov <dpotapov@gmail.com> wrote:
> > This allows multiple paths to be specified on stdin.
> 
> > diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
> > index 2b821f2..0839a57 100644
> > --- a/Documentation/git-check-attr.txt
> > +++ b/Documentation/git-check-attr.txt
> > @@ -9,6 +9,7 @@ git-check-attr - Display gitattributes information.
> >  SYNOPSIS
> >  --------
> >  'git check-attr' attr... [--] pathname...
> > +'git check-attr' --stdin-paths attr... < <list-of-paths
> 
> I wonder if the option should just be "--stdin".

I used "--stdin-paths" because git hash-object uses it, while "--stdin"
means to read the object from standard input. OTOH, we are never going
to read the object from standard input in check-attr and some other git
commands use "--stdin" to mean: read the list of paths from the standard
input. So, I fully agree here.

> And since its being
> used mostly by automated tools (gitk/git-gui) I wonder if a -z should
> also be supported for input termination with NUL instead of LF.

I have added it, but after I did, I start to wonder whether it is the
right thing to do to unquote NUL terminated input line?

NUL terminated makes sense when you feed raw-bytes, and if the first
byte happen to be a quote character, I suppose it should be treated
just as any other byte, not as a sign that the string is quited. But
then I looked at git checkout-index, and it unquotes string even if it
is NUL terminated. I don't think it is the right thing to do, but just
to be consistent, I have decided to leave as-is, i.e. to unquote a NUL
terminated string.

> > +test_expect_success 'attribute test: read paths from stdin' '
> 
> A test case for the quoting might also be good.

As far as I can tell, there is no test case for special characters in
filenames when these filenames are given as arguments. And there are a
few problems with them. First, it is using colon as a separator in
output, which breaks parsing of a filename containing colons.  Second,
I still have not figured out how to specify filenames with special
characters in gitattributes. The documentation does not say anything
and was lazy to study the code. Does gitattributes understand quote
strings in filenames?


Anyway, here is interdiff to my previous patch, which addresses two
first points as I described above. (I can resend the full patch if
necessary).

-- >8 --
From: Dmitry Potapov <dpotapov@gmail.com>
Date: Sun, 12 Oct 2008 18:08:43 +0400
Subject: [PATCH] check-attr: Add --stdin option

---
 Documentation/git-check-attr.txt |    8 ++++++--
 builtin-check-attr.c             |   13 +++++++++----
 t/t0003-attributes.sh            |    2 +-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index 0839a57..14e4374 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,7 +9,7 @@ git-check-attr - Display gitattributes information.
 SYNOPSIS
 --------
 'git check-attr' attr... [--] pathname...
-'git check-attr' --stdin-paths attr... < <list-of-paths
+'git check-attr' --stdin [-z] attr... < <list-of-paths
 
 DESCRIPTION
 -----------
@@ -18,9 +18,13 @@ For every pathname, this command will list if each attr is 'unspecified',
 
 OPTIONS
 -------
---stdin-paths::
+--stdin::
 	Read file names from stdin instead of from the command-line.
 
+-z::
+	Only meaningful with `--stdin`; paths are separated with
+	NUL character instead of LF.
+
 \--::
 	Interpret all preceding arguments as attributes, and all following
 	arguments as path names. If not supplied, only the first argument will
diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index fa1e4d5..02a8292 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -7,12 +7,16 @@
 static int stdin_paths;
 static const char * const check_attr_usage[] = {
 "git check-attr attr... [--] pathname...",
-"git check-attr --stdin-paths attr... < <list-of-paths>",
+"git check-attr --stdin attr... < <list-of-paths>",
 NULL
 };
 
+static int null_term_line;
+
 static const struct option check_attr_options[] = {
-	OPT_BOOLEAN(0 , "stdin-paths", &stdin_paths, "read file names from stdin"),
+	OPT_BOOLEAN(0 , "stdin", &stdin_paths, "read file names from stdin"),
+	OPT_BOOLEAN('z', NULL, &null_term_line,
+		"input paths are terminated by a null character"),
 	OPT_END()
 };
 
@@ -41,10 +45,11 @@ static void check_attr_stdin_paths(int cnt, struct git_attr_check *check,
 	const char** name)
 {
 	struct strbuf buf, nbuf;
+	int line_termination = null_term_line ? 0 : '\n';
 
 	strbuf_init(&buf, 0);
 	strbuf_init(&nbuf, 0);
-	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+	while (strbuf_getline(&buf, stdin, line_termination) != EOF) {
 		if (buf.buf[0] == '"') {
 			strbuf_reset(&nbuf);
 			if (unquote_c_style(&nbuf, buf.buf, NULL))
@@ -90,7 +95,7 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	if (cnt <= 0)
 		errstr = "No attribute specified";
 	else if (stdin_paths && doubledash < argc)
-		errstr = "Can't specify files with --stdin-paths";
+		errstr = "Can't specify files with --stdin";
 	if (errstr) {
 		error (errstr);
 		usage_with_options(check_attr_usage, check_attr_options);
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index f6901b4..1c77192 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -60,7 +60,7 @@ a/b/h: test: a/b/h
 a/b/d/g: test: a/b/d/*
 EOF
 
-	sed -e "s/:.*//" < expect | git check-attr --stdin-paths test > actual &&
+	sed -e "s/:.*//" < expect | git check-attr --stdin test > actual &&
 	test_cmp expect actual
 '
 
-- 
1.6.0.2.521.g49aa8.dirty

-- >8 --

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-10 22:39             ` Paul Mackerras
@ 2008-10-12 14:30               ` Dmitry Potapov
  0 siblings, 0 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-12 14:30 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Johannes Sixt, Alexander Gavrilov, git, Shawn O. Pearce

On Sat, Oct 11, 2008 at 09:39:49AM +1100, Paul Mackerras wrote:
> 
> Actually, what was done with git diff-tree --stdin was to have it do
> fflush(stdout) when it sees a blank line in the input.  That gives the
> calling program a way to get the output up to that point without
> having to do a flush for every line of output.

It is an interesting idea, but I have not seen any other git command
doing so, and my measurement in git repo on Linux have not offered
any noticeable speed-up. So, I am not sure if it is worth it.

Dmitry

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-12 14:19               ` Dmitry Potapov
@ 2008-10-12 15:04                 ` Jakub Narebski
  2008-10-12 16:35                   ` Dmitry Potapov
  0 siblings, 1 reply; 44+ messages in thread
From: Jakub Narebski @ 2008-10-12 15:04 UTC (permalink / raw)
  To: git

Dmitry Potapov wrote:

>> And since its being
>> used mostly by automated tools (gitk/git-gui) I wonder if a -z should
>> also be supported for input termination with NUL instead of LF.
> 
> I have added it, but after I did, I start to wonder whether it is the
> right thing to do to unquote NUL terminated input line?

I think that -z should be not quoted (like git-diff-tree or git-ls-tree
_output_); if it is, then IMHO it is a bug.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/2 v2] check-attr: Add --stdin-paths option
  2008-10-12 15:04                 ` Jakub Narebski
@ 2008-10-12 16:35                   ` Dmitry Potapov
  0 siblings, 0 replies; 44+ messages in thread
From: Dmitry Potapov @ 2008-10-12 16:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Shawn O. Pearce

On Sun, Oct 12, 2008 at 05:04:21PM +0200, Jakub Narebski wrote:
> Dmitry Potapov wrote:
> 
> >> And since its being
> >> used mostly by automated tools (gitk/git-gui) I wonder if a -z should
> >> also be supported for input termination with NUL instead of LF.
> > 
> > I have added it, but after I did, I start to wonder whether it is the
> > right thing to do to unquote NUL terminated input line?
> 
> I think that -z should be not quoted (like git-diff-tree or git-ls-tree
> _output_); if it is, then IMHO it is a bug.

I am sorry. I was obviously wrong about 'git checkout-index', as it does
not try to unquote if '-z' is specified. So, my previous patch was wrong.
The corrected patch is below.

It still uses the same output format whether '-z' is specified or not.
Maybe I should change that too to use '\0' as the separator if '-z' is
specified?

Dmitry
PS Please, add me to 'To:' or 'Cc:' when you reply to my email.


-- >8 --
From: Dmitry Potapov <dpotapov@gmail.com>
Date: Sun, 12 Oct 2008 18:08:43 +0400
Subject: [PATCH] check-attr: Add --stdin option

---
This patch is meant to be squash on top of dp/checkattr in pu.

 Documentation/git-check-attr.txt |    8 ++++++--
 builtin-check-attr.c             |   15 ++++++++++-----
 t/t0003-attributes.sh            |    2 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index 0839a57..14e4374 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -9,7 +9,7 @@ git-check-attr - Display gitattributes information.
 SYNOPSIS
 --------
 'git check-attr' attr... [--] pathname...
-'git check-attr' --stdin-paths attr... < <list-of-paths
+'git check-attr' --stdin [-z] attr... < <list-of-paths
 
 DESCRIPTION
 -----------
@@ -18,9 +18,13 @@ For every pathname, this command will list if each attr is 'unspecified',
 
 OPTIONS
 -------
---stdin-paths::
+--stdin::
 	Read file names from stdin instead of from the command-line.
 
+-z::
+	Only meaningful with `--stdin`; paths are separated with
+	NUL character instead of LF.
+
 \--::
 	Interpret all preceding arguments as attributes, and all following
 	arguments as path names. If not supplied, only the first argument will
diff --git a/builtin-check-attr.c b/builtin-check-attr.c
index fa1e4d5..4921341 100644
--- a/builtin-check-attr.c
+++ b/builtin-check-attr.c
@@ -7,12 +7,16 @@
 static int stdin_paths;
 static const char * const check_attr_usage[] = {
 "git check-attr attr... [--] pathname...",
-"git check-attr --stdin-paths attr... < <list-of-paths>",
+"git check-attr --stdin attr... < <list-of-paths>",
 NULL
 };
 
+static int null_term_line;
+
 static const struct option check_attr_options[] = {
-	OPT_BOOLEAN(0 , "stdin-paths", &stdin_paths, "read file names from stdin"),
+	OPT_BOOLEAN(0 , "stdin", &stdin_paths, "read file names from stdin"),
+	OPT_BOOLEAN('z', NULL, &null_term_line,
+		"input paths are terminated by a null character"),
 	OPT_END()
 };
 
@@ -41,11 +45,12 @@ static void check_attr_stdin_paths(int cnt, struct git_attr_check *check,
 	const char** name)
 {
 	struct strbuf buf, nbuf;
+	int line_termination = null_term_line ? 0 : '\n';
 
 	strbuf_init(&buf, 0);
 	strbuf_init(&nbuf, 0);
-	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		if (buf.buf[0] == '"') {
+	while (strbuf_getline(&buf, stdin, line_termination) != EOF) {
+		if (line_termination && buf.buf[0] == '"') {
 			strbuf_reset(&nbuf);
 			if (unquote_c_style(&nbuf, buf.buf, NULL))
 				die("line is badly quoted");
@@ -90,7 +95,7 @@ int cmd_check_attr(int argc, const char **argv, const char *prefix)
 	if (cnt <= 0)
 		errstr = "No attribute specified";
 	else if (stdin_paths && doubledash < argc)
-		errstr = "Can't specify files with --stdin-paths";
+		errstr = "Can't specify files with --stdin";
 	if (errstr) {
 		error (errstr);
 		usage_with_options(check_attr_usage, check_attr_options);
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index f6901b4..1c77192 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -60,7 +60,7 @@ a/b/h: test: a/b/h
 a/b/d/g: test: a/b/d/*
 EOF
 
-	sed -e "s/:.*//" < expect | git check-attr --stdin-paths test > actual &&
+	sed -e "s/:.*//" < expect | git check-attr --stdin test > actual &&
 	test_cmp expect actual
 '
 
-- 
1.6.0.2.521.g739d3

-- >8 --

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2008-10-12 16:36 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-17 21:07 [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Alexander Gavrilov
2008-09-17 21:07 ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Alexander Gavrilov
2008-09-17 21:07   ` [PATCH (GIT-GUI,GITK) 2/8] git-gui: Add a menu of available encodings Alexander Gavrilov
2008-09-17 21:07     ` [PATCH (GIT-GUI,GITK) 3/8] git-gui: Allow forcing display encoding for diffs using a submenu Alexander Gavrilov
2008-09-17 21:07       ` [PATCH (GIT-GUI,GITK) 4/8] git-gui: Optimize encoding name resolution using a lookup table Alexander Gavrilov
2008-09-17 21:07         ` [PATCH (GIT-GUI,GITK) 5/8] git-gui: Support the encoding menu in gui blame Alexander Gavrilov
2008-09-17 21:07           ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Alexander Gavrilov
2008-09-17 21:07             ` [PATCH (GIT-GUI,GITK) 7/8] gitk: Implement file contents encoding support Alexander Gavrilov
2008-09-17 21:07               ` [PATCH (GIT-GUI,GITK) 8/8] gitk: Support filenames in the locale encoding Alexander Gavrilov
2008-09-19 12:10             ` [PATCH (GIT-GUI,GITK) 6/8] gitk: Port new encoding logic from git-gui Johannes Sixt
2008-09-19 12:38               ` Alexander Gavrilov
2008-09-19 13:04                 ` Johannes Sixt
2008-09-21 18:52                   ` Alexander Gavrilov
2008-09-22  7:25                     ` Johannes Sixt
2008-09-22  7:46                       ` Johannes Sixt
2008-09-22  8:01                       ` Alexander Gavrilov
2008-09-22  8:20                         ` Johannes Sixt
2008-09-22  9:02                           ` Alexander Gavrilov
2008-09-22  9:18                             ` Johannes Sixt
2008-09-22 10:18                               ` Alexander Gavrilov
2008-09-22  9:01                     ` Dmitry Potapov
2008-09-18 15:02   ` [PATCH (GIT-GUI,GITK) 1/8] git-gui: Cleanup handling of the default encoding Dmitry Potapov
2008-09-18 15:14     ` Alexander Gavrilov
2008-09-18 16:29     ` Johannes Sixt
2008-09-18 16:50       ` Dmitry Potapov
2008-09-18 17:00         ` Alexander Gavrilov
2008-09-18 17:19           ` Dmitry Potapov
2008-09-17 21:45 ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Paul Mackerras
2008-09-18 11:12   ` Alexander Gavrilov
2008-09-21 22:55 ` Paul Mackerras
2008-09-22 10:12   ` Alexander Gavrilov
2008-10-05  2:30     ` [PATCH 1/2] check-attr: add an internal check_attr() function Dmitry Potapov
2008-10-05  2:30       ` [PATCH 2/2] check-attr: Add --stdin-paths option Dmitry Potapov
2008-10-06  7:09         ` Johannes Sixt
2008-10-07  0:14           ` [PATCH 1/2 v2] check-attr: add an internal check_attr() function Dmitry Potapov
2008-10-07  0:16           ` [PATCH 2/2 v2] check-attr: Add --stdin-paths option Dmitry Potapov
2008-10-08 15:24             ` Shawn O. Pearce
2008-10-12 14:19               ` Dmitry Potapov
2008-10-12 15:04                 ` Jakub Narebski
2008-10-12 16:35                   ` Dmitry Potapov
2008-10-10 22:39             ` Paul Mackerras
2008-10-12 14:30               ` Dmitry Potapov
2008-10-01 11:35   ` [PATCH (GIT-GUI,GITK) 0/8] Encoding support in GUI Johannes Sixt
2008-10-10 10:46     ` Paul Mackerras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).