git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH (GITK)] gitk: Fix commit encoding support.
@ 2008-11-09 15:06 Alexander Gavrilov
  2008-11-10 11:46 ` Paul Mackerras
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Gavrilov @ 2008-11-09 15:06 UTC (permalink / raw)
  To: git; +Cc: Paul Mackerras

This commit fixes two problems with commit encodings:

1) git-log actually uses i18n.logoutputencoding to generate
   its output, and falls back to i18n.commitencoding only
   when that option is not set. Thus, gitk should use its
   value to read the results, if available.

2) The readcommit function did not process encodings at all.
   This led to randomly appearing misconverted commits if
   the commit encoding differed from the current locale.

Now commit messages should be displayed correctly, except
when logoutputencoding is set to an encoding that cannot
represent charecters in the message. For example, it is
impossible to convert Japanese characters from Shift-JIS
to CP-1251 (although the reverse conversion works).

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---
 gitk |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gitk b/gitk
index ae775b1..3834fc0 100755
--- a/gitk
+++ b/gitk
@@ -1555,9 +1555,27 @@ proc chewcommits {} {
     return 0
 }
 
+proc do_readcommit {id} {
+    global tclencoding
+
+    # Invoke git-log to handle automatic encoding conversion
+    set fd [open [concat | git log --no-color --pretty=raw -1 $id] r]
+    # Read the results using i18n.logoutputencoding
+    fconfigure $fd -translation lf -eofchar {}
+    if {$tclencoding != {}} {
+	fconfigure $fd -encoding $tclencoding
+    }
+    set contents [read $fd]
+    close $fd
+    # Remove the heading line
+    regsub {^commit [0-9a-f]+\n} $contents {} contents
+
+    return $contents
+}
+
 proc readcommit {id} {
-    if {[catch {set contents [exec git cat-file commit $id]}]} return
-    parsecommit $id $contents 0
+    if {[catch {set contents [do_readcommit $id]}]} return
+    parsecommit $id $contents 1
 }
 
 proc parsecommit {id contents listed} {
@@ -10565,6 +10583,9 @@ set gitencoding {}
 catch {
     set gitencoding [exec git config --get i18n.commitencoding]
 }
+catch {
+    set gitencoding [exec git config --get i18n.logoutputencoding]
+}
 if {$gitencoding == ""} {
     set gitencoding "utf-8"
 }
-- 
1.6.0.3.15.gb8d36

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH (GITK)] gitk: Fix commit encoding support.
  2008-11-09 15:06 [PATCH (GITK)] gitk: Fix commit encoding support Alexander Gavrilov
@ 2008-11-10 11:46 ` Paul Mackerras
  2008-11-10 12:06   ` Alexander Gavrilov
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Mackerras @ 2008-11-10 11:46 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git

Alexander Gavrilov writes:

> +proc do_readcommit {id} {
> +    global tclencoding
> +
> +    # Invoke git-log to handle automatic encoding conversion
> +    set fd [open [concat | git log --no-color --pretty=raw -1 $id] r]
> +    # Read the results using i18n.logoutputencoding
> +    fconfigure $fd -translation lf -eofchar {}
> +    if {$tclencoding != {}} {
> +	fconfigure $fd -encoding $tclencoding

Does this mean there are two conversions going on, one inside git log
and another inside Tcl?  Is there a reason why it's better to do two
conversions than one, or is it just more convenient that way?

Would an alternative approach have been to read the output of git
cat-file with -translation binary, look for an encoding header, and do
an encoding convertfrom based on the encoding header?  What would be
the disadvantage of such an approach?

Paul.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH (GITK)] gitk: Fix commit encoding support.
  2008-11-10 11:46 ` Paul Mackerras
@ 2008-11-10 12:06   ` Alexander Gavrilov
  2008-11-13 11:42     ` Paul Mackerras
  0 siblings, 1 reply; 4+ messages in thread
From: Alexander Gavrilov @ 2008-11-10 12:06 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

On Mon, Nov 10, 2008 at 2:46 PM, Paul Mackerras <paulus@samba.org> wrote:
> Alexander Gavrilov writes:
>
>> +proc do_readcommit {id} {
>> +    global tclencoding
>> +
>> +    # Invoke git-log to handle automatic encoding conversion
>> +    set fd [open [concat | git log --no-color --pretty=raw -1 $id] r]
>> +    # Read the results using i18n.logoutputencoding
>> +    fconfigure $fd -translation lf -eofchar {}
>> +    if {$tclencoding != {}} {
>> +     fconfigure $fd -encoding $tclencoding
>
> Does this mean there are two conversions going on, one inside git log
> and another inside Tcl?  Is there a reason why it's better to do two
> conversions than one, or is it just more convenient that way?

That makes the processing flow uniform with the usual code path.

> Would an alternative approach have been to read the output of git
> cat-file with -translation binary, look for an encoding header, and do
> an encoding convertfrom based on the encoding header?  What would be
> the disadvantage of such an approach?

If all commits were loaded through cat-file, that would be the way to
go. Otherwise, when one code path uses one method of conversion, and
another one, which is used rarely and semi-randomly, a different
method, it may lead to confusing results if something goes slightly
wrong.

Alexander

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH (GITK)] gitk: Fix commit encoding support.
  2008-11-10 12:06   ` Alexander Gavrilov
@ 2008-11-13 11:42     ` Paul Mackerras
  0 siblings, 0 replies; 4+ messages in thread
From: Paul Mackerras @ 2008-11-13 11:42 UTC (permalink / raw)
  To: Alexander Gavrilov; +Cc: git

Alexander Gavrilov writes:

> If all commits were loaded through cat-file, that would be the way to
> go. Otherwise, when one code path uses one method of conversion, and
> another one, which is used rarely and semi-randomly, a different
> method, it may lead to confusing results if something goes slightly
> wrong.

OK, that makes sense.  I applied the patch with a paragraph added to
the description that explains that.

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-11-13 11:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-09 15:06 [PATCH (GITK)] gitk: Fix commit encoding support Alexander Gavrilov
2008-11-10 11:46 ` Paul Mackerras
2008-11-10 12:06   ` Alexander Gavrilov
2008-11-13 11:42     ` Paul Mackerras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).