git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
@ 2008-01-23  5:47 Shawn O. Pearce
  2008-01-23  5:55 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Shawn O. Pearce @ 2008-01-23  5:47 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Johannes Schindelin, Jakub Narebski,
	Finn Arne Gangstad, Michele Ballabio

I've got the following change in my "pu" right now and am considering
adding it to git-gui 0.9.2, which would be in git 1.5.4.

I've CC'd a number of people who have emailed me in the past
about git-gui's diff or blame failing to display a non US-ASCII
file content correctly and I am interested to hear if this would
resolve the issue for you.  Its configurable on a per-path basis
by an "encoding" attribute in .gitattributes (see git-gui's own
example below).

If we go this route we'll also want to have core Git document in
its gitattributes manpage what this "encoding" attribute is for...


--8>--
git-gui: Use gitattribute "encoding" for file content display

Most folks using git-gui on internationalized files have complained
that it doesn't recognize UTF-8 correctly.  In the past we have just
ignored the problem and showed the file contents as binary/US-ASCII,
which is wrong no matter how you look at it.

This really should be a per-file attribute, managed by .gitattributes,
so we now pull the "encoding" attribute data for the given path from
the .gitattributes (if available) and use that, falling back to UTF-8
if the attributes are unavailable, git-check-attr is broken, or an
encoding for this path not specified.

We apply the encoding anytime we show file content, which currently
is limited to only the diff viewer and the blame viewer.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .gitattributes |    3 +++
 git-gui.sh     |   13 +++++++++++++
 lib/blame.tcl  |    5 ++++-
 lib/diff.tcl   |    9 ++++++---
 4 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 .gitattributes

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..f96112d
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,3 @@
+*           encoding=US-ASCII
+git-gui.sh  encoding=UTF-8
+/po/*.po    encoding=UTF-8
diff --git a/git-gui.sh b/git-gui.sh
index f42e461..adc25d0 100755
--- a/git-gui.sh
+++ b/git-gui.sh
@@ -466,6 +466,19 @@ proc githook_read {hook_name args} {
 	return {}
 }
 
+proc gitattr {path attr default} {
+	if {[catch {set r [git check-attr $attr -- $path]}]} {
+		set r unspecified
+	} else {
+		set r [join [lrange [split $r :] 2 end] :]
+		regsub {^ } $r {} r
+	}
+	if {$r eq {unspecified}} {
+		return $default
+	}
+	return $r
+}
+
 proc sq {value} {
 	regsub -all ' $value "'\\''" value
 	return "'$value'"
diff --git a/lib/blame.tcl b/lib/blame.tcl
index 00ecf21..f33d48f 100644
--- a/lib/blame.tcl
+++ b/lib/blame.tcl
@@ -374,7 +374,10 @@ method _load {jump} {
 	} else {
 		set fd [git_read cat-file blob "$commit:$path"]
 	}
-	fconfigure $fd -blocking 0 -translation lf -encoding binary
+	fconfigure $fd \
+		-blocking 0 \
+		-translation lf \
+		-encoding [tcl_encoding [gitattr $path encoding UTF-8]]
 	fileevent $fd readable [cb _read_file $fd $jump]
 	set current_fd $fd
 }
diff --git a/lib/diff.tcl b/lib/diff.tcl
index d04f6db..0f030e3 100644
--- a/lib/diff.tcl
+++ b/lib/diff.tcl
@@ -98,8 +98,11 @@ proc show_diff {path w {lno {}}} {
 					set sz [string length $content]
 				}
 				file {
+					set enc [gitattr $path encoding UTF-8]
 					set fd [open $path r]
-					fconfigure $fd -eofchar {}
+					fconfigure $fd \
+						-eofchar {} \
+						-encoding [tcl_encoding $enc]
 					set content [read $fd $max_sz]
 					close $fd
 					set sz [file size $path]
@@ -188,8 +191,8 @@ proc show_diff {path w {lno {}}} {
 
 	fconfigure $fd \
 		-blocking 0 \
-		-encoding binary \
-		-translation binary
+		-encoding [tcl_encoding [gitattr $path encoding UTF-8]] \
+		-translation lf
 	fileevent $fd readable [list read_diff $fd]
 }
 
-- 
1.5.4.rc4.1130.g9ad85

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
  2008-01-23  5:47 [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display Shawn O. Pearce
@ 2008-01-23  5:55 ` Junio C Hamano
  2008-01-23  8:41   ` Steffen Prohaska
  2008-01-24  3:36   ` Shawn O. Pearce
  2008-01-23  7:02 ` Pedro Melo
       [not found] ` <4FF40048-FCF4-4BAD-AD08-6ADAD30E7B6A@simplicidade.org>
  2 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-23  5:55 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: git, Johannes Schindelin, Jakub Narebski, Finn Arne Gangstad,
	Michele Ballabio

"Shawn O. Pearce" <spearce@spearce.org> writes:

> git-gui: Use gitattribute "encoding" for file content display
>
> Most folks using git-gui on internationalized files have complained
> that it doesn't recognize UTF-8 correctly.  In the past we have just
> ignored the problem and showed the file contents as binary/US-ASCII,
> which is wrong no matter how you look at it.

Hmmm.

At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
(if I recall correctly latin-1 or other legacy encoding, as long
as LANG/LC_* is given appropriately, as well) contents without
per-path configuration without introducing new attributes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
  2008-01-23  5:47 [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display Shawn O. Pearce
  2008-01-23  5:55 ` Junio C Hamano
@ 2008-01-23  7:02 ` Pedro Melo
       [not found] ` <4FF40048-FCF4-4BAD-AD08-6ADAD30E7B6A@simplicidade.org>
  2 siblings, 0 replies; 7+ messages in thread
From: Pedro Melo @ 2008-01-23  7:02 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: git, Junio C Hamano, Johannes Schindelin, Jakub Narebski,
	Finn Arne Gangstad, Michele Ballabio

Hi,

On Jan 23, 2008, at 5:47 AM, Shawn O. Pearce wrote:

> I've got the following change in my "pu" right now and am considering
> adding it to git-gui 0.9.2, which would be in git 1.5.4.
>
> I've CC'd a number of people who have emailed me in the past
> about git-gui's diff or blame failing to display a non US-ASCII
> file content correctly and I am interested to hear if this would
> resolve the issue for you.  Its configurable on a per-path basis
> by an "encoding" attribute in .gitattributes (see git-gui's own
> example below).

This solves the problem for me.

The diff display correctly display utf-8 characters.

Best regards,
-- 
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: melo@simplicidade.org
Use XMPP!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
  2008-01-23  5:55 ` Junio C Hamano
@ 2008-01-23  8:41   ` Steffen Prohaska
  2008-01-23 10:28     ` Jakub Narebski
  2008-01-24  3:36   ` Shawn O. Pearce
  1 sibling, 1 reply; 7+ messages in thread
From: Steffen Prohaska @ 2008-01-23  8:41 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Git Mailing List, Johannes Schindelin, Jakub Narebski,
	Finn Arne Gangstad, Junio C Hamano, Michele Ballabio


On Jan 23, 2008, at 6:55 AM, Junio C Hamano wrote:

> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
>> git-gui: Use gitattribute "encoding" for file content display
>>
>> Most folks using git-gui on internationalized files have complained
>> that it doesn't recognize UTF-8 correctly.  In the past we have just
>> ignored the problem and showed the file contents as binary/US-ASCII,
>> which is wrong no matter how you look at it.
>
> Hmmm.
>
> At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
> (if I recall correctly latin-1 or other legacy encoding, as long
> as LANG/LC_* is given appropriately, as well) contents without
> per-path configuration without introducing new attributes.

Shouldn't we first try harder to get things right without adding
an attribute?  Maybe we could continue a good tradition and look
at the content of the first: we could first look for hints in the
file about the encoding.  XML and many text files contain such
hints already to help editors.  For example,  Python source can
explicitly contain the encoding [1]; and I guess there are many
other examples.  If we don't find a direct hint, we could have
some magic auto-detection similar to what we do for autocrlf.  As
a fallback the user could specify a default encoding.  But only
as a last resort, I'd use explicit attributes.

[1] http://www.python.org/dev/peps/pep-0263/

	Steffen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
  2008-01-23  8:41   ` Steffen Prohaska
@ 2008-01-23 10:28     ` Jakub Narebski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2008-01-23 10:28 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Shawn O. Pearce, Git Mailing List, Johannes Schindelin,
	Finn Arne Gangstad, Junio C Hamano, Michele Ballabio

On Wed. 23 Jan 2008, Steffen Prohaska wrote:
> On Jan 23, 2008, at 6:55 AM, Junio C Hamano wrote:
>> "Shawn O. Pearce" <spearce@spearce.org> writes:
>>
>>> git-gui: Use gitattribute "encoding" for file content display
>>>
>>> Most folks using git-gui on internationalized files have complained
>>> that it doesn't recognize UTF-8 correctly.  In the past we have just
>>> ignored the problem and showed the file contents as binary/US-ASCII,
>>> which is wrong no matter how you look at it.
>>
>> Hmmm.
>>
>> At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
>> (if I recall correctly latin-1 or other legacy encoding, as long
>> as LANG/LC_* is given appropriately, as well) contents without
>> per-path configuration without introducing new attributes.
> 
> Shouldn't we first try harder to get things right without adding
> an attribute?  Maybe we could continue a good tradition and look
> at the content of the first: we could first look for hints in the
> file about the encoding.  XML and many text files contain such
> hints already to help editors.  For example,  Python source can
> explicitly contain the encoding [1]; and I guess there are many
> other examples.

For example LaTeX files either use inputenc package to set encoding
(e.g. \usepackage[latin2]{inputenc}) or use magic first line to
specify TCX (TeX character translation) file 
(e.g. %& -translate-file=il2-t1).

Emacs encourages to use file variables, either in the form of magic
first line, or file variables at the end of file; I think the same
is true for Vim.


I'd like then for it to be at least as configurable as diff.*.funcname 
is for diff.

> If we don't find a direct hint, we could have 
> some magic auto-detection similar to what we do for autocrlf.

We can at least try to and check for UTF-16 magic first two bytes, and 
detect if we have character which is invalid in UTF-8 (for performance 
I guess checking only beginning of file)... 

> As a fallback the user could specify a default encoding.  But only
> as a last resort, I'd use explicit attributes.

...and then falling back to fallback encoding, like gitweb does.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
  2008-01-23  5:55 ` Junio C Hamano
  2008-01-23  8:41   ` Steffen Prohaska
@ 2008-01-24  3:36   ` Shawn O. Pearce
  1 sibling, 0 replies; 7+ messages in thread
From: Shawn O. Pearce @ 2008-01-24  3:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Schindelin, Jakub Narebski, Finn Arne Gangstad,
	Michele Ballabio

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > git-gui: Use gitattribute "encoding" for file content display
> >
> > Most folks using git-gui on internationalized files have complained
> > that it doesn't recognize UTF-8 correctly.  In the past we have just
> > ignored the problem and showed the file contents as binary/US-ASCII,
> > which is wrong no matter how you look at it.
> 
> Hmmm.
> 
> At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
> (if I recall correctly latin-1 or other legacy encoding, as long
> as LANG/LC_* is given appropriately, as well) contents without
> per-path configuration without introducing new attributes.

Hmm.  I'll try to rework something along those lines for 1.5.4 then.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display
       [not found] ` <4FF40048-FCF4-4BAD-AD08-6ADAD30E7B6A@simplicidade.org>
@ 2008-01-24  3:39   ` Shawn O. Pearce
  0 siblings, 0 replies; 7+ messages in thread
From: Shawn O. Pearce @ 2008-01-24  3:39 UTC (permalink / raw)
  To: Pedro Melo
  Cc: git, Junio C Hamano, Johannes Schindelin, Jakub Narebski,
	Finn Arne Gangstad, Michele Ballabio

Pedro Melo <melo@simplicidade.org> wrote:
> On Jan 23, 2008, at 5:47 AM, Shawn O. Pearce wrote:
> 
> >I've got the following change in my "pu" right now and am considering
> >adding it to git-gui 0.9.2, which would be in git 1.5.4.
> 
> Spoke too soon.
> 
> I can see the UTF-8 chars in the diffs, but if I use Stage Hunk for  
> Commit in a hunk with accentuated utf-8 chars, git-gui will throw an  
> error like "patch does not apply".

Yea, I know why.  It actually kept me awake for a little bit last
night after I sent the patch out for comment, but wasn't big enough
to get me out of bed to fix it.

The issue is git-gui doesn't convert back from character data to
binary correctly, so the patch won't apply, as the context will
not match.  Its simple enough to fix, we just have to remember
the encoding we used to go from git-diff --> Tcl and use that
same encoding in the other direction (Tcl --> git-apply).

-- 
Shawn.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-01-24  3:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-23  5:47 [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display Shawn O. Pearce
2008-01-23  5:55 ` Junio C Hamano
2008-01-23  8:41   ` Steffen Prohaska
2008-01-23 10:28     ` Jakub Narebski
2008-01-24  3:36   ` Shawn O. Pearce
2008-01-23  7:02 ` Pedro Melo
     [not found] ` <4FF40048-FCF4-4BAD-AD08-6ADAD30E7B6A@simplicidade.org>
2008-01-24  3:39   ` Shawn O. Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).