git.vger.kernel.org archive mirror
From: "Clifford Caoile" <piyo@users.sourceforge.net>
To: "Karl Hasselström" <kha@treskal.com>,
	"David Kågedal" <davidk@lysator.liu.se>
Cc: git@vger.kernel.org, "Junio C. Hamano" <gitster@pobox.com>
Subject: Re: encoding bug in git.el
Date: Wed, 21 May 2008 23:08:09 +0900
Message-ID: <1f748ec60805210708q34a26bebh915037713caa9a87@mail.gmail.com>
In-Reply-To: <87mymkbo9x.fsf@lysator.liu.se>

Hi:

On Wed, May 21, 2008 at 7:31 AM, David Kågedal <davidk@lysator.liu.se> wrote:
> Karl Hasselström <kha@treskal.com> writes:
>
>> Recently, some commits started misrecording the "ö" in my name. (In
>> emacs, for example, it looks like this in a utf8 buffer:
>> Hasselstr\201\366m.) I'm guessing there's an extra latin1->utf8
>> conversion in there somewhere.
>
> The \201 looks more like Emacs' internal mule encoding, where
> everything that isn't ASCII is prefixed with \201 or something
> similar.

Thanks for reporting this.

I concur. This is not a UTF-8 translation but Emacs's internal MULE
encoding. I suspect that the U+00F6 character is read into the
*git-commit* buffer as latin-1 (because git.el displays the Author
line), and that Emacs then writes it out as 0x81 0xF6, which is the
internal Emacs buffer code for U+00F6.
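
To make the byte-level argument concrete, here is a small sketch. It is
plain Python rather than Emacs Lisp, so that the bytes can be checked
without involving Emacs at all, and it is only an illustration, not
code from git.el. The function is_valid_utf8 is a made-up helper. The
sketch compares the bytes reported in the bad commits (\201\366 in
octal, i.e. 0x81 0xF6) with what latin-1, UTF-8, and an extra
latin-1->UTF-8 pass would each produce for "ö":

```python
# Illustration only; plain Python, not code from git.el.
name = "Hasselstr\u00f6m"  # "ö" is U+00F6

latin1_bytes = name.encode("latin-1")  # ö -> F6
utf8_bytes = name.encode("utf-8")      # ö -> C3 B6

# What a stray extra latin-1 -> UTF-8 pass would produce: the UTF-8
# bytes are misread as latin-1 text and then encoded as UTF-8 again.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")  # ö -> C3 83 C2 B6

# What was reported in the bad commits: \201\366 (octal) = 81 F6 (hex).
observed = b"Hasselstr\x81\xf6m"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if DATA decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, data in [
    ("latin-1", latin1_bytes),
    ("utf-8", utf8_bytes),
    ("double-encoded", double_encoded),
    ("observed", observed),
]:
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    print(f"{label:15} {hex_bytes}  valid utf-8: {is_valid_utf8(data)}")
```

The observed bytes match none of the three candidates and are not valid
UTF-8 either, which is consistent with an internal representation
leaking out rather than with a second latin-1->UTF-8 conversion.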

This happens because git.el, when it runs git-commit-tree, always
redefines environment variables such as GIT_AUTHOR_NAME. The difference
is that prior to commit dbe482, "env" handled the encoding, whereas
commit dbe482 lets Emacs's process-environment handle it.
Unfortunately, in the latter case the string is passed without the
proper recoding.

Here is a proposed fix. I suggest that process-environment should be
given these envvars already encoded as shown in this code sample:

------------------ git.el ------------------
[not a proper git-diff]
@@ -216,6 +216,11 @@ and `git-diff-setup-hook'."
   "Build a list of NAME=VALUE strings from a list of environment strings."
   (mapcar (lambda (entry) (concat (car entry) "=" (cdr entry))) env))

+(defun git-get-env-strings-encoded (env encoding)
+  "Build a list of NAME=VALUE strings from a list of environment strings,
+converting from mule-encoding to ENCODING (e.g. mule-utf-8, latin-1, etc)."
+  (mapcar (lambda (entry) (concat (car entry) "=" (encode-coding-string (cdr entry) encoding))) env))
+
 (defun git-call-process-env (buffer env &rest args)
   "Wrapper for call-process that sets environment strings."
   (let ((process-environment (append (git-get-env-strings env)
@@ -265,7 +270,7 @@ and returns the process output as a string, or nil if the git failed."

 (defun git-run-command-region (buffer start end env &rest args)
   "Run a git command with specified buffer region as input."
-  (unless (eq 0 (let ((process-environment (append (git-get-env-strings env)
+  (unless (eq 0 (let ((process-environment (append (git-get-env-strings-encoded env coding-system-for-write)
                                                    process-environment)))
                   (git-run-process-region
                    buffer start end "git" args)))

The buffer text is written out using coding-system-for-write, but the
GIT_* envvars were not encoded at all. The fix therefore encodes them
with the same coding system before appending them to
process-environment.

(Reminder: the *git-commit* buffer's encoding is based on the git
config i18n.commitencoding, which in turn sets
buffer-file-coding-system, which in turn sets coding-system-for-write)

I tested this with U+00F6 in GIT_AUTHOR_NAME, in git config user.name,
and in the commit text, and it seems to work (I think it's fixed).
Please review it. Also, I am not sure whether this fix needs to be
propagated to the other places where process-environment is redefined,
so YMMV.

(Lastly, while testing this for Japanese, I'm having some encoding
problems with meadow (Emacs on Windows), msysgit (git on Windows),
set-language-mode Japanese, utf-8, and M-x git-commit-file, but I don't
think it's related to this exact problem. Hopefully.)

>> It turns out that the breakage occurs when I commit with the
>> git-status mode from git.el, and it was introduced by this commit:
>>
>>   commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026
>>   Author: Clifford Caoile <piyo@users.sourceforge.net>
>>
>>       git.el: Set process-environment instead of invoking env

:-)

This must be the reason why process-environment wasn't used in all places.

>> It's in master, but not yet in maint. (In fact, it's the _only_ change
>> to contrib/emacs that's in master but not in maint.)

Please forgive my ignorance, but what does this mean?

Best regards,
Clifford Caoile
