* encoding bug in git.el @ 2008-05-20 22:09 Karl Hasselström [not found] ` <87mymkbo9x.fsf@lysator.liu.se> 0 siblings, 1 reply; 10+ messages in thread From: Karl Hasselström @ 2008-05-20 22:09 UTC (permalink / raw) To: Clifford Caoile; +Cc: Junio C Hamano, David Kågedal, git Recently, some commits started misrecording the "ö" in my name. (In emacs, for example, it looks like this in a utf8 buffer: Hasselstr\201\366m.) I'm guessing there's an extra latin1->utf8 conversion in there somewhere. It turns out that the breakage occurs when I commit with the git-status mode from git.el, and it was introduced by this commit: commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026 Author: Clifford Caoile <piyo@users.sourceforge.net> git.el: Set process-environment instead of invoking env It's in master, but not yet in maint. (In fact, it's the _only_ change to contrib/emacs that's in master but not in maint.) -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <87mymkbo9x.fsf@lysator.liu.se>]
* Re: encoding bug in git.el [not found] ` <87mymkbo9x.fsf@lysator.liu.se> @ 2008-05-21 14:08 ` Clifford Caoile 2008-05-21 14:54 ` Karl Hasselström 2008-05-25 13:42 ` Karl Hasselström 0 siblings, 2 replies; 10+ messages in thread From: Clifford Caoile @ 2008-05-21 14:08 UTC (permalink / raw) To: Karl Hasselström, David Kågedal; +Cc: git, Junio C. Hamano Hi: On Wed, May 21, 2008 at 7:31 AM, David Kågedal <davidk@lysator.liu.se> wrote: > Karl Hasselström <kha@treskal.com> writes: > >> Recently, some commits started misrecording the "ö" in my name. (In >> emacs, for example, it looks like this in a utf8 buffer: >> Hasselstr\201\366m.) I'm guessing there's an extra latin1->utf8 >> conversion in there somewhere. > > The \201 looks more like Emacs' internal mule encoding, where > everything that isn't ASCII is prefixed with \201 or something > similar. Thanks for reporting this. I concur. This is not UTF-8 translation, but an emacs MULE encoding. I suspect the U+F6 character is read in to the *git-commit* buffer in latin-1 mode because git.el displays the Author line, then Emacs writes that out as 0x81F6, because that is the emacs buffer code of U+F6. This is because git.el, upon git-commit-tree, always redefines the environment variables like GIT_AUTHOR_NAME. However the difference is that prior to commit dbe482, "env" handle the encoding while commit dbe482 lets emacs process-environment handle it. Unfortunately the string is passed without the proper recoding in the latter case. Here is a proposed fix. I suggest that process-environment should be given these envvars already encoded as shown in this code sample: ------------------ git.el ------------------ [not a proper git-diff] @@ -216,6 +216,11 @@ and `git-diff-setup-hook'." "Build a list of NAME=VALUE strings from a list of environment strings." (mapcar (lambda (entry) (concat (car entry) "=" (cdr entry))) env)) +(defun git-get-env-strings-encoded (env encoding) + "Build a list of NAME=VALUE strings from a list of environment strings, +converting from mule-encoding to ENCODING (e.g. mule-utf-8, latin-1, etc)." + (mapcar (lambda (entry) (concat (car entry) "=" (encode-coding-string (cdr entry) encoding))) env)) + (defun git-call-process-env (buffer env &rest args) "Wrapper for call-process that sets environment strings." (let ((process-environment (append (git-get-env-strings env) @@ -265,7 +270,7 @@ and returns the process output as a string, or nil if the git failed." (defun git-run-command-region (buffer start end env &rest args) "Run a git command with specified buffer region as input." - (unless (eq 0 (let ((process-environment (append (git-get-env-strings env) + (unless (eq 0 (let ((process-environment (append (git-get-env-strings-encoded env coding-system-for-write) process-environment))) (git-run-process-region buffer start end "git" args))) The buffer text is saved with the encoding coding-system-for-write, while the GIT_* envvars were not encoded, so when appending to process-environment variable, use the same encoding. (Reminder: the *git-commit* buffer's encoding is based on the git config i18n.commitencoding, which in turn sets buffer-file-coding-system, which in turn sets coding-system-for-write) I tested this with U+F6 in the GIT_AUTHOR_NAME, git config user.name, and the commit text, and it seems to work better (I think it's fixed). Please review it. Also, I am not sure if this fix needs to be propagated to the other areas where process-environment is redefined, so YMMV. (Lastly, while testing this for Japanese, I'm having some encoding problem with meadow (Emacs on Windows), msysgit (git on Windows), set-language-mode Japanese, utf-8, and M-x git-commit-file but I don't think its related to this exact problem. Hopefully.) >> It turns out that the breakage occurs when I commit with the >> git-status mode from git.el, and it was introduced by this commit: >> >> commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026 >> Author: Clifford Caoile <piyo@users.sourceforge.net> >> >> git.el: Set process-environment instead of invoking env :-) This must be the reason why process-environment wasn't used in all places. >> It's in master, but not yet in maint. (In fact, it's the _only_ change >> to contrib/emacs that's in master but not in maint.) Please forgive my ignorance, but what does this mean? Best regards, Clifford Caoile ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-21 14:08 ` Clifford Caoile @ 2008-05-21 14:54 ` Karl Hasselström 2008-05-21 21:31 ` Clifford Caoile 2008-05-25 13:42 ` Karl Hasselström 1 sibling, 1 reply; 10+ messages in thread From: Karl Hasselström @ 2008-05-21 14:54 UTC (permalink / raw) To: Clifford Caoile; +Cc: David Kågedal, git, Junio C. Hamano On 2008-05-21 23:08:09 +0900, Clifford Caoile wrote: > > > It's in master, but not yet in maint. (In fact, it's the _only_ > > > change to contrib/emacs that's in master but not in maint.) > > Please forgive my ignorance, but what does this mean? That the change was committed to the "master" branch, and not the "maint" branch. So folks who run stable releases haven't seen the bug yet. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-21 14:54 ` Karl Hasselström @ 2008-05-21 21:31 ` Clifford Caoile 2008-05-23 7:09 ` Karl Hasselström 0 siblings, 1 reply; 10+ messages in thread From: Clifford Caoile @ 2008-05-21 21:31 UTC (permalink / raw) To: Karl Hasselström; +Cc: David Kågedal, git, Junio C. Hamano On Wed, May 21, 2008 at 11:54 PM, Karl Hasselström <kha@treskal.com> wrote: > On 2008-05-21 23:08:09 +0900, Clifford Caoile wrote: > >> > > It's in master, but not yet in maint. (In fact, it's the _only_ >> > > change to contrib/emacs that's in master but not in maint.) >> >> Please forgive my ignorance, but what does this mean? > > That the change was committed to the "master" branch, and not the > "maint" branch. So folks who run stable releases haven't seen the bug > yet. Ok I understand. Did you test the proposed fix I sent? I would like to know your feedback. Best regards, Clifford Caoile ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-21 21:31 ` Clifford Caoile @ 2008-05-23 7:09 ` Karl Hasselström 0 siblings, 0 replies; 10+ messages in thread From: Karl Hasselström @ 2008-05-23 7:09 UTC (permalink / raw) To: Clifford Caoile; +Cc: David Kågedal, git, Junio C. Hamano On 2008-05-22 06:31:03 +0900, Clifford Caoile wrote: > Did you test the proposed fix I sent? I would like to know your > feedback. No, sorry, I haven't taken the time to do so yet. Will try to do so this weekend. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-21 14:08 ` Clifford Caoile 2008-05-21 14:54 ` Karl Hasselström @ 2008-05-25 13:42 ` Karl Hasselström 2008-05-30 12:28 ` Karl Hasselström 1 sibling, 1 reply; 10+ messages in thread From: Karl Hasselström @ 2008-05-25 13:42 UTC (permalink / raw) To: Clifford Caoile; +Cc: David Kågedal, git, Junio C. Hamano On 2008-05-21 23:08:09 +0900, Clifford Caoile wrote: > Here is a proposed fix. I suggest that process-environment should be > given these envvars already encoded as shown in this code sample: > > ------------------ git.el ------------------ > [not a proper git-diff] > @@ -216,6 +216,11 @@ and `git-diff-setup-hook'." > "Build a list of NAME=VALUE strings from a list of environment strings." > (mapcar (lambda (entry) (concat (car entry) "=" (cdr entry))) env)) > > +(defun git-get-env-strings-encoded (env encoding) > + "Build a list of NAME=VALUE strings from a list of environment strings, > +converting from mule-encoding to ENCODING (e.g. mule-utf-8, latin-1, etc)." > + (mapcar (lambda (entry) (concat (car entry) "=" > (encode-coding-string (cdr entry) encoding))) env)) > + > (defun git-call-process-env (buffer env &rest args) > "Wrapper for call-process that sets environment strings." > (let ((process-environment (append (git-get-env-strings env) > @@ -265,7 +270,7 @@ and returns the process output as a string, or nil > if the git failed." > > (defun git-run-command-region (buffer start end env &rest args) > "Run a git command with specified buffer region as input." > - (unless (eq 0 (let ((process-environment (append (git-get-env-strings env) > + (unless (eq 0 (let ((process-environment (append > (git-get-env-strings-encoded env coding-system-for-write) > process-environment))) > (git-run-process-region > buffer start end "git" args))) > > The buffer text is saved with the encoding coding-system-for-write, > while the GIT_* envvars were not encoded, so when appending to > process-environment variable, use the same encoding. I don't claim to understand any of the design issues around this, but your patch certainly fixes my problem (once I managed to apply it, which involved working around the lack of headers, non-matching offsets, and whitespace damage -- luckily it was just two hunks). So: Tested-by: Karl Hasselström <kha@treskal.com> Thanks for taking the time. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-25 13:42 ` Karl Hasselström @ 2008-05-30 12:28 ` Karl Hasselström 2008-05-30 20:27 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Karl Hasselström @ 2008-05-30 12:28 UTC (permalink / raw) To: Clifford Caoile; +Cc: David Kågedal, git, Junio C. Hamano On 2008-05-25 15:42:00 +0200, Karl Hasselström wrote: > On 2008-05-21 23:08:09 +0900, Clifford Caoile wrote: > > > Here is a proposed fix. > > I don't claim to understand any of the design issues around this, > but your patch certainly fixes my problem (once I managed to apply > it, which involved working around the lack of headers, non-matching > offsets, and whitespace damage -- luckily it was just two hunks). > So: > > Tested-by: Karl Hasselström <kha@treskal.com> > > Thanks for taking the time. How are things going with this fix? Junio, I expect you're waiting for a properly cleaned-up patch, possibly with acks from relevant people? I think it would be a mistake to release 1.5.6 with this bug still in it; if not this bugfix, then a revert of the offending commit. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: encoding bug in git.el 2008-05-30 12:28 ` Karl Hasselström @ 2008-05-30 20:27 ` Junio C Hamano 2008-06-02 22:41 ` [PATCH] Revert "git.el: Set process-environment instead of invoking env" Karl Hasselström 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2008-05-30 20:27 UTC (permalink / raw) To: Karl Hasselström Cc: Clifford Caoile, David Kågedal, git, Junio C. Hamano Karl Hasselström <kha@treskal.com> writes: > On 2008-05-25 15:42:00 +0200, Karl Hasselström wrote: > >> On 2008-05-21 23:08:09 +0900, Clifford Caoile wrote: >> >> > Here is a proposed fix. >> >> I don't claim to understand any of the design issues around this, >> but your patch certainly fixes my problem (once I managed to apply >> it, which involved working around the lack of headers, non-matching >> offsets, and whitespace damage -- luckily it was just two hunks). >> So: >> >> Tested-by: Karl Hasselström <kha@treskal.com> >> >> Thanks for taking the time. > > How are things going with this fix? Junio, I expect you're waiting for > a properly cleaned-up patch, possibly with acks from relevant people? You expected correctly. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Revert "git.el: Set process-environment instead of invoking env" 2008-05-30 20:27 ` Junio C Hamano @ 2008-06-02 22:41 ` Karl Hasselström 2008-06-03 15:54 ` David Christensen 0 siblings, 1 reply; 10+ messages in thread From: Karl Hasselström @ 2008-06-02 22:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Clifford Caoile, David Kågedal This reverts commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026, which caused mis-encoding of non-ASCII author/committer names when the git-status mode is used to create commits. Signed-off-by: Karl Hasselström <kha@treskal.com> --- On 2008-05-30 13:27:43 -0700, Junio C Hamano wrote: > Karl Hasselström <kha@treskal.com> writes: > > > How are things going with this fix? Junio, I expect you're waiting > > for a properly cleaned-up patch, possibly with acks from relevant > > people? > > You expected correctly. In case no one who understands how, why, and whether the fix works comes forward, here's a revert of the commit that introduced the problem. contrib/emacs/git.el | 11 +++++++---- 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/contrib/emacs/git.el b/contrib/emacs/git.el index 2557a76..4fa853f 100644 --- a/contrib/emacs/git.el +++ b/contrib/emacs/git.el @@ -232,8 +232,10 @@ and returns the process output as a string, or nil if the git failed." (defun git-run-command-region (buffer start end env &rest args) "Run a git command with specified buffer region as input." - (unless (eq 0 (let ((process-environment (append (git-get-env-strings env) - process-environment))) + (unless (eq 0 (if env + (git-run-process-region + buffer start end "env" + (append (git-get-env-strings env) (list "git") args)) (git-run-process-region buffer start end "git" args))) (error "Failed to run \"git %s\":\n%s" (mapconcat (lambda (x) x) args " ") (buffer-string)))) @@ -248,8 +250,9 @@ and returns the process output as a string, or nil if the git failed." (erase-buffer) (cd dir) (setq status - (let ((process-environment (append (git-get-env-strings env) - process-environment))) + (if env + (apply #'call-process "env" nil (list buffer t) nil + (append (git-get-env-strings env) (list hook-name) args)) (apply #'call-process hook-name nil (list buffer t) nil args)))) (display-message-or-buffer buffer) (eq 0 status))))) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "git.el: Set process-environment instead of invoking env" 2008-06-02 22:41 ` [PATCH] Revert "git.el: Set process-environment instead of invoking env" Karl Hasselström @ 2008-06-03 15:54 ` David Christensen 0 siblings, 0 replies; 10+ messages in thread From: David Christensen @ 2008-06-03 15:54 UTC (permalink / raw) To: Karl Hasselström Cc: Junio C Hamano, git, Clifford Caoile, David Kågedal > This reverts commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026, which > caused mis-encoding of non-ASCII author/committer names when the > git-status mode is used to create commits. > > Signed-off-by: Karl Hasselström <kha@treskal.com> > > --- > > On 2008-05-30 13:27:43 -0700, Junio C Hamano wrote: > >> Karl Hasselström <kha@treskal.com> writes: >> >>> How are things going with this fix? Junio, I expect you're waiting >>> for a properly cleaned-up patch, possibly with acks from relevant >>> people? >> >> You expected correctly. > > In case no one who understands how, why, and whether the fix works > comes forward, here's a revert of the commit that introduced the > problem. This likely is due to the process-coding-system selected by emacs; the correct functioning of this command will rely on both the current buffer's coding system and the coding system of the data returned by the invocation of git-status. In order for this to function properly, these should match. Both of these are variables which can be customized local to the buffer as part of the routine, so this could be fixed if we are able to determine at invocation time what coding system the git-status command will return in (presumably some form of utf-8, but I believe this is configurable per repo). I'd be glad to take a more in-depth look at this, but I'm not up on the code at this point. Regards, David -- David Christensen End Point Corporation david@endpoint.com ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-06-03 15:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-05-20 22:09 encoding bug in git.el Karl Hasselström [not found] ` <87mymkbo9x.fsf@lysator.liu.se> 2008-05-21 14:08 ` Clifford Caoile 2008-05-21 14:54 ` Karl Hasselström 2008-05-21 21:31 ` Clifford Caoile 2008-05-23 7:09 ` Karl Hasselström 2008-05-25 13:42 ` Karl Hasselström 2008-05-30 12:28 ` Karl Hasselström 2008-05-30 20:27 ` Junio C Hamano 2008-06-02 22:41 ` [PATCH] Revert "git.el: Set process-environment instead of invoking env" Karl Hasselström 2008-06-03 15:54 ` David Christensen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).