* [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding @ 2012-03-08 11:51 Jonathan Nieder 2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Jonathan Nieder @ 2012-03-08 11:51 UTC (permalink / raw) To: git Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen Hi, These patches revisit the bug described at [1], where the sample post-receive script annoys receiving MUAs by not declaring what encoding it uses. Worse, sometimes the mails have a mixture of encodings. These patches standardize on UTF-8, but that is only for the sake of simplicity. A patch on top to make the choice of encoding customizable would probably not be too complicated, if someone is interested. Patches are targetted at 1.7.11 unless there is overwhelming interest in them landing sooner. Thanks to Alexander Gerasiov <gq@cs.msu.su> for the writing a patch long ago to get this started[1]. Thoughts? Gerrit Pape (1): bug#506445: hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder (1): post-receive-email: defend against non-utf8 [i18n] logoutputencoding setting contrib/hooks/post-receive-email | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) [1] http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=181755 [2] http://bugs.debian.org/506445 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder @ 2012-03-08 11:57 ` Jonathan Nieder 2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder 2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano 2 siblings, 0 replies; 9+ messages in thread From: Jonathan Nieder @ 2012-03-08 11:57 UTC (permalink / raw) To: git Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen From: Gerrit Pape <pape@smarden.org> "git log" generates logs in UTF-8 encoding by default, but the post-receive-email example hook does not declare any encoding in the emails it sends. So add a line there: + Content-Type: text/plain; charset=utf-8 [jn: tweaked to also set the Content-Transfer-Encoding so MTAs know what kind of mangling might be needed when sending to a non 8-bit clean SMTP host] Requested-by: Alexander Gerasiov <gq@cs.msu.su> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- contrib/hooks/post-receive-email | 3 +++ 1 file changed, 3 insertions(+) diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email index 01af9df1..dc184d0b 100755 --- a/contrib/hooks/post-receive-email +++ b/contrib/hooks/post-receive-email @@ -233,6 +233,9 @@ generate_email_header() cat <<-EOF To: $recipients Subject: ${emailprefix}$projectdesc $refname_type $short_refname ${change_type}d. $describe + MIME-Version: 1.0 + Content-Type: text/plain; charset=utf-8 + Content-Transfer-Encoding: 8bit X-Git-Refname: $refname X-Git-Reftype: $refname_type X-Git-Oldrev: $oldrev -- 1.7.9.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting 2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder 2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder @ 2012-03-08 11:59 ` Jonathan Nieder 2012-03-08 13:50 ` Jeff King 2012-03-14 19:36 ` Alexey Shumkin 2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano 2 siblings, 2 replies; 9+ messages in thread From: Jonathan Nieder @ 2012-03-08 11:59 UTC (permalink / raw) To: git Cc: Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen Johannes Sixt wrote: > Am 9/20/2011 12:42, schrieb Shumkin Alexey: >> 1. post-send-mail uses description file of a repo >> 2. gitweb also uses this file and AFAIK it assumes one to be in UTF-8 >> (I do not know whether it can be changed there but I tested gitweb once long >> time ago) >> 3. So if i18n.logoutputencoding is not UTF-8 we get a message composed >> with mixed encodings. This fact oblidge us to encode headers >> (as quoted printable at least) and synchronize body message that contain >> repo description (in UTF-8) and diffstat (in i18n.logoutputencoding). [...] > In this case, it may make sense to have a separate setting, but you should > call git like this: > > git -c "i18n.logoutputencoding=$emailcharset" show ... > git -c "i18n.logoutputencoding=$emailcharset" rev-list --pretty ... Something like this, I suppose? This teaches post-receive-email to use plumbing where possible and to explicitly declare what encoding it expects output to use. Based on an advice from Alexey Shumkin and Johannes Sixt, but all bugs are mine. Making the email charset configurable is left as an exercise for the interested reader. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- contrib/hooks/post-receive-email | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email index dc184d0b..b59e03cd 100755 --- a/contrib/hooks/post-receive-email +++ b/contrib/hooks/post-receive-email @@ -464,7 +464,7 @@ generate_delete_branch_email() echo " was $oldrev" echo "" echo $LOGBEGIN - git show -s --pretty=oneline $oldrev + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev echo $LOGEND } @@ -540,11 +540,11 @@ generate_atag_email() # performed on them if [ -n "$prevtag" ]; then # Show changes since the previous release - git rev-list --pretty=short "$prevtag..$newrev" | git shortlog + git shortlog --encoding=UTF-8 "$prevtag..$newrev" else # No previous tag, show all the changes since time # began - git rev-list --pretty=short $newrev | git shortlog + git shortlog --encoding=UTF-8 "$newrev" fi ;; *) @@ -564,7 +564,7 @@ generate_delete_atag_email() echo " was $oldrev" echo "" echo $LOGBEGIN - git show -s --pretty=oneline $oldrev + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev echo $LOGEND } @@ -610,7 +610,7 @@ generate_general_email() echo "" if [ "$newrev_type" = "commit" ]; then echo $LOGBEGIN - git show --no-color --root -s --pretty=medium $newrev + git diff-tree --encoding=UTF-8 --root -s --pretty=oneline $newrev echo $LOGEND else # What can we do here? The tag marks an object that is not @@ -629,7 +629,7 @@ generate_delete_general_email() echo " was $oldrev" echo "" echo $LOGBEGIN - git show -s --pretty=oneline $oldrev + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev echo $LOGEND } -- 1.7.9.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting 2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder @ 2012-03-08 13:50 ` Jeff King 2012-03-08 13:52 ` Jonathan Nieder 2012-03-14 19:36 ` Alexey Shumkin 1 sibling, 1 reply; 9+ messages in thread From: Jeff King @ 2012-03-08 13:50 UTC (permalink / raw) To: Jonathan Nieder Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen On Thu, Mar 08, 2012 at 05:59:57AM -0600, Jonathan Nieder wrote: > diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email > index dc184d0b..b59e03cd 100755 > --- a/contrib/hooks/post-receive-email > +++ b/contrib/hooks/post-receive-email > @@ -464,7 +464,7 @@ generate_delete_branch_email() > echo " was $oldrev" > echo "" > echo $LOGBEGIN > - git show -s --pretty=oneline $oldrev > + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev > echo $LOGEND If you are using "-s" to suppress diff output, why are you using diff-tree? Wouldn't "rev-list -1" (or "rev-list --no-walk") work equally well and be a little more obvious? -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting 2012-03-08 13:50 ` Jeff King @ 2012-03-08 13:52 ` Jonathan Nieder 0 siblings, 0 replies; 9+ messages in thread From: Jonathan Nieder @ 2012-03-08 13:52 UTC (permalink / raw) To: Jeff King Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen Jeff King wrote: > If you are using "-s" to suppress diff output, why are you using > diff-tree? Wouldn't "rev-list -1" (or "rev-list --no-walk") work equally > well and be a little more obvious? Just a habit --- some part of me is wired to think "diff-tree is the command to show a commit". Maybe rev-list didn't support --pretty in olden days? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting 2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder 2012-03-08 13:50 ` Jeff King @ 2012-03-14 19:36 ` Alexey Shumkin 2012-03-14 19:55 ` Jonathan Nieder 1 sibling, 1 reply; 9+ messages in thread From: Alexey Shumkin @ 2012-03-14 19:36 UTC (permalink / raw) To: Jonathan Nieder Cc: git, Gerrit Pape, Johannes Sixt, Michael Haggerty, Jon Jensen I'd like to remind then following aspect > My proposition was in to send email message in explicitly defined > custom encoding. Why? In development process under Windows non-UTF-8 > encoding is used (cp1251 in my case). So, filenames have this > encoding, and as we know Git stores their names as is - in cp1251 - > without a conversion. And filenames are also used in diff-stat (with > core.quotepath= false, BTW, I did not take into account this config) > without any conversion. So if we'll make all text in UTF-8 but > filenames are still non-UTF-8, email would look corrupted. this is citation from http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=183076 > Johannes Sixt wrote: > > Am 9/20/2011 12:42, schrieb Shumkin Alexey: > > >> 1. post-send-mail uses description file of a repo > >> 2. gitweb also uses this file and AFAIK it assumes one to be in > >> UTF-8 (I do not know whether it can be changed there but I tested > >> gitweb once long time ago) > >> 3. So if i18n.logoutputencoding is not UTF-8 we get a message > >> composed with mixed encodings. This fact oblidge us to encode > >> headers (as quoted printable at least) and synchronize body > >> message that contain repo description (in UTF-8) and diffstat (in > >> i18n.logoutputencoding). > [...] > > In this case, it may make sense to have a separate setting, but you > > should call git like this: > > > > git -c "i18n.logoutputencoding=$emailcharset" show ... > > git -c "i18n.logoutputencoding=$emailcharset" rev-list > > --pretty ... > > Something like this, I suppose? > > This teaches post-receive-email to use plumbing where possible and to > explicitly declare what encoding it expects output to use. > > Based on an advice from Alexey Shumkin and Johannes Sixt, but all bugs > are mine. Making the email charset configurable is left as an > exercise for the interested reader. > > Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> > --- > contrib/hooks/post-receive-email | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/contrib/hooks/post-receive-email > b/contrib/hooks/post-receive-email index dc184d0b..b59e03cd 100755 > --- a/contrib/hooks/post-receive-email > +++ b/contrib/hooks/post-receive-email > @@ -464,7 +464,7 @@ generate_delete_branch_email() > echo " was $oldrev" > echo "" > echo $LOGBEGIN > - git show -s --pretty=oneline $oldrev > + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev > echo $LOGEND > } > > @@ -540,11 +540,11 @@ generate_atag_email() > # performed on them > if [ -n "$prevtag" ]; then > # Show changes since the previous release > - git rev-list --pretty=short > "$prevtag..$newrev" | git shortlog > + git shortlog --encoding=UTF-8 > "$prevtag..$newrev" else > # No previous tag, show all the changes > since time # began > - git rev-list --pretty=short $newrev | git > shortlog > + git shortlog --encoding=UTF-8 "$newrev" > fi > ;; > *) > @@ -564,7 +564,7 @@ generate_delete_atag_email() > echo " was $oldrev" > echo "" > echo $LOGBEGIN > - git show -s --pretty=oneline $oldrev > + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev > echo $LOGEND > } > > @@ -610,7 +610,7 @@ generate_general_email() > echo "" > if [ "$newrev_type" = "commit" ]; then > echo $LOGBEGIN > - git show --no-color --root -s --pretty=medium $newrev > + git diff-tree --encoding=UTF-8 --root -s > --pretty=oneline $newrev echo $LOGEND > else > # What can we do here? The tag marks an object that > is not @@ -629,7 +629,7 @@ generate_delete_general_email() > echo " was $oldrev" > echo "" > echo $LOGBEGIN > - git show -s --pretty=oneline $oldrev > + git diff-tree --encoding=UTF-8 -s --pretty=oneline $oldrev > echo $LOGEND > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting 2012-03-14 19:36 ` Alexey Shumkin @ 2012-03-14 19:55 ` Jonathan Nieder 0 siblings, 0 replies; 9+ messages in thread From: Jonathan Nieder @ 2012-03-14 19:55 UTC (permalink / raw) To: Alexey Shumkin Cc: git, Gerrit Pape, Johannes Sixt, Michael Haggerty, Jon Jensen Alexey Shumkin wrote: > I'd like to remind then following aspect > >> My proposition was in to send email message in explicitly defined >> custom encoding. Why? In development process under Windows non-UTF-8 >> encoding is used (cp1251 in my case). So, filenames have this >> encoding, and as we know Git stores their names as is [...] >> with >> core.quotepath= false Sure. Do you think this patch makes that problem worse, and if so, do you have any ideas about how that could be prevented? Otherwise: >> Making the email charset configurable is left as an >> exercise for the interested reader. I did not want to do that part because I do not trust myself to understand the needs of people using non-utf8 and test it appropriately, but I tried to make sure the patch was structured in a way that would make it easy. Hoping clarifies a little, Jonathan ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding 2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder 2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder 2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder @ 2012-03-08 19:12 ` Junio C Hamano 2012-03-14 17:54 ` Jon Jensen 2 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2012-03-08 19:12 UTC (permalink / raw) To: Jonathan Nieder Cc: git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty, Jon Jensen Jonathan Nieder <jrnieder@gmail.com> writes: > Patches are targetted at 1.7.11 unless there is overwhelming > interest in them landing sooner. Even if there is none, I wouldn't mind taking it for 1.7.10, if only to reduce the chance that the patch gets forgotten again, provided that there is a clear concensus that this is a good change that won't regress people who have been actually using this sample in real life. > <gq@cs.msu.su> for the writing a patch long ago to get this > started[1]. > > Thoughts? > > Gerrit Pape (1): > bug#506445: hooks/post-receive-email: set encoding to utf-8 > > Jonathan Nieder (1): > post-receive-email: defend against non-utf8 [i18n] logoutputencoding > setting > > contrib/hooks/post-receive-email | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > [1] http://thread.gmane.org/gmane.comp.version-control.git/181737/focus=181755 > [2] http://bugs.debian.org/506445 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding 2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano @ 2012-03-14 17:54 ` Jon Jensen 0 siblings, 0 replies; 9+ messages in thread From: Jon Jensen @ 2012-03-14 17:54 UTC (permalink / raw) To: Junio C Hamano Cc: Jonathan Nieder, git, Gerrit Pape, Alexey Shumkin, Johannes Sixt, Michael Haggerty On Thu, 8 Mar 2012, Junio C Hamano wrote: >> Patches are targetted at 1.7.11 unless there is overwhelming interest >> in them landing sooner. > > Even if there is none, I wouldn't mind taking it for 1.7.10, if only to > reduce the chance that the patch gets forgotten again, provided that > there is a clear concensus that this is a good change that won't regress > people who have been actually using this sample in real life. +1 from me. This would solve an occasional problem we have because encodings aren't set in post-receive-email output. Jon ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-03-14 19:55 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-03-08 11:51 [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Jonathan Nieder 2012-03-08 11:57 ` [PATCH 1/2] hooks/post-receive-email: set encoding to utf-8 Jonathan Nieder 2012-03-08 11:59 ` [PATCH 2/2] post-receive-email: defend against non UTF-8 i18n.logoutputencoding setting Jonathan Nieder 2012-03-08 13:50 ` Jeff King 2012-03-08 13:52 ` Jonathan Nieder 2012-03-14 19:36 ` Alexey Shumkin 2012-03-14 19:55 ` Jonathan Nieder 2012-03-08 19:12 ` [PATCH/RFC 0/2] post-receive-email: declaring and consistently using one output encoding Junio C Hamano 2012-03-14 17:54 ` Jon Jensen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).