* [PATCH] RFC2047-encode email headers
@ 2006-10-22 12:02 Karl Hasselström
2006-10-22 12:12 ` Karl Hasselström
0 siblings, 1 reply; 5+ messages in thread
From: Karl Hasselström @ 2006-10-22 12:02 UTC (permalink / raw)
To: Catalin Marinas; +Cc: git
From: Karl Hasselström <kha@treskal.com>
Having non-ascii characters in email headers is illegal, but StGIT
currently does not care. I'm often bitten by this, since my name
doesn't fit in ascii.
This patch implements an encoding pass just before the email is sent
over the wire -- in particular, it comes after any interactive editing
and templates and such, so the user should never have to see the
rfc2047 encoding.
NOTE: The rfc2047 encoder needs to know the encoding of the input
string. This patch hard-codes this to utf8, since that should be by
far the most common non-ascii encoding, and since utf8 is already the
hardcoded character set for the email body. In the long run, we
probably want to get this from the locale, or from a command line
switch, or both.
Signed-off-by: Karl Hasselström <kha@treskal.com>
---
stgit/commands/mail.py | 45 +++++++++++++++++++++++++++++++++++++++++----
1 files changed, 41 insertions(+), 4 deletions(-)
diff --git a/stgit/commands/mail.py b/stgit/commands/mail.py
index 34504e6..b661308 100644
--- a/stgit/commands/mail.py
+++ b/stgit/commands/mail.py
@@ -15,7 +15,7 @@ along with this program; if not, write t
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
"""
-import sys, os, re, time, datetime, smtplib, email.Utils
+import sys, os, re, time, datetime, smtplib, email.Header, email.Utils
from optparse import OptionParser, make_option
from stgit.commands.common import *
@@ -403,6 +403,42 @@ def __build_message(tmpl, patch, patch_n
return msg.strip('\n')
+def encode_header(s, enc):
+    """Take an entire e-mail header line, encoded in enc, and
+    rfc2047-encode it."""
+    def trans(s):
+        return str(email.Header.Header(unicode(s, enc)))
+    words = s.split(' ')
+    first_encode = len(words)
+    last_encode = -1
+    for i in xrange(len(words)):
+        ew = trans(words[i])
+        if ew != words[i]:
+            first_encode = min(first_encode, i)
+            last_encode = max(last_encode, i)
+    if first_encode <= last_encode:
+        return ' '.join(filter(
+            None,
+            [' '.join(words[:first_encode]),
+             trans(' '.join(words[first_encode:last_encode+1])),
+             ' '.join(words[last_encode+1:])]))
+    else:
+        return s
+
+def encode_headers(msg, enc):
+    """rfc2047-encode the headers of msg, assuming it is encoded in
+    enc."""
+    in_header = True
+    lines = []
+    for line in msg.splitlines(True):
+        if in_header:
+            if line.strip():
+                line = encode_header(line, enc)
+            else:
+                in_header = False
+        lines.append(line)
+    return ''.join(lines)
+
def func(parser, options, args):
"""Send the patches by e-mail using the patchmail.tmpl file as
a template
@@ -461,7 +497,8 @@ def func(parser, options, args):
raise CmdException, 'No cover message template file found'
msg_id = email.Utils.make_msgid('stgit')
- msg = __build_cover(tmpl, total_nr, msg_id, options)
+ msg = encode_headers(__build_cover(tmpl, total_nr, msg_id, options),
+ 'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)
# subsequent e-mails are seen as replies to the first one
@@ -487,8 +524,8 @@ def func(parser, options, args):
for (p, patch_nr) in zip(patches, range(1, len(patches) + 1)):
msg_id = email.Utils.make_msgid('stgit')
- msg = __build_message(tmpl, p, patch_nr, total_nr, msg_id, ref_id,
- options)
+ msg = encode_headers(__build_message(tmpl, p, patch_nr, total_nr,
+ msg_id, ref_id, options), 'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)
# subsequent e-mails are seen as replies to the first one
* Re: [PATCH] RFC2047-encode email headers
2006-10-22 12:02 [PATCH] RFC2047-encode email headers Karl Hasselström
@ 2006-10-22 12:12 ` Karl Hasselström
2006-10-22 12:45 ` [PATCH 0/2] Make "stg mail" behave better with non-ascii characters Karl Hasselström
0 siblings, 1 reply; 5+ messages in thread
From: Karl Hasselström @ 2006-10-22 12:12 UTC (permalink / raw)
To: Catalin Marinas; +Cc: git
On 2006-10-22 14:02:17 +0200, Karl Hasselström wrote:
> From: Karl Hasselström <kha@treskal.com>
OK, this patch did what it was supposed to do -- which was to encode
the mail headers properly -- but StGIT still generates an 8-bit
encoded body, and vger doesn't seem to like that (see the X-Warning:
headers it added to the patch mail). That's another fix for another
day.
Catalin, if you take this patch, I'd appreciate it if you made double
sure that my name doesn't have garbage in it. (It may very well be
that the copy of the patch sent to you personally is unharmed; it all
works fine when I send patches to myself. vger is the only mail server
I have seen that has this problem.)
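
For readers following along, the header encoding under discussion can be sketched in Python 3 roughly as follows. This is a minimal illustration, not the patch's code: the patch targets Python 2's `email.Header`, while this sketch assumes the standard-library `email.header` module, and `encode_header_value` is a hypothetical helper name that does not exist in StGIT. Unlike the patch, it encodes the whole value rather than only the span of words that need it.

```python
from email.header import Header, decode_header, make_header

def encode_header_value(value):
    """RFC 2047-encode a header value only if it contains non-ASCII text.

    Hypothetical helper for illustration; not StGIT's implementation.
    """
    # With no explicit charset, Header tries us-ascii first and falls
    # back to utf-8 when the text does not fit in ASCII.
    return Header(value).encode()

name = "Karl Hasselström"
encoded = encode_header_value(name)

# The encoded form is pure ASCII, so it is legal in a mail header.
assert encoded.isascii()

# Decoding recovers the original text.
assert str(make_header(decode_header(encoded))) == name

# Plain ASCII values pass through unchanged.
assert encode_header_value("StGIT") == "StGIT"
```

The round-trip check is the useful part when debugging a server such as vger: if the decoded value matches the original, any later damage to the name happened in transit rather than in the encoder.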
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* [PATCH 0/2] Make "stg mail" behave better with non-ascii characters
2006-10-22 12:12 ` Karl Hasselström
@ 2006-10-22 12:45 ` Karl Hasselström
2006-10-22 12:49 ` [PATCH 1/2] RFC2047-encode email headers Karl Hasselström
2006-10-22 12:49 ` [PATCH 2/2] QP-encode email body Karl Hasselström
0 siblings, 2 replies; 5+ messages in thread
From: Karl Hasselström @ 2006-10-22 12:45 UTC (permalink / raw)
To: Catalin Marinas; +Cc: git
These two patches teach "stg mail" to rfc2047-escape non-ascii
characters in the mail headers (not doing so is illegal), and to
QP-encode the body (leaving it as 8bit is not well received by some
mail servers, notably vger).
The first patch is a resend, this time hopefully with my name intact.
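
To make the body half of this concrete, here is a minimal Python 3 sketch of quoted-printable encoding with the standard-library `quopri` module, which is the module the second patch imports. The helper name `qp_encode_body` and the sample body are assumptions made for illustration; the patch itself runs under Python 2 and encodes the body line by line.

```python
import quopri

def qp_encode_body(body, charset="utf-8"):
    """Quoted-printable-encode a text body.

    Hypothetical helper for illustration; not StGIT's implementation.
    """
    # quopri works on bytes in Python 3, so encode the text first.
    return quopri.encodestring(body.encode(charset))

body = "Author: Karl Hasselström\nx = 1\n"
encoded = qp_encode_body(body)

# Every byte is now 7-bit clean.
assert all(b < 128 for b in encoded)

# Non-ASCII bytes become =XX escapes, and a literal '=' becomes =3D.
assert b"=C3=B6" in encoded
assert b"x =3D 1" in encoded

# Decoding restores the original text exactly.
assert quopri.decodestring(encoded).decode("utf-8") == body
```

One consequence worth keeping in mind for patch mails: every `=` in the diff is rewritten as `=3D`, so the receiving side has to decode the body before applying the patch.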
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* [PATCH 1/2] RFC2047-encode email headers
2006-10-22 12:45 ` [PATCH 0/2] Make "stg mail" behave better with non-ascii characters Karl Hasselström
@ 2006-10-22 12:49 ` Karl Hasselström
2006-10-22 12:49 ` [PATCH 2/2] QP-encode email body Karl Hasselström
1 sibling, 0 replies; 5+ messages in thread
From: Karl Hasselström @ 2006-10-22 12:49 UTC (permalink / raw)
To: Catalin Marinas; +Cc: git
From: Karl Hasselström <kha@treskal.com>
Having non-ascii characters in email headers is illegal, but StGIT
currently does not care. I'm often bitten by this, since my name
doesn't fit in ascii.
This patch implements an encoding pass just before the email is sent
over the wire -- in particular, it comes after any interactive editing
and templates and such, so the user should never have to see the
rfc2047 encoding.
NOTE: The rfc2047 encoder needs to know the encoding of the input
string. This patch hard-codes this to utf8, since that should be by
far the most common non-ascii encoding, and since utf8 is already the
hardcoded character set for the email body. In the long run, we
probably want to get this from the locale, or from a command line
switch, or both.
Signed-off-by: Karl Hasselström <kha@treskal.com>
---
stgit/commands/mail.py | 45 +++++++++++++++++++++++++++++++++++++++++----
1 files changed, 41 insertions(+), 4 deletions(-)
diff --git a/stgit/commands/mail.py b/stgit/commands/mail.py
index 34504e6..b661308 100644
--- a/stgit/commands/mail.py
+++ b/stgit/commands/mail.py
@@ -15,7 +15,7 @@ along with this program; if not, write t
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
"""
-import sys, os, re, time, datetime, smtplib, email.Utils
+import sys, os, re, time, datetime, smtplib, email.Header, email.Utils
from optparse import OptionParser, make_option
from stgit.commands.common import *
@@ -403,6 +403,42 @@ def __build_message(tmpl, patch, patch_n
return msg.strip('\n')
+def encode_header(s, enc):
+    """Take an entire e-mail header line, encoded in enc, and
+    rfc2047-encode it."""
+    def trans(s):
+        return str(email.Header.Header(unicode(s, enc)))
+    words = s.split(' ')
+    first_encode = len(words)
+    last_encode = -1
+    for i in xrange(len(words)):
+        ew = trans(words[i])
+        if ew != words[i]:
+            first_encode = min(first_encode, i)
+            last_encode = max(last_encode, i)
+    if first_encode <= last_encode:
+        return ' '.join(filter(
+            None,
+            [' '.join(words[:first_encode]),
+             trans(' '.join(words[first_encode:last_encode+1])),
+             ' '.join(words[last_encode+1:])]))
+    else:
+        return s
+
+def encode_headers(msg, enc):
+    """rfc2047-encode the headers of msg, assuming it is encoded in
+    enc."""
+    in_header = True
+    lines = []
+    for line in msg.splitlines(True):
+        if in_header:
+            if line.strip():
+                line = encode_header(line, enc)
+            else:
+                in_header = False
+        lines.append(line)
+    return ''.join(lines)
+
def func(parser, options, args):
"""Send the patches by e-mail using the patchmail.tmpl file as
a template
@@ -461,7 +497,8 @@ def func(parser, options, args):
raise CmdException, 'No cover message template file found'
msg_id = email.Utils.make_msgid('stgit')
- msg = __build_cover(tmpl, total_nr, msg_id, options)
+ msg = encode_headers(__build_cover(tmpl, total_nr, msg_id, options),
+ 'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)
# subsequent e-mails are seen as replies to the first one
@@ -487,8 +524,8 @@ def func(parser, options, args):
for (p, patch_nr) in zip(patches, range(1, len(patches) + 1)):
msg_id = email.Utils.make_msgid('stgit')
- msg = __build_message(tmpl, p, patch_nr, total_nr, msg_id, ref_id,
- options)
+ msg = encode_headers(__build_message(tmpl, p, patch_nr, total_nr,
+ msg_id, ref_id, options), 'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)
# subsequent e-mails are seen as replies to the first one
* [PATCH 2/2] QP-encode email body
2006-10-22 12:45 ` [PATCH 0/2] Make "stg mail" behave better with non-ascii characters Karl Hasselström
2006-10-22 12:49 ` [PATCH 1/2] RFC2047-encode email headers Karl Hasselström
@ 2006-10-22 12:49 ` Karl Hasselström
1 sibling, 0 replies; 5+ messages in thread
From: Karl Hasselström @ 2006-10-22 12:49 UTC (permalink / raw)
To: Catalin Marinas; +Cc: git
From: Karl Hasselström <kha@treskal.com>
Some mail servers dislike the 8bit transfer encoding, so use
quoted-printable instead.
Signed-off-by: Karl Hasselström <kha@treskal.com>
---
stgit/commands/mail.py | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/stgit/commands/mail.py b/stgit/commands/mail.py
index b661308..885d5e9 100644
--- a/stgit/commands/mail.py
+++ b/stgit/commands/mail.py
@@ -15,7 +15,7 @@ along with this program; if not, write t
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
"""
-import sys, os, re, time, datetime, smtplib, email.Header, email.Utils
+import sys, os, re, time, datetime, quopri, smtplib, email.Header, email.Utils
from optparse import OptionParser, make_option
from stgit.commands.common import *
@@ -253,7 +253,7 @@ def __build_extra_headers():
"""Build extra headers like content-type etc.
"""
headers = 'Content-Type: text/plain; charset=utf-8; format=fixed\n'
- headers += 'Content-Transfer-Encoding: 8bit\n'
+ headers += 'Content-Transfer-Encoding: quoted-printable\n'
headers += 'User-Agent: StGIT/%s\n' % version.version
return headers
@@ -425,9 +425,9 @@ def encode_header(s, enc):
else:
return s
-def encode_headers(msg, enc):
-    """rfc2047-encode the headers of msg, assuming it is encoded in
-    enc."""
+def encode_message(msg, enc):
+    """rfc2047-encode the headers of msg, and quoted-printable-encode
+    the body. msg is assumed to be encoded in enc."""
     in_header = True
     lines = []
     for line in msg.splitlines(True):
@@ -436,6 +436,8 @@ def encode_headers(msg, enc):
                 line = encode_header(line, enc)
             else:
                 in_header = False
+        else:
+            line = quopri.encodestring(line)
         lines.append(line)
     return ''.join(lines)
@@ -497,7 +499,7 @@ def func(parser, options, args):
raise CmdException, 'No cover message template file found'
msg_id = email.Utils.make_msgid('stgit')
- msg = encode_headers(__build_cover(tmpl, total_nr, msg_id, options),
+ msg = encode_message(__build_cover(tmpl, total_nr, msg_id, options),
'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)
@@ -524,7 +526,7 @@ def func(parser, options, args):
for (p, patch_nr) in zip(patches, range(1, len(patches) + 1)):
msg_id = email.Utils.make_msgid('stgit')
- msg = encode_headers(__build_message(tmpl, p, patch_nr, total_nr,
+ msg = encode_message(__build_message(tmpl, p, patch_nr, total_nr,
msg_id, ref_id, options), 'UTF-8')
from_addr, to_addr_list = __parse_addresses(msg)