* [PATCH] generate a valid rfc2047 mail header for multi-line subject. @ 2011-02-14 8:09 xzer 2011-02-22 20:43 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: xzer @ 2011-02-14 8:09 UTC (permalink / raw) To: git; +Cc: xzer There is still a problem that git-am will lost the line break. It's not easy to retain it, but as the first step, we can generate a valid rfc2047 header now. --- pretty.c | 29 ++++++++++++++++++++++++++++- 1 files changed, 28 insertions(+), 1 deletions(-) diff --git a/pretty.c b/pretty.c index 8549934..f18a38d 100644 --- a/pretty.c +++ b/pretty.c @@ -249,6 +249,33 @@ needquote: strbuf_addstr(sb, "?="); } +static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len, + const char *encoding) +{ + int first = 1; + char *mline = xmemdupz(line, len); + const char *cline = mline; + int offset = 0, linelen = 0; + for (;;) { + linelen = get_one_line(cline); + + cline += linelen; + + if (!linelen) + break; + + if (!first) + strbuf_addf(sb, "\n "); + + offset = *(cline -1) == '\n'; + + add_rfc2047(sb, cline-linelen, linelen-offset, encoding); + first = 0; + + } + free(mline); +} + void pp_user_info(const char *what, enum cmit_fmt fmt, struct strbuf *sb, const char *line, enum date_mode dmode, const char *encoding) @@ -1115,7 +1142,7 @@ void pp_title_line(enum cmit_fmt fmt, strbuf_grow(sb, title.len + 1024); if (subject) { strbuf_addstr(sb, subject); - add_rfc2047(sb, title.buf, title.len, encoding); + add_rfc2047_multiline(sb, title.buf, title.len, encoding); } else { strbuf_addbuf(sb, &title); } -- 1.7.4.52.g00e6e.dirty ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-14 8:09 [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer @ 2011-02-22 20:43 ` Junio C Hamano 2011-02-23 8:08 ` Jeff King 2011-02-23 15:34 ` xzer 0 siblings, 2 replies; 15+ messages in thread From: Junio C Hamano @ 2011-02-22 20:43 UTC (permalink / raw) To: xzer; +Cc: git xzer <xiaozhu@gmail.com> writes: > Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. We prefer to have "[PATCH] subsystem: description without final full-stop" here. > There is still a problem that git-am will lost the line break. What does "still" refer to? It is unclear under what condition the command lose "the line break" (nor which line break you are refering to; I am guessing that you have a commit that begins with a multi-line paragraph and you are talking about line breaks between the lines in the first paragraph). > It's not easy to retain it, but as the first step, we can generate > a valid rfc2047 header now. Please describe what is broken (iow, "Given this sample input, we currently generate this output, which is not a valid rfc2047") and what the new output looks like ("Update pp_title_line() to generate this output instead.") > --- Missing sign-off with a real name. > diff --git a/pretty.c b/pretty.c > index 8549934..f18a38d 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -249,6 +249,33 @@ needquote: > strbuf_addstr(sb, "?="); > } > > +static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len, > + const char *encoding) > +{ > + int first = 1; > + char *mline = xmemdupz(line, len); > + const char *cline = mline; > + int offset = 0, linelen = 0; > + for (;;) { You seem to have indent that uses SPs instead of HT around here... > + linelen = get_one_line(cline); I can see you are trying to be careful not to let get_one_line() overstep past "len" the caller gave you by making a copy first, but is this overhead really necessary? After all we know in this static function that the caller is feeding the contents from a strbuf, which always have a terminating NUL (and that is why it is Ok that get_one_line() is not a counted string interface). > + > + cline += linelen; > + > + if (!linelen) > + break; > + > + if (!first) > + strbuf_addf(sb, "\n "); > + > + offset = *(cline -1) == '\n'; > + > + add_rfc2047(sb, cline-linelen, linelen-offset, encoding); > + first = 0; > + > + } > + free(mline); > +} So the general idea of this change (I am thinking aloud what should be in the updated commit log message as the problem description) is that: - We currently give an entire multi-line paragraph string to the add_rfc2047() function to be formatted as the title of the commit; - The add_rfc2047() functionjust passes "\n" through, without making it a folding whitespace followed by a newline, to help callers that want to use this function to produce a header line that is rfc 2822 conformant; - The patch introduces a new function add_rfc2047_multiline() that splits its input and performs line folding for such a caller (namely, the pp_title_line() function); - Another caller of add_rfc2047(), pp_user_info, is not changed, and it won't fold the name of the user that appear on the From: line. It is unclear if the last point is really the right thing to do, though. It is not a new problem that an author name that has a "\n" in it would break the output, but we probably would want to fix that case too here? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-22 20:43 ` Junio C Hamano @ 2011-02-23 8:08 ` Jeff King 2011-02-23 9:48 ` Jeff King 2011-02-23 17:34 ` Junio C Hamano 2011-02-23 15:34 ` xzer 1 sibling, 2 replies; 15+ messages in thread From: Jeff King @ 2011-02-23 8:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git On Tue, Feb 22, 2011 at 12:43:40PM -0800, Junio C Hamano wrote: > So the general idea of this change (I am thinking aloud what should be in > the updated commit log message as the problem description) is that: > > - We currently give an entire multi-line paragraph string to the > add_rfc2047() function to be formatted as the title of the commit; > > - The add_rfc2047() functionjust passes "\n" through, without making it a > folding whitespace followed by a newline, to help callers that want to > use this function to produce a header line that is rfc 2822 conformant; > > - The patch introduces a new function add_rfc2047_multiline() that splits > its input and performs line folding for such a caller (namely, the > pp_title_line() function); > > - Another caller of add_rfc2047(), pp_user_info, is not changed, and it > won't fold the name of the user that appear on the From: line. > > It is unclear if the last point is really the right thing to do, though. > It is not a new problem that an author name that has a "\n" in it would > break the output, but we probably would want to fix that case too here? Yeah, I think the best path forward is: 1. Stop feeding "pre-folded" subject lines to the email formatter. Give it the regular subject line with no newlines. 2. rfc2047 encoding should encode a literal newline. Which should generally never happen, but is probably the most sane thing to do if it does. 3. rfc2047 should fold all lines at some sane length. As it is now, we may sometimes generate long lines in headers (though in practice, I doubt this is much of a problem). I started to work on this, but got stuck on (3). Our existing wrap functions want NUL-terminated strings, and we are operating on a substring. I tried converting the wrap functions to handle lengths, but it got way uglier than I had hoped. I think just strdup'ing the subject temporarily is probably fine, though. Let me see what I can come up with. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 8:08 ` Jeff King @ 2011-02-23 9:48 ` Jeff King 2011-02-23 9:50 ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King ` (3 more replies) 2011-02-23 17:34 ` Junio C Hamano 1 sibling, 4 replies; 15+ messages in thread From: Jeff King @ 2011-02-23 9:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git On Wed, Feb 23, 2011 at 03:08:54AM -0500, Jeff King wrote: > Yeah, I think the best path forward is: > > 1. Stop feeding "pre-folded" subject lines to the email formatter. > Give it the regular subject line with no newlines. > > 2. rfc2047 encoding should encode a literal newline. Which should > generally never happen, but is probably the most sane thing to do > if it does. > > 3. rfc2047 should fold all lines at some sane length. As it is now, we > may sometimes generate long lines in headers (though in practice, I > doubt this is much of a problem). So here is a series that does this. It still doesn't preserve subject newlines in "format-patch | am", but I don't think that was ever a goal of the code. If we want to add it as an optional feature on top (maybe as part of "-k"?), it should be easy to do (since the rfc2047 encoding will now preserve embedded newlines). [1/3]: strbuf: add fixed-length version of add_wrapped_text [2/3]: format-patch: wrap long header lines [3/3]: format-patch: rfc2047-encode newlines in headers -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text 2011-02-23 9:48 ` Jeff King @ 2011-02-23 9:50 ` Jeff King 2011-02-23 9:58 ` [PATCH 2/3] format-patch: wrap long header lines Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2011-02-23 9:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git The function strbuf_add_wrapped_text takes a NUL-terminated string. This makes it annoying to wrap strings we have as a pointer and a length. Refactoring strbuf_add_wrapped_text and all of its sub-functions to handle fixed-length strings turned out to be really ugly. So this implementation is lame; it just strdups the text and operates on the NUL-terminated version. This should be fine as the strings we are wrapping are generally pretty short. If it becomes a problem, we can optimize later. Signed-off-by: Jeff King <peff@peff.net> --- utf8.c | 9 +++++++++ utf8.h | 2 ++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/utf8.c b/utf8.c index 84cfc72..8acbc66 100644 --- a/utf8.c +++ b/utf8.c @@ -405,6 +405,15 @@ new_line: } } +int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len, + int indent, int indent2, int width) +{ + char *tmp = xstrndup(data, len); + int r = strbuf_add_wrapped_text(buf, tmp, indent, indent2, width); + free(tmp); + return r; +} + int is_encoding_utf8(const char *name) { if (!name) diff --git a/utf8.h b/utf8.h index ebc4d2f..81f2c82 100644 --- a/utf8.h +++ b/utf8.h @@ -10,6 +10,8 @@ int is_encoding_utf8(const char *name); int strbuf_add_wrapped_text(struct strbuf *buf, const char *text, int indent, int indent2, int width); +int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len, + int indent, int indent2, int width); #ifndef NO_ICONV char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding); -- 1.7.2.5.15.gfdd1c ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] format-patch: wrap long header lines 2011-02-23 9:48 ` Jeff King 2011-02-23 9:50 ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King @ 2011-02-23 9:58 ` Jeff King 2011-02-23 9:59 ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King 2011-02-23 15:16 ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer 3 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2011-02-23 9:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git Subject and identity headers may be arbitrarily long. In the past, we just assumed that single-line headers would be reasonably short. For multi-line subjects that we squish into a single line, we just "pre-folded" the data in pp_title_line by adding a newline and indentation. There were two problems. One is that, although rare, single-line messages can actually be longer than the recommended line-length limits. The second is that the pre-folding interacted badly with rfc2047 encoding, leading to malformed headers. Instead, let's stop pre-folding the subject lines, and just fold everything based on length in add_rfc2047, whether it is encoded or not. Signed-off-by: Jeff King <peff@peff.net> --- Three things to note: 1. We call strbuf_add_wrapped_bytes for the non-encoded case. This nicely wraps on word and multi-character boundaries. But it will never do a "hard" wrap if there are no word boundaries, and it probably should at the 998-character mark which rfc2822 specifies as a hard limit. I don't know how much we care. For something like that you'd have to be maliciously trying to create a bogus patch. If you're mailing it, you could just create the bogus mail by hand. If you're trying to buffer overflow somebody's "format-patch | am" script, it won't do anything, as mailinfo does not have a limit on line length. 2. For the non-quoted case, technically we want to indent less on the first line than we do on subsequent lines (to account for "Subject: [PATCH]"). strbuf_add_wrapped_bytes doesn't support that notion. We could add it, but it probably doesn't matter. We just end up wrapping the subsequent lines a little tighter than we need to. 3. I used RFC2822's SHOULD value of 78 characters as a line length. That's probably unnecessarily conservative. In theory wrapping shouldn't make a difference to the data, but maybe people who hand-edit the result would prefer not to see wrapping? I dunno. In that case, we could set it to something higher like 120, which would still wrap the really ridiculous cases. pretty.c | 32 +++++++++++++---- t/t4014-format-patch.sh | 84 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 108 insertions(+), 8 deletions(-) diff --git a/pretty.c b/pretty.c index 8549934..0e167f4 100644 --- a/pretty.c +++ b/pretty.c @@ -216,7 +216,15 @@ static int is_rfc2047_special(char ch) static void add_rfc2047(struct strbuf *sb, const char *line, int len, const char *encoding) { - int i, last; + static const int max_length = 78; /* per rfc2822 */ + int i; + int line_len; + + /* How many bytes are already used on the current line? */ + for (i = sb->len - 1; i >= 0; i--) + if (sb->buf[i] == '\n') + break; + line_len = sb->len - (i+1); for (i = 0; i < len; i++) { int ch = line[i]; @@ -225,14 +233,21 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, if ((i + 1 < len) && (ch == '=' && line[i+1] == '?')) goto needquote; } - strbuf_add(sb, line, len); + strbuf_add_wrapped_bytes(sb, line, len, 0, 1, max_length - line_len); return; needquote: strbuf_grow(sb, len * 3 + strlen(encoding) + 100); strbuf_addf(sb, "=?%s?q?", encoding); - for (i = last = 0; i < len; i++) { + line_len += strlen(encoding) + 5; /* 5 for =??q? */ + for (i = 0; i < len; i++) { unsigned ch = line[i] & 0xFF; + + if (line_len >= max_length - 2) { + strbuf_addf(sb, "?=\n =?%s?q?", encoding); + line_len = strlen(encoding) + 5 + 1; /* =??q? plus SP */ + } + /* * We encode ' ' using '=20' even though rfc2047 * allows using '_' for readability. Unfortunately, @@ -240,12 +255,14 @@ needquote: * leave the underscore in place. */ if (is_rfc2047_special(ch) || ch == ' ') { - strbuf_add(sb, line + last, i - last); strbuf_addf(sb, "=%02X", ch); - last = i + 1; + line_len += 3; + } + else { + strbuf_addch(sb, ch); + line_len++; } } - strbuf_add(sb, line + last, len - last); strbuf_addstr(sb, "?="); } @@ -1106,11 +1123,10 @@ void pp_title_line(enum cmit_fmt fmt, const char *encoding, int need_8bit_cte) { - const char *line_separator = (fmt == CMIT_FMT_EMAIL) ? "\n " : " "; struct strbuf title; strbuf_init(&title, 80); - *msg_p = format_subject(&title, *msg_p, line_separator); + *msg_p = format_subject(&title, *msg_p, " "); strbuf_grow(sb, title.len + 1024); if (subject) { diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 027c13d..9c66367 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -709,4 +709,88 @@ test_expect_success TTY 'format-patch --stdout paginates' ' test_path_is_missing .git/pager_used ' +test_expect_success 'format-patch handles multi-line subjects' ' + rm -rf patches/ && + echo content >>file && + for i in one two three; do echo $i; done >msg && + git add file && + git commit -F msg && + git format-patch -o patches -1 && + grep ^Subject: patches/0001-one.patch >actual && + echo "Subject: [PATCH] one two three" >expect && + test_cmp expect actual +' + +test_expect_success 'format-patch handles multi-line encoded subjects' ' + rm -rf patches/ && + echo content >>file && + for i in en två tre; do echo $i; done >msg && + git add file && + git commit -F msg && + git format-patch -o patches -1 && + grep ^Subject: patches/0001-en.patch >actual && + echo "Subject: [PATCH] =?UTF-8?q?en=20tv=C3=A5=20tre?=" >expect && + test_cmp expect actual +' + +M8="foo bar " +M64=$M8$M8$M8$M8$M8$M8$M8$M8 +M512=$M64$M64$M64$M64$M64$M64$M64$M64 +cat >expect <<'EOF' +Subject: [PATCH] foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo + bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar + foo bar foo bar foo bar foo bar +EOF +test_expect_success 'format-patch wraps extremely long headers (ascii)' ' + echo content >>file && + git add file && + git commit -m "$M512" && + git format-patch --stdout -1 >patch && + sed -n "/^Subject/p; /^ /p; /^$/q" <patch >subject && + test_cmp expect subject +' + +M8="föö bar " +M64=$M8$M8$M8$M8$M8$M8$M8$M8 +M512=$M64$M64$M64$M64$M64$M64$M64$M64 +cat >expect <<'EOF' +Subject: [PATCH] =?UTF-8?q?f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6=C3=B6=20bar=20f=C3=B6?= + =?UTF-8?q?=C3=B6=20bar=20f=C3=B6=C3=B6=20bar?= +EOF +test_expect_success 'format-patch wraps extremely long headers (rfc2047)' ' + rm -rf patches/ && + echo content >>file && + git add file && + git commit -m "$M512" && + git format-patch --stdout -1 >patch && + sed -n "/^Subject/p; /^ /p; /^$/q" <patch >subject && + test_cmp expect subject +' + test_done -- 1.7.2.5.15.gfdd1c ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] format-patch: rfc2047-encode newlines in headers 2011-02-23 9:48 ` Jeff King 2011-02-23 9:50 ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King 2011-02-23 9:58 ` [PATCH 2/3] format-patch: wrap long header lines Jeff King @ 2011-02-23 9:59 ` Jeff King 2011-02-23 21:47 ` Junio C Hamano 2011-02-23 15:16 ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer 3 siblings, 1 reply; 15+ messages in thread From: Jeff King @ 2011-02-23 9:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git These should generally never happen, as we already concatenate multiples in subjects into a single line. But let's be defensive, since not encoding them means we will output malformed headers. Signed-off-by: Jeff King <peff@peff.net> --- pretty.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pretty.c b/pretty.c index 0e167f4..65d20a7 100644 --- a/pretty.c +++ b/pretty.c @@ -228,7 +228,7 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, for (i = 0; i < len; i++) { int ch = line[i]; - if (non_ascii(ch)) + if (non_ascii(ch) || ch == '\n') goto needquote; if ((i + 1 < len) && (ch == '=' && line[i+1] == '?')) goto needquote; @@ -254,7 +254,7 @@ needquote: * many programs do not understand this and just * leave the underscore in place. */ - if (is_rfc2047_special(ch) || ch == ' ') { + if (is_rfc2047_special(ch) || ch == ' ' || ch == '\n') { strbuf_addf(sb, "=%02X", ch); line_len += 3; } -- 1.7.2.5.15.gfdd1c ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] format-patch: rfc2047-encode newlines in headers 2011-02-23 9:59 ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King @ 2011-02-23 21:47 ` Junio C Hamano 2011-02-24 7:15 ` Jeff King 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2011-02-23 21:47 UTC (permalink / raw) To: Jeff King; +Cc: xzer, git Jeff King <peff@peff.net> writes: > These should generally never happen, as we already > concatenate multiples in subjects into a single line. But > let's be defensive, since not encoding them means we will > output malformed headers. In this particular case, wouldn't it be more conservative and defensive to produce malformed headers so that the patch won't leave the originator? I have a suspicion that mailinfo would choke on the output of this one, even though I didn't try. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] format-patch: rfc2047-encode newlines in headers 2011-02-23 21:47 ` Junio C Hamano @ 2011-02-24 7:15 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2011-02-24 7:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git On Wed, Feb 23, 2011 at 01:47:53PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > These should generally never happen, as we already > > concatenate multiples in subjects into a single line. But > > let's be defensive, since not encoding them means we will > > output malformed headers. > > In this particular case, wouldn't it be more conservative and defensive to > produce malformed headers so that the patch won't leave the > originator? No. If you go back to xzer's original mail, the malformed headers didn't cause messages not to be sent. They just resulted in corrupted and missing data on the receiver side. I don't think we can rely on any MUA or MTA having a particular behavior for malformed mail. Some of them may complain, but many won't. > I have a suspicion that mailinfo would choke on the output of this > one, even though I didn't try. Actually, it does quite well. Without "-k", mailinfo turns it into a single line, which is what I would expect. With "-k", the info file contains: Author: Jeff King Email: peff@peff.net Subject: this is a long Subject: subject line with Subject: many lines in it Date: Wed, 23 Feb 2011 11:30:43 -0500 which "git am" turns back into the original multi-line subject. So I think it's definitely the right thing to do. Not only does it avoid us generating malformed mail, but because existing mailinfo handles it sanely, it makes it easy to do a "preserve-newlines" patch on top (which I'm still not sure is a great idea, but I can see the use in certain circumstances). -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 9:48 ` Jeff King ` (2 preceding siblings ...) 2011-02-23 9:59 ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King @ 2011-02-23 15:16 ` xzer 2011-02-23 16:35 ` Jeff King 3 siblings, 1 reply; 15+ messages in thread From: xzer @ 2011-02-23 15:16 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git 2011/2/23 Jeff King <peff@peff.net>: > On Wed, Feb 23, 2011 at 03:08:54AM -0500, Jeff King wrote: > >> Yeah, I think the best path forward is: >> >> 1. Stop feeding "pre-folded" subject lines to the email formatter. >> Give it the regular subject line with no newlines. >> >> 2. rfc2047 encoding should encode a literal newline. Which should >> generally never happen, but is probably the most sane thing to do >> if it does. >> >> 3. rfc2047 should fold all lines at some sane length. As it is now, we >> may sometimes generate long lines in headers (though in practice, I >> doubt this is much of a problem). > > So here is a series that does this. It still doesn't preserve subject > newlines in "format-patch | am", but I don't think that was ever a goal > of the code. If we want to add it as an optional feature on top (maybe > as part of "-k"?), it should be easy to do (since the rfc2047 encoding > will now preserve embedded newlines). > > [1/3]: strbuf: add fixed-length version of add_wrapped_text > [2/3]: format-patch: wrap long header lines > [3/3]: format-patch: rfc2047-encode newlines in headers > > -Peff > To the first point, I really want to find a way that we can remain the line breaker after import a formatted patch. That's why I add a new function to product multi line header, I want to do something which is special to subject. In my usage, I told my men every day that don't write too long in the first paragraph, but there are always somebody who forgets it, then I will get a patch with a very long subject just like a nightmare(yes, I gave them my temporary fix which I submitted here, so they can write as long as they want). So I want to know whether we can generate a 2047 compatible header so that mailer can catch it correctly and the git-am can import it with line breaker correctly too. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 15:16 ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer @ 2011-02-23 16:35 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2011-02-23 16:35 UTC (permalink / raw) To: xzer; +Cc: Junio C Hamano, git On Thu, Feb 24, 2011 at 12:16:04AM +0900, xzer wrote: > To the first point, I really want to find a way that we can remain the > line breaker > after import a formatted patch. That's why I add a new function to product multi > line header, I want to do something which is special to subject. In my usage, > I told my men every day that don't write too long in the first > paragraph, but there > are always somebody who forgets it, then I will get a patch with a > very long subject > just like a nightmare(yes, I gave them my temporary fix which I submitted here, > so they can write as long as they want). > > So I want to know whether we can generate a 2047 compatible header so > that mailer > can catch it correctly and the git-am can import it with line breaker > correctly too. Yes. With my patches, if you feed a subject with linebreaks to add_rfc2047, they will be encoded. So you just need an extra patch on top of mine that will use straight linebreaks (_not_ linebreaks with an extra space) in pp_title_line. Below is a quick and dirty patch to do that when "-k" is specified. You will also need to specify "-k" with applying it with "git am", but other than that it seems to work. However, I'm still not sure it's a good idea. Other parts of git will try to treat your paragraph as a single line (e.g., git log --oneline). Plus this patch is ugly because of the number of layers of abstraction we have to pass the keep-subject through. I'm not sure there's a good way around that. --- diff --git a/builtin/log.c b/builtin/log.c index d8c6c28..3fdf488 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -768,7 +768,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout, pp_user_info(NULL, CMIT_FMT_EMAIL, &sb, committer, DATE_RFC2822, encoding); pp_title_line(CMIT_FMT_EMAIL, &msg, &sb, subject_start, extra_headers, - encoding, need_8bit_cte); + encoding, need_8bit_cte, 0); pp_remainder(CMIT_FMT_EMAIL, &msg, &sb, 0); printf("%s\n", sb.buf); @@ -1130,6 +1130,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) die ("-n and -k are mutually exclusive."); if (keep_subject && subject_prefix) die ("--subject-prefix and -k are mutually exclusive."); + rev.preserve_subject = keep_subject; argc = setup_revisions(argc, argv, &rev, &s_r_opt); if (argc > 1) diff --git a/commit.h b/commit.h index 659c87c..6eace1c 100644 --- a/commit.h +++ b/commit.h @@ -73,6 +73,7 @@ struct pretty_print_context int abbrev; const char *subject; const char *after_subject; + int preserve_subject; enum date_mode date_mode; int need_8bit_cte; int show_notes; @@ -107,7 +108,8 @@ void pp_title_line(enum cmit_fmt fmt, const char *subject, const char *after_subject, const char *encoding, - int need_8bit_cte); + int need_8bit_cte, + int preserve_lines); void pp_remainder(enum cmit_fmt fmt, const char **msg_p, struct strbuf *sb, diff --git a/log-tree.c b/log-tree.c index b46ed3b..9b9aaf2 100644 --- a/log-tree.c +++ b/log-tree.c @@ -504,6 +504,7 @@ void show_log(struct rev_info *opt) ctx.date_mode = opt->date_mode; ctx.abbrev = opt->diffopt.abbrev; ctx.after_subject = extra_headers; + ctx.preserve_subject = opt->preserve_subject; ctx.reflog_info = opt->reflog_info; pretty_print_commit(opt->commit_format, commit, &msgbuf, &ctx); diff --git a/pretty.c b/pretty.c index 65d20a7..315f1d2 100644 --- a/pretty.c +++ b/pretty.c @@ -1121,12 +1121,13 @@ void pp_title_line(enum cmit_fmt fmt, const char *subject, const char *after_subject, const char *encoding, - int need_8bit_cte) + int need_8bit_cte, + int preserve_lines) { struct strbuf title; strbuf_init(&title, 80); - *msg_p = format_subject(&title, *msg_p, " "); + *msg_p = format_subject(&title, *msg_p, preserve_lines ? "\n" : " "); strbuf_grow(sb, title.len + 1024); if (subject) { @@ -1254,7 +1255,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit, /* These formats treat the title line specially. */ if (fmt == CMIT_FMT_ONELINE || fmt == CMIT_FMT_EMAIL) pp_title_line(fmt, &msg, sb, context->subject, - context->after_subject, encoding, need_8bit_cte); + context->after_subject, encoding, need_8bit_cte, + context->preserve_subject); beginning_of_body = sb->len; if (fmt != CMIT_FMT_ONELINE) diff --git a/revision.h b/revision.h index 05659c6..f8ddd83 100644 --- a/revision.h +++ b/revision.h @@ -90,7 +90,8 @@ struct rev_info { abbrev_commit:1, use_terminator:1, missing_newline:1, - date_mode_explicit:1; + date_mode_explicit:1, + preserve_subject:1; unsigned int disable_stdin:1; enum date_mode date_mode; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 8:08 ` Jeff King 2011-02-23 9:48 ` Jeff King @ 2011-02-23 17:34 ` Junio C Hamano 2011-02-24 7:34 ` Jeff King 1 sibling, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2011-02-23 17:34 UTC (permalink / raw) To: Jeff King; +Cc: xzer, git Jeff King <peff@peff.net> writes: > Yeah, I think the best path forward is: > > 1. Stop feeding "pre-folded" subject lines to the email formatter. > Give it the regular subject line with no newlines. A bit of history. The original design of the pp_title_line() function since 4234a76 (Extend --pretty=oneline to cover the first paragraph, 2007-06-11) was to notice a multi-line paragraph and turn embedded newlines into line folds (this seems to be a breakage specific to non-ASCII titles). As RFC 5322 (or 822/2822 for that matter) does not allow newlines in field bodies (2.2: A field body MUST NOT include CR and LF except when used in "folding" and "unfolding"...), it was the only way to allow the recipient to tell where the original line breaks were to fold at the line breaks in the original commit message. Then the recipient _can_ be git aware and turn the folding CRLF-SP into a LF, not just a SP, relying on the hope that the transport between the sender and the recipient would not clobber line folding, to recover the original. The rebase pipeline (i.e. "format-patch | am") would have satisfied such a flaky assumption and that was the only reason I wrote the line folding on the output side that way. These days, however, "am" invoked in the rebase pipeline knows to slurp the message not from the patch text but from the original message, so we can safely depart form the original design rationale. > 2. rfc2047 encoding should encode a literal newline. Which should > generally never happen, but is probably the most sane thing to do > if it does. I was re-reading RFC 2047 and its 5. (3) [Page 8] seems to imply that this might be allowed: "Only printable and white space character data should be encoded using this scheme."; I think LF is counted as a white space character in this context, but it is a bit unclear. If this "encode newline via 2047" _were_ allowed, I would say that my preference is not to go with your 1. above. Instead I would prefer to see us feed the entire first paragraph, whether it is a single-liner or multi-line paragraph, to the step 2 ... > 3. rfc2047 should fold all lines at some sane length... ... and the have step3 fold its result to limit the physical length of the output line(s). Note that a multi-line first paragraph always will be encoded using 2047 because we cannot have a newline in the field body per RFC5322. But going the above route would allow us to recover the original first paragraph intact. We might need to tweak the receiving end a bit, though. IIRC, mailinfo output assumed we will always be dealing with a single-liner subject. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 17:34 ` Junio C Hamano @ 2011-02-24 7:34 ` Jeff King 0 siblings, 0 replies; 15+ messages in thread From: Jeff King @ 2011-02-24 7:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: xzer, git On Wed, Feb 23, 2011 at 09:34:32AM -0800, Junio C Hamano wrote: > A bit of history. The original design of the pp_title_line() function > since 4234a76 (Extend --pretty=oneline to cover the first paragraph, > 2007-06-11) was to notice a multi-line paragraph and turn embedded > newlines into line folds (this seems to be a breakage specific to > non-ASCII titles). Yes, the only actual breakage is the interaction of the folding with non-ascii titles. The "fold everything" is a new feature. And one that is probably not all that necessary in the real world, as rfc2822 allows up to 998 characters. So it is more about complying with the SHOULD there than the MUST. > As RFC 5322 (or 822/2822 for that matter) does not allow newlines in field Wow, I'm out of date. I had no idea 2822 had been superseded. ;) > bodies (2.2: A field body MUST NOT include CR and LF except when used in > "folding" and "unfolding"...), it was the only way to allow the recipient > to tell where the original line breaks were to fold at the line breaks in > the original commit message. Then the recipient _can_ be git aware and > turn the folding CRLF-SP into a LF, not just a SP, relying on the hope > that the transport between the sender and the recipient would not clobber > line folding, to recover the original. Ah, that makes the current code make a lot more sense. Thanks for the history. > The rebase pipeline (i.e. "format-patch | am") would have satisfied such a > flaky assumption and that was the only reason I wrote the line folding on > the output side that way. These days, however, "am" invoked in the rebase > pipeline knows to slurp the message not from the patch text but from the > original message, so we can safely depart form the original design rationale. Agreed. > I was re-reading RFC 2047 and its 5. (3) [Page 8] seems to imply that this > might be allowed: "Only printable and white space character data should be > encoded using this scheme."; I think LF is counted as a white space > character in this context, but it is a bit unclear. Yeah, the section is a little vague. I think the intent is that senders should encode reasonable text things, not multimedia garbage, and that receivers should be wary of getting arbitrary crap. So I think we are following the spirit of the section in any case. Reading over it, though, I do notice that we are specifically forbidden to break multi-byte characters between encoded-words. And my implementation doesn't take care about that. I think in practice, any reasonable implementation would just concatenate the results and be happy, but you never know. > If this "encode newline via 2047" _were_ allowed, I would say that my > preference is not to go with your 1. above. Instead I would prefer to see > us feed the entire first paragraph, whether it is a single-liner or > multi-line paragraph, to the step 2 ... Hmm. I disagree. I thought the decision was made long ago to convert such multi-paragraph subjects into a single line in most cases. Because supporting such subjects at all was never about encouraging people to flaunt the subject-line convention, but about letting the tools have a reasonable default behavior for commits imported from systems that did not follow that convention. On top of that, I think there is some question of how encoded newlines in a subject line will be handled by MUAs in the wild (given the ambiguity of rfc2047 mentioned above). So perhaps it is better to be conservative and not generate them by default. And on top of that, it is not just a "how should it be if starting from scratch" decision. We have been flattening multi-line ascii subjects for years, so this would be a behavior change. So I would think this should be triggered by an option ("-k" makes the most sense to me) if anything. I am somewhat lukewarm on even that; as you mentioned above, the rebase pipeline has a better preservation mechanism these days, so it is really about people who want to email patches to each other while disregarding the subject-line convention. > > 3. rfc2047 should fold all lines at some sane length... > > ... and the have step3 fold its result to limit the physical length of the > output line(s). Note that a multi-line first paragraph always will be > encoded using 2047 because we cannot have a newline in the field body per > RFC5322. But going the above route would allow us to recover the original > first paragraph intact. Yes, exactly. > We might need to tweak the receiving end a bit, though. IIRC, mailinfo > output assumed we will always be dealing with a single-liner subject. I don't know if it's intentional or accidental, but mailinfo handles it just as I would expect. See my other message. -Peff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-22 20:43 ` Junio C Hamano 2011-02-23 8:08 ` Jeff King @ 2011-02-23 15:34 ` xzer 2011-02-23 17:45 ` Junio C Hamano 1 sibling, 1 reply; 15+ messages in thread From: xzer @ 2011-02-23 15:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: git 2011/2/23 Junio C Hamano <gitster@pobox.com>: > xzer <xiaozhu@gmail.com> writes: > >> Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. > > We prefer to have "[PATCH] subsystem: description without final full-stop" here. > >> There is still a problem that git-am will lost the line break. > > What does "still" refer to? It is unclear under what condition the > command lose "the line break" (nor which line break you are refering to; I > am guessing that you have a commit that begins with a multi-line paragraph > and you are talking about line breaks between the lines in the first > paragraph). > Yes, that is what I am refering, the line breaks in the first paragraph. >> It's not easy to retain it, but as the first step, we can generate >> a valid rfc2047 header now. > > Please describe what is broken (iow, "Given this sample input, we > currently generate this output, which is not a valid rfc2047") and what > the new output looks like ("Update pp_title_line() to generate this output > instead.") > At present we can only concatenate the lines in the first paragraph so that we can generate a valid rfc2047 mail, but we will lost the line breaks after import the patch by git-am. >> --- > > Missing sign-off with a real name. > I am sorry that I didn't find the document of submitting a patch until yesterday, Thanks for your comment. >> diff --git a/pretty.c b/pretty.c >> index 8549934..f18a38d 100644 >> --- a/pretty.c >> +++ b/pretty.c >> @@ -249,6 +249,33 @@ needquote: >> strbuf_addstr(sb, "?="); >> } >> >> +static void add_rfc2047_multiline(struct strbuf *sb, const char *line, int len, >> + const char *encoding) >> +{ >> + int first = 1; >> + char *mline = xmemdupz(line, len); >> + const char *cline = mline; >> + int offset = 0, linelen = 0; >> + for (;;) { > > You seem to have indent that uses SPs instead of HT around here... > >> + linelen = get_one_line(cline); > > I can see you are trying to be careful not to let get_one_line() overstep > past "len" the caller gave you by making a copy first, but is this > overhead really necessary? After all we know in this static function that > the caller is feeding the contents from a strbuf, which always have a > terminating NUL (and that is why it is Ok that get_one_line() is not a > counted string interface). > I am not sure that who will call this function in future, I think since there is a argument as len, so I'd better to obey the function declare. >> + >> + cline += linelen; >> + >> + if (!linelen) >> + break; >> + >> + if (!first) >> + strbuf_addf(sb, "\n "); >> + >> + offset = *(cline -1) == '\n'; >> + >> + add_rfc2047(sb, cline-linelen, linelen-offset, encoding); >> + first = 0; >> + >> + } >> + free(mline); >> +} > > So the general idea of this change (I am thinking aloud what should be in > the updated commit log message as the problem description) is that: > > - We currently give an entire multi-line paragraph string to the > add_rfc2047() function to be formatted as the title of the commit; > > - The add_rfc2047() functionjust passes "\n" through, without making it a > folding whitespace followed by a newline, to help callers that want to > use this function to produce a header line that is rfc 2822 conformant; > > - The patch introduces a new function add_rfc2047_multiline() that splits > its input and performs line folding for such a caller (namely, the > pp_title_line() function); > > - Another caller of add_rfc2047(), pp_user_info, is not changed, and it > won't fold the name of the user that appear on the From: line. > > It is unclear if the last point is really the right thing to do, though. > It is not a new problem that an author name that has a "\n" in it would > break the output, but we probably would want to fix that case too here? > Your comment is just right for what I tried to do, I explained why I add a new function for subject specially in the mail which replied to Jeff, I want to remain the line breaks after import the patch, so I think I need do something here in future, it will be compatible with rfc2047 and also can be imported with line breaks correctly. I don't know how yet, so I just want to left a possibility. So I introduce a new function for subject only. xzer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. 2011-02-23 15:34 ` xzer @ 2011-02-23 17:45 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2011-02-23 17:45 UTC (permalink / raw) To: xzer; +Cc: Junio C Hamano, git xzer <xiaozhu@gmail.com> writes: > 2011/2/23 Junio C Hamano <gitster@pobox.com>: >> xzer <xiaozhu@gmail.com> writes: >> >>> Subject: Re: [PATCH] generate a valid rfc2047 mail header for multi-line subject. >> >> We prefer to have "[PATCH] subsystem: description without final full-stop" here. >> >>> There is still a problem that git-am will lost the line break. >> >> What does "still" refer to? It is unclear under what condition the >> command lose "the line break" (nor which line break you are refering to; I >> am guessing that you have a commit that begins with a multi-line paragraph >> and you are talking about line breaks between the lines in the first >> paragraph). > > Yes, that is what I am refering, the line breaks in the first paragraph. Hmm, I gave a suggestion and asked three questions, and only get one answer back? >> ... After all we know in this static function that >> the caller is feeding the contents from a strbuf, which always have a >> terminating NUL (and that is why it is Ok that get_one_line() is not a >> counted string interface). > > I am not sure that who will call this function in future, I think since there is > a argument as len, so I'd better to obey the function declare. If that is the case I would have preferred to see you give get_one_line() that is a function static to this file an ability to read from a counted string, instead of making an extra allcation. But I think you will notice that all the callchain that pass a pointer into the message around knows and relies on the fact that the buffer is NUL terminated if you look around in the file, and that was why I made that suggestion. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-02-24 7:34 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-14 8:09 [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer 2011-02-22 20:43 ` Junio C Hamano 2011-02-23 8:08 ` Jeff King 2011-02-23 9:48 ` Jeff King 2011-02-23 9:50 ` [PATCH 1/3] strbuf: add fixed-length version of add_wrapped_text Jeff King 2011-02-23 9:58 ` [PATCH 2/3] format-patch: wrap long header lines Jeff King 2011-02-23 9:59 ` [PATCH 3/3] format-patch: rfc2047-encode newlines in headers Jeff King 2011-02-23 21:47 ` Junio C Hamano 2011-02-24 7:15 ` Jeff King 2011-02-23 15:16 ` [PATCH] generate a valid rfc2047 mail header for multi-line subject xzer 2011-02-23 16:35 ` Jeff King 2011-02-23 17:34 ` Junio C Hamano 2011-02-24 7:34 ` Jeff King 2011-02-23 15:34 ` xzer 2011-02-23 17:45 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).