git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* imap-send badly handles commit bodies beginning with "From <"
@ 2011-10-28 18:00 Andrew Eikum
  2011-10-28 20:32 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Eikum @ 2011-10-28 18:00 UTC (permalink / raw)
  To: git

Ran into this today. I had a commit message that looked like:

---
Do something

>From <http://url>:
Words
---

I put it through imap-send to email it to my project, and ended up
with this output:

sending 1 messages
 200% (2/1) done

On the server side, it was split into two mails on either side of that
commit message's From line with neither mail actually containing the
From line. To fix it, I just changed it to "Copied from <url>:" :-P

Ain't mbox grand?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-10-28 18:00 imap-send badly handles commit bodies beginning with "From <" Andrew Eikum
@ 2011-10-28 20:32 ` Jeff King
  2011-10-28 21:21   ` Andrew Eikum
  2011-10-30  9:01   ` Magnus Bäck
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2011-10-28 20:32 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git

On Fri, Oct 28, 2011 at 01:00:44PM -0500, Andrew Eikum wrote:

> On the server side, it was split into two mails on either side of that
> commit message's From line with neither mail actually containing the
> From line. To fix it, I just changed it to "Copied from <url>:" :-P
> 
> Ain't mbox grand?

Mbox does have this problem, but I think in this case it is a
particularly crappy implementation of mbox in imap-send. Look at
imap-send.c:split_msg; it just looks for "From ".

It should at least check for something that looks like a timestamp, like
git-mailsplit does. Maybe mailsplit's is_from_line should be factored
out so that it can be reused in imap-send.

Want to work on a patch?

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-10-28 20:32 ` Jeff King
@ 2011-10-28 21:21   ` Andrew Eikum
  2011-10-28 21:37     ` Jeff King
  2011-10-30  9:01   ` Magnus Bäck
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Eikum @ 2011-10-28 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Andrew Eikum, git

On Fri, Oct 28, 2011 at 01:32:57PM -0700, Jeff King wrote:
> Mbox does have this problem, but I think in this case it is a
> particularly crappy implementation of mbox in imap-send. Look at
> imap-send.c:split_msg; it just looks for "From ".
> 
> It should at least check for something that looks like a timestamp, like
> git-mailsplit does. Maybe mailsplit's is_from_line should be factored
> out so that it can be reused in imap-send.

Since we have a program called "mailsplit," wouldn't it make more
sense to have imap-send use its implementation to split mail instead
of sharing just the From line detection?

> Want to work on a patch?

I was hoping it'd be a quick matter of pulling mailsplit's
implementation out of builtin and into the top level, but I see it's
got some global variables that are tangled enough that I actually have
to understand the code before I can pull it apart :)

If no one beats me to it, I'll work on this next week. It's late on
Friday and I'm moving house this weekend.

Quick question, since I'm not intimately familiar with Git's code: I
was thinking of creating a new compilation unit at the top level,
mailutils.{c,h}, and referencing it from both imap-send.c and
builtin/splitmail.c. Does that seem like the right approach? Is there
an existing compilation unit I should be placing splitmail's guts into
instead?

Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-10-28 21:21   ` Andrew Eikum
@ 2011-10-28 21:37     ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2011-10-28 21:37 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git

On Fri, Oct 28, 2011 at 04:21:22PM -0500, Andrew Eikum wrote:

> Since we have a program called "mailsplit," wouldn't it make more
> sense to have imap-send use its implementation to split mail instead
> of sharing just the From line detection?

Potentially, yeah. I was thinking of just pulling over the from line
detection (which is the real black magic bit), but it looks like
imap-send's mbox handling could use some general attention (maybe it
would be possible to not read the entire mbox into memory, for example).

> I was hoping it'd be a quick matter of pulling mailsplit's
> implementation out of builtin and into the top level, but I see it's
> got some global variables that are tangled enough that I actually have
> to understand the code before I can pull it apart :)
>
> If no one beats me to it, I'll work on this next week. It's late on
> Friday and I'm moving house this weekend.

No rush. Let us know if you have questions.

> Quick question, since I'm not intimately familiar with Git's code: I
> was thinking of creating a new compilation unit at the top level,
> mailutils.{c,h}, and referencing it from both imap-send.c and
> builtin/splitmail.c. Does that seem like the right approach? Is there
> an existing compilation unit I should be placing splitmail's guts into
> instead?

Yes, I think a new file makes sense here. Make sure to update LIB_H and
LIB_OBJS in the Makefile.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-10-28 20:32 ` Jeff King
  2011-10-28 21:21   ` Andrew Eikum
@ 2011-10-30  9:01   ` Magnus Bäck
  2011-11-01 15:38     ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Magnus Bäck @ 2011-10-30  9:01 UTC (permalink / raw)
  To: git; +Cc: Andrew Eikum, Jeff King

On Friday, October 28, 2011 at 22:32 CEST,
     Jeff King <peff@peff.net> wrote:

> On Fri, Oct 28, 2011 at 01:00:44PM -0500, Andrew Eikum wrote:
> 
> > On the server side, it was split into two mails on either side
> > of that commit message's From line with neither mail actually
> > containing the From line. To fix it, I just changed it to "Copied
> > from <url>:" :-P
> > 
> > Ain't mbox grand?
> 
> Mbox does have this problem, but I think in this case it is a
> particularly crappy implementation of mbox in imap-send. Look at
> imap-send.c:split_msg; it just looks for "From ".

While there seems to be about a million different implementations of
mbox creation and parsing, the relevant RFC[0] points to [1] as an
authoritative source. The latter claims that lines matching "^From "
denote a message boundary and that lines within a message that match
the same pattern should be quoted with ">". That would suggest that
the problem isn't imap-send.c but whatever code produces the mbox
file in the first place. Of course, if that software isn't part of
Git I guess we'll have to deal with the situation anyway. And whatever
the RFCs say, we still need to be as compatible is possible with
whatever software is out there.

> It should at least check for something that looks like a timestamp,
> like git-mailsplit does. Maybe mailsplit's is_from_line should be
> factored out so that it can be reused in imap-send.

I guess that's a reasonable "liberal in what you accept" mitigation.

(As a sidenote, I'm getting the ">From" quoting in my maildir message
files where no such quoting is expected, so "From" lines are shown as
">From" in my MUA. I don't know if it's Procmail screwing things up or
what's going on.)

[0] http://tools.ietf.org/html/rfc4155
[1] http://qmail.org./man/man5/mbox.html

-- 
Magnus Bäck                   Opinions are my own and do not necessarily
SW Configuration Manager      represent the ones of my employer, etc.
Sony Ericsson

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-10-30  9:01   ` Magnus Bäck
@ 2011-11-01 15:38     ` Jeff King
  2011-11-01 16:06       ` Michael Haggerty
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2011-11-01 15:38 UTC (permalink / raw)
  To: Magnus Bäck; +Cc: git, Andrew Eikum

On Sun, Oct 30, 2011 at 10:01:11AM +0100, Magnus Bäck wrote:

> > Mbox does have this problem, but I think in this case it is a
> > particularly crappy implementation of mbox in imap-send. Look at
> > imap-send.c:split_msg; it just looks for "From ".
> 
> While there seems to be about a million different implementations of
> mbox creation and parsing, the relevant RFC[0] points to [1] as an
> authoritative source. The latter claims that lines matching "^From "
> denote a message boundary and that lines within a message that match
> the same pattern should be quoted with ">". That would suggest that
> the problem isn't imap-send.c but whatever code produces the mbox
> file in the first place. Of course, if that software isn't part of
> Git I guess we'll have to deal with the situation anyway. And whatever
> the RFCs say, we still need to be as compatible is possible with
> whatever software is out there.

Right. If you properly quote and unquote "From " lines, then mbox can be
unambiguous. But many pieces of software don't quote them (including
git, I think, but I didn't check), so it's prudent when reading to look
for something that actually appears to be a "From" line.

If somebody wants to tackle >From quoting of commit messages in
git-format-patch, they can certainly do so. In practice, it doesn't tend
to come up (because sane readers expect there to be a date at the end of
the line), so nobody has put forth the effort.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-11-01 15:38     ` Jeff King
@ 2011-11-01 16:06       ` Michael Haggerty
  2011-11-01 16:14         ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Haggerty @ 2011-11-01 16:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Magnus Bäck, git, Andrew Eikum

On 11/01/2011 04:38 PM, Jeff King wrote:
> Right. If you properly quote and unquote "From " lines, then mbox can be
> unambiguous.

That is not quite true.  The RFC says only that lines matching "^From "
should be quoted, not lines matching "^>From " (or, generally, "^>*From
").  So the quoting is lossy; it is *not* possible to tell whether a
line starting with ">From " should be unquoted (it could have been
">From " in the original).

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: imap-send badly handles commit bodies beginning with "From <"
  2011-11-01 16:06       ` Michael Haggerty
@ 2011-11-01 16:14         ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2011-11-01 16:14 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Magnus Bäck, git, Andrew Eikum

On Tue, Nov 01, 2011 at 05:06:48PM +0100, Michael Haggerty wrote:

> On 11/01/2011 04:38 PM, Jeff King wrote:
> > Right. If you properly quote and unquote "From " lines, then mbox can be
> > unambiguous.
> 
> That is not quite true.  The RFC says only that lines matching "^From "
> should be quoted, not lines matching "^>From " (or, generally, "^>*From
> ").  So the quoting is lossy; it is *not* possible to tell whether a
> line starting with ">From " should be unquoted (it could have been
> ">From " in the original).

That was what I meant by "properly". Note that the second link Magnus
mentioned (and which is referred to in the RFC in the paragraph
immediately following the discussion of "from" quoting) discusses this
explicitly.

The real issue with mbox is not that it can't be done well, but that you
have no clue which variant the writing end used. In practice, it works
OK because it's simple and those corner cases just don't come up much
(at least for a reasonably defensive reader).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-11-01 16:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-28 18:00 imap-send badly handles commit bodies beginning with "From <" Andrew Eikum
2011-10-28 20:32 ` Jeff King
2011-10-28 21:21   ` Andrew Eikum
2011-10-28 21:37     ` Jeff King
2011-10-30  9:01   ` Magnus Bäck
2011-11-01 15:38     ` Jeff King
2011-11-01 16:06       ` Michael Haggerty
2011-11-01 16:14         ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).