git.vger.kernel.org archive mirror
From: Carl Worth <cworth@cworth.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: Make "git am" properly unescape lines matching ">>*From "
Date: Tue, 08 Jun 2010 14:52:34 -0700
Message-ID: <87typdi44d.fsf@yoom.home.cworth.org>
In-Reply-To: <4C0EAD00.8000706@zytor.com>

On Tue, 08 Jun 2010 13:50:08 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 06/08/2010 12:57 PM, Carl Worth wrote:
> > When I did that, I was careful to escape lines from the bodies of email
> > messages that begin with zero or more '>' characters followed
> > immediately by "From " (From_ lines) by adding an initial '>'. [2]
...
> The problem with that is that it is not universally applied.

Right. And since I can't fix this universe, I'd like to at least start
with getting notmuch and git to use the same thing. Currently, git is
using a non-standard not-quite-safe mbox format while notmuch doesn't
yet emit anything like mbox. So we have a nice opportunity to fix these
two projects to at least work well together (if we can agree on a
format).

> As far as I can tell, the Content-Length: is the most reliably handled
> format and probably is what we should use.  This is the "mboxcl2" format
> in your list.[*]  Unfortunately "mboxcl2" and "mboxrd" cannot be
> distinguished from each other by inspection, which is a major defect of
> both formats.

What do you mean by "most reliably handled format"?

Of the four mbox formats listed on the page I cited[*], "mboxo" and
"mboxcl" are easy to discard as they both irreversibly corrupt messages.
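
To illustrate the corruption, here is a minimal Python sketch of the two
quoting rules, assuming an "mboxo" reader that strips one '>' from lines
starting with ">From " (some mboxo readers do not unescape at all, which
corrupts in the other direction). The function names are hypothetical
and not taken from git or notmuch.

```python
import re

# mboxo rule: only lines that start exactly with "From " get a '>' prepended.
def mboxo_escape(line):
    return ">" + line if line.startswith("From ") else line

def mboxo_unescape(line):
    return line[1:] if line.startswith(">From ") else line

# mboxrd rule: any line matching ">*From " gets one more '>' prepended,
# and the reader removes exactly one '>' from lines matching ">+From ".
_RD_FROM_LIKE = re.compile(r">*From ")
_RD_ESCAPED = re.compile(r">+From ")

def mboxrd_escape(line):
    return ">" + line if _RD_FROM_LIKE.match(line) else line

def mboxrd_unescape(line):
    return line[1:] if _RD_ESCAPED.match(line) else line

body = ["From here on it gets tricky", ">From a quoted reply", "plain text"]

mboxo_roundtrip = [mboxo_unescape(mboxo_escape(l)) for l in body]
mboxrd_roundtrip = [mboxrd_unescape(mboxrd_escape(l)) for l in body]
```

Under these assumptions the second line, which already began with
">From ", loses its '>' in the mboxo round trip, while the mboxrd round
trip returns the body unchanged.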

That leaves both "mboxrd" and "mboxcl2" as candidates. Either of these
formats is reliable if both the reader and writer use the same
format. When the reader and writer don't agree, then there are problems
as follows ("W:" indicates writing, "R:" indicates reading expecting a
particular format):

W:mboxrd  then R:mboxcl2 -> Reader may corrupt by failing to remove '>'
                            Reader must give up/guess without CL headers
                            Guessing is at least unlikely to mis-split messages

W:mboxcl2 then R:mboxrd  -> Reader may corrupt by erroneously removing '>'
                            Reader may mis-split messages on "From " in content
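
The second mismatch can be sketched in Python as follows. This assumes a
reader that treats every line starting with "From " as a message
separator; the From_ line contents and the helper name are made-up
placeholders, not output from any real tool.

```python
# An mboxcl2-style mailbox: the body is left unescaped and its size in
# bytes is recorded in a Content-Length header.
body = "Hello,\nFrom now on use mboxrd.\n"
header = (
    "From MAILER-DAEMON Tue Jun  8 21:52:34 2010\n"  # placeholder From_ line
    "Subject: hi\n"
    "Content-Length: {}\n"
    "\n"
).format(len(body.encode("utf-8")))
mbox = header + body + "\n"

def split_like_mboxrd(text):
    """Split on every line starting with "From ", ignoring Content-Length."""
    messages = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([])
        if messages:
            messages[-1].append(line)
    return ["".join(m) for m in messages]

parts = split_like_mboxrd(mbox)
```

Here the single message comes back as two pieces, because the unescaped
body line "From now on use mboxrd." looks like a separator to this
reader.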

I preferred to implement mboxrd over mboxcl2 for several reasons:

  1. The mboxrd writer implementation is much simpler. This format
     affords a simple streaming implementation where mboxcl2 requires
     knowing the length of the message in advance.

  2. The mboxrd format is robust in the face of file changes that
     would invalidate Content-Length headers (for example, a person
     can hand-edit an mboxrd file without invalidating it, but cannot do
     the same with an mboxcl2 file).

  3. The mboxrd reader implementation is much simpler. An mboxcl2 reader
     necessarily has special cases that an mboxrd implementation does
     not. What to do if there is no Content-Length header? What to do if
     the Content-Length header appears wrong? etc. Recovery code for
     these cases might well be to fall back to something like an mboxrd
     implementation, which demonstrates the increased complexity here.
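
The difference in reason 1 can be sketched in Python. This is only an
illustration under stated assumptions: the function names and the
placeholder From_ line are hypothetical, and each message is assumed to
arrive as an iterable of newline-terminated lines with a blank line
between headers and body.

```python
import io
import re

FROM_LIKE = re.compile(r">*From ")
FROM_LINE = "From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n"  # placeholder

def write_mboxrd(lines, out):
    """Streaming: each line is escaped and written as soon as it arrives."""
    out.write(FROM_LINE)
    for line in lines:
        out.write(">" + line if FROM_LIKE.match(line) else line)
    out.write("\n")

def write_mboxcl2(lines, out):
    """Buffered: the body length must be known before the headers end."""
    buffered = list(lines)               # whole message held in memory
    split = buffered.index("\n")         # first blank line ends the headers
    headers, body = buffered[:split], buffered[split + 1:]
    length = sum(len(l.encode("utf-8")) for l in body)
    out.write(FROM_LINE)
    out.writelines(headers)
    out.write("Content-Length: %d\n\n" % length)
    out.writelines(body)
    out.write("\n")

message = ["Subject: t\n", "\n", "From the top\n", "bye\n"]
rd, cl = io.StringIO(), io.StringIO()
write_mboxrd(iter(message), rd)
write_mboxcl2(iter(message), cl)
```

The mboxrd writer never looks ahead, so it works on a pipe; the mboxcl2
writer in this sketch has to consume the whole message before it can
emit the header.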

As can be seen in my patch, implementing an mboxrd reader in
git-mailsplit was quite simple. An mboxcl2 reader would be quite a bit
more complicated, but with no actual benefit in reliability (assuming
that the reader matches the writer).
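
For readers without the patch at hand, the reader logic can be
approximated by the Python sketch below. It is not the git-mailsplit
code (which is C); the function name is hypothetical, and the sketch
assumes every line starting with "From " is a separator and keeps the
blank line that follows each message.

```python
import re

ESCAPED_FROM = re.compile(r">+From ")

def read_mboxrd(text):
    """Split an mboxrd mailbox into messages and undo the '>' quoting."""
    messages = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([])      # From_ separator: start a new message
            continue
        if not messages:
            continue                 # ignore anything before the first From_
        if ESCAPED_FROM.match(line):
            line = line[1:]          # remove exactly one level of quoting
        messages[-1].append(line)
    return ["".join(m) for m in messages]

mbox = (
    "From a@b Tue Jun  8 21:52:34 2010\n"
    "Subject: one\n"
    "\n"
    ">From the start\n"
    ">>From a quote\n"
    "\n"
    "From c@d Tue Jun  8 21:52:35 2010\n"
    "Subject: two\n"
    "\n"
    "body\n"
    "\n"
)
msgs = read_mboxrd(mbox)
```

There is no header parsing and no recovery path in this sketch, which is
the simplicity argument made above.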

> The statement that "the entire "mbox" family of mailbox formats is
> gradually becoming irrelevant, and of only historical interest" is also
> pretty silly -- mbox is still the preferred format for moving groups of
> email from MUA to MUA, even if it is no longer used for active live
> spool storage.  But, of course, you knew that already.

Indeed. Though I was surprised to find recently that postfix still
delivers to /var/mail/$user in "mboxo" format by default (ugh).

-Carl

[*] http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/mail-mbox-formats.html

-- 
carl.d.worth@intel.com

Thread overview: 8+ messages
     [not found] <87hbldjo0s.fsf@yoom.home.cworth.org>
2010-06-08 20:02 ` [PATCH 1/2] mailsplit: Remove any '>' characters used to escape From_ lines in mbox Carl Worth
2010-06-08 20:02   ` [PATCH 2/2] Add test from From_-line escaping Carl Worth
2010-06-08 20:47 ` Make "git am" properly unescape lines matching ">>*From " Carl Worth
2010-06-08 20:54   ` H. Peter Anvin
2010-06-08 21:30     ` Carl Worth
2010-06-08 20:50 ` H. Peter Anvin
2010-06-08 21:52   ` Carl Worth [this message]
2010-06-08 22:10     ` H. Peter Anvin