git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-am: less strong format "mbox" detection
@ 2009-07-14  6:40 Nicolas Sebrecht
  2009-07-14  7:16 ` Giuseppe Bilotta
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Sebrecht @ 2009-07-14  6:40 UTC (permalink / raw)
  To: git; +Cc: Giuseppe Bilotta, Nicolas Sebrecht

Thunderbird (v1.* at least) likes to start e-mails with "X-Account-Key:".
Also, I have some emails starting with "Return-Path:" or ""Delivered-To:".

Signed-off-by: Nicolas Sebrecht <ni.s@laposte.net>
---
 git-am.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-am.sh b/git-am.sh
index d64d997..d10a8e0 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -169,7 +169,7 @@ check_patch_format () {
 		read l2
 		read l3
 		case "$l1" in
-		"From "* | "From: "*)
+		"From "* | "From: "* | "X-Account-Key:"* | "Return-Path:"* | "Delivered-To:"*)
 			patch_format=mbox
 			;;
 		'# This series applies on GIT commit'*)
-- 
1.6.4.rc0.121.g2937a.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] git-am: less strong format "mbox" detection
  2009-07-14  6:40 [PATCH] git-am: less strong format "mbox" detection Nicolas Sebrecht
@ 2009-07-14  7:16 ` Giuseppe Bilotta
  2009-07-14  8:20   ` [PATCH] " Nicolas Sebrecht
  0 siblings, 1 reply; 10+ messages in thread
From: Giuseppe Bilotta @ 2009-07-14  7:16 UTC (permalink / raw)
  To: Nicolas Sebrecht; +Cc: git

On Tue, Jul 14, 2009 at 8:40 AM, Nicolas Sebrecht<ni.s@laposte.net> wrote:
> Thunderbird (v1.* at least) likes to start e-mails with "X-Account-Key:".
> Also, I have some emails starting with "Return-Path:" or ""Delivered-To:".
>
> Signed-off-by: Nicolas Sebrecht <ni.s@laposte.net>
> ---
>  git-am.sh |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/git-am.sh b/git-am.sh
> index d64d997..d10a8e0 100755
> --- a/git-am.sh
> +++ b/git-am.sh
> @@ -169,7 +169,7 @@ check_patch_format () {
>                read l2
>                read l3
>                case "$l1" in
> -               "From "* | "From: "*)
> +               "From "* | "From: "* | "X-Account-Key:"* | "Return-Path:"* | "Delivered-To:"*)

Nitpick: for consistency, should we either expect a space after the
colon also in the new keys, or not expect i in the From: key either. I
don't think the RFC requires a space, but most clients probably add
it.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] Re: git-am: less strong format "mbox" detection
  2009-07-14  7:16 ` Giuseppe Bilotta
@ 2009-07-14  8:20   ` Nicolas Sebrecht
  2009-07-14  8:35     ` Johannes Sixt
  2009-07-14  8:42     ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Sebrecht @ 2009-07-14  8:20 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: Nicolas Sebrecht, git

Le 14/07/09, Giuseppe Bilotta a écrit :

> > diff --git a/git-am.sh b/git-am.sh
> > index d64d997..d10a8e0 100755
> > --- a/git-am.sh
> > +++ b/git-am.sh
> > @@ -169,7 +169,7 @@ check_patch_format () {
> >                read l2
> >                read l3
> >                case "$l1" in
> > -               "From "* | "From: "*)
> > +               "From "* | "From: "* | "X-Account-Key:"* | "Return-Path:"* | "Delivered-To:"*)
> 
> Nitpick: for consistency, should we either expect a space after the
> colon also in the new keys, or not expect i in the From: key either. I
> don't think the RFC requires a space, but most clients probably add
> it.

RFC 822 says:
     
" 3.4.2. WHITE SPACE
     
  Note:  In structured field bodies, multiple linear space ASCII
         characters  (namely  HTABs  and  SPACEs) are treated as
         single spaces and may freely surround any  symbol.   In
         all header fields, the only place in which at least one
         LWSP-char is REQUIRED is at the beginning of  continua-
         tion lines in a folded field.
"

A trailing space after the colon is not required. I'll remove it and
resend a patch.

And why should we accept "From "?

-- 
Nicolas Sebrecht

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Re: git-am: less strong format "mbox" detection
  2009-07-14  8:20   ` [PATCH] " Nicolas Sebrecht
@ 2009-07-14  8:35     ` Johannes Sixt
  2009-07-14  8:42     ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Sixt @ 2009-07-14  8:35 UTC (permalink / raw)
  To: Nicolas Sebrecht; +Cc: Giuseppe Bilotta, git

Nicolas Sebrecht schrieb:
> Le 14/07/09, Giuseppe Bilotta a écrit :
> 
>>> diff --git a/git-am.sh b/git-am.sh
>>> index d64d997..d10a8e0 100755
>>> --- a/git-am.sh
>>> +++ b/git-am.sh
>>> @@ -169,7 +169,7 @@ check_patch_format () {
>>>                read l2
>>>                read l3
>>>                case "$l1" in
>>> -               "From "* | "From: "*)
>>> +               "From "* | "From: "* | "X-Account-Key:"* | "Return-Path:"* | "Delivered-To:"*)
> 
> And why should we accept "From "?

Because mbox format must begin with "From ".

The question is rather: Why should we accept anything else? The case arm
in question is about detecting mbox format.

-- Hannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Re: git-am: less strong format "mbox" detection
  2009-07-14  8:20   ` [PATCH] " Nicolas Sebrecht
  2009-07-14  8:35     ` Johannes Sixt
@ 2009-07-14  8:42     ` Junio C Hamano
  2009-07-14 12:23       ` Nicolas Sebrecht
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2009-07-14  8:42 UTC (permalink / raw)
  To: Nicolas Sebrecht; +Cc: Giuseppe Bilotta, git

Nicolas Sebrecht <nicolas.s.dev@gmx.fr> writes:

> And why should we accept "From "?

You are going totally in a wrong way around with this.

Berkeley mbox format is what we support, and "From " is the _only_
delimiter between pieces of e-mail in the file.  We happen to allow "From:
" in order to merely be extra nice for people who create mbox looking file
by hand; I think it is an improvement not to require an optional SP after
the colon there, but that is totally an independent issue.

I cannot offhand say if allowing anything but "From:" is necessarily an
improvement, or making the format detection unnecessarily risky of
misidentification.  I do not particularly like the idea of allowing only
some randomly selected fields like Return-Path and Delibered-To and not
accepting others, let alone totally nonstandard X-Foo fields.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] Re: git-am: less strong format "mbox" detection
  2009-07-14  8:42     ` Junio C Hamano
@ 2009-07-14 12:23       ` Nicolas Sebrecht
  2009-07-15  5:52         ` [PATCH v2] git-am: fix maildir support regression for unordered headers in emails Nicolas Sebrecht
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Sebrecht @ 2009-07-14 12:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Sebrecht, Giuseppe Bilotta, git

The 14/07/09, Junio C Hamano wrote:

> Nicolas Sebrecht <nicolas.s.dev@gmx.fr> writes:
> 
> > And why should we accept "From "?
> 
> You are going totally in a wrong way around with this.
> 
> Berkeley mbox format is what we support, and "From " is the _only_
> delimiter between pieces of e-mail in the file.  We happen to allow "From:
> " in order to merely be extra nice for people who create mbox looking file
> by hand; I think it is an improvement not to require an optional SP after
> the colon there, but that is totally an independent issue.

I see, thank you.

> I cannot offhand say if allowing anything but "From:" is necessarily an
> improvement, or making the format detection unnecessarily risky of
> misidentification.  I do not particularly like the idea of allowing only
> some randomly selected fields like Return-Path and Delibered-To and not
> accepting others, let alone totally nonstandard X-Foo fields.

I'll look at the source closer to know how it can be done the smart way.

As the manual of git-am says it accepts maildir format too (checked
now), we have a regression since

	a5a6755a1d4707bf2fab7752e5c974ebf63d086a

in case of maildir _and_ "verbatim" emails starting with anything else that
"From " or "From: ".

I think the RFC doesn't require any fixed order for the header fields
(will check). So, I'm not very optimist because format detection starts
with

	read l1
	read l2
	read l3

wich doesn't help to make a difference between mailbox and maildir.


-- 
Nicolas Sebrecht

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] git-am: fix maildir support regression for unordered headers in emails
  2009-07-14 12:23       ` Nicolas Sebrecht
@ 2009-07-15  5:52         ` Nicolas Sebrecht
  2009-07-15  7:27           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Sebrecht @ 2009-07-15  5:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Giuseppe Bilotta, Johannes Sixt, Nicolas Sebrecht

Patch format detection introduced by a5a6755a1d4707bf2fab7752e5c974ebf63d086a
may refuse valid patches from verbatim emails.

Emails may have header fields in a random order.

Signed-off-by: Nicolas Sebrecht <nicolas.s.dev@gmx.fr>
---
 git-am.sh |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/git-am.sh b/git-am.sh
index d64d997..18e53d0 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -146,6 +146,7 @@ clean_abort () {
 }
 
 patch_format=
+is_verbatim_email=
 
 check_patch_format () {
 	# early return if patch_format was set from the command line
@@ -191,6 +192,15 @@ check_patch_format () {
 			esac
 			;;
 		esac
+		# Keep maildir workflows support.
+		# Verbatim emails may have header fields in random order.
+		is_verbatim_email='true'
+		for line in "$l1" "$l2" "$l3"; do
+			printf "$line" | grep --quiet --extended-regexp '^([^\ ])+: +.*' ||
+				is_verbatim_email='false'
+		done
+		# next treatments don't differ from mailbox format
+		[[ $is_verbatim_email == 'true' ]] && patch_format=mbox
 	} < "$1" || clean_abort
 }
 
-- 
1.6.4.rc0.129.gf738

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] git-am: fix maildir support regression for unordered headers in emails
  2009-07-15  5:52         ` [PATCH v2] git-am: fix maildir support regression for unordered headers in emails Nicolas Sebrecht
@ 2009-07-15  7:27           ` Junio C Hamano
  2009-07-15 12:54             ` Derek Fawcus
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2009-07-15  7:27 UTC (permalink / raw)
  To: Nicolas Sebrecht; +Cc: git, Giuseppe Bilotta, Johannes Sixt

Nicolas Sebrecht <nicolas.s.dev@gmx.fr> writes:

> Patch format detection introduced by a5a6755a1d4707bf2fab7752e5c974ebf63d086a
> may refuse valid patches from verbatim emails.

It is unclear what you meant by "verbatim email".  A verbatim e-mail
in mbox begins with "From " header that is already covered in the existing
code long before support for stgit/hg was added.

> +		# Keep maildir workflows support.
> +		# Verbatim emails may have header fields in random order.
> +		is_verbatim_email='true'

We do not fold lines like this,

> +		for line in "$l1" "$l2" "$l3"; do

Instead, we write like this:

	for x in a b c
	do
		cmd ...

> +			printf "$line" | grep --quiet --extended-regexp '^([^\ ])+: +.*' ||

We use GNUism spelling --extended-regexp nowhere in scripted Porcelains;
just say -E here, and do not omit -e before the pattern.

Likewise for --quiet.  Just say -q.  When in doubt, be conservative and
stick to POSIX for portability.

    http://www.opengroup.org/onlinepubs/9699919799/utilities/grep.html#tag_20_55

Your regexp has too many issues.

 - It is too loose and too strict at the same time.  RFC 2822 section 2.2
   and 3.6.8 specify that a field name must be composed of printable
   US-ASCII characters except colon and space, so if you really want to be
   lenient, it should instead begin with [^: ]+: (and you do not need any
   capture).

 - I think however starting the regexp with "^[A-Za-z]+(-[A-Za-z]+)*:"
   would be more appropriate in practice, though.

 - You do not need to end the expression with ".*"; omitting that would
   match the same set of lines anyway.

 - I thought you were advocating for not requiring SP after the colon?

 - What happens if $l2 or $l3 is a subsequent folded line?  For example,
   in my MUA edit buffer, this message begins with:

       To: Nicolas Sebrecht <nicolas.s.dev@gmx.fr>
       Cc: <git@vger.kernel.org>,
           Giuseppe Bilotta <giuseppe.bilotta@gmail.com>,
           Johannes Sixt <j.sixt@viscovery.net>
       Subject: Re: [PATCH v2] git-am: fix maildir support ...

   The third line would not match your regexp.

> +				is_verbatim_email='false'
> +		done
> +		# next treatments don't differ from mailbox format
> +		[[ $is_verbatim_email == 'true' ]] && patch_format=mbox

We do not use non-portable [[ ]] anywhere in our shell script.  Write

	if test true = "$is_verbatim_email"
        then
        	patch_format=mbox
	fi

if you really want to keep this code structure.

I actually do not think you would even need an extra is_verbatim_email
variable, though.  Assuming that I understand what you are trying to do,
this is probably how I would write it:

	sed -e '/^$/q' -e '/^[ 	]/d' "$1" |
        grep -v -E -e '^[A-Za-z]+(-[A-Za-z]+)*:' >/dev/null ||
        patch_format=mbox

But I am not convinced that I understand what _problem_ you are trying to
solve in the first place.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] git-am: fix maildir support regression for unordered headers in emails
  2009-07-15  7:27           ` Junio C Hamano
@ 2009-07-15 12:54             ` Derek Fawcus
  2009-07-15 16:19               ` [PATCH v2] " Nicolas Sebrecht
  0 siblings, 1 reply; 10+ messages in thread
From: Derek Fawcus @ 2009-07-15 12:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Sebrecht, git, Giuseppe Bilotta, Johannes Sixt

On Wed, Jul 15, 2009 at 12:27:05AM -0700, Junio C Hamano wrote:
> Nicolas Sebrecht <nicolas.s.dev@gmx.fr> writes:
> 
> > Patch format detection introduced by a5a6755a1d4707bf2fab7752e5c974ebf63d086a
> > may refuse valid patches from verbatim emails.
> 
> It is unclear what you meant by "verbatim email".  A verbatim e-mail
> in mbox begins with "From " header that is already covered in the existing
> code long before support for stgit/hg was added.

I believe he is referring to the claimed support for maildir format boxes.

> But I am not convinced that I understand what _problem_ you are trying to
> solve in the first place.

Assuming it is maildir support,  then there is no 'header' as such in the
file which can be detected.  One could try and detect that the contents
are structured as an RFC822 message (but with local line ends),  or one
could try and detect that the file is within a maildir folder.

It seems this patch is taking the former approach and trying to ensure
the file consists of header fields.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] Re: git-am: fix maildir support regression for unordered headers in emails
  2009-07-15 12:54             ` Derek Fawcus
@ 2009-07-15 16:19               ` Nicolas Sebrecht
  0 siblings, 0 replies; 10+ messages in thread
From: Nicolas Sebrecht @ 2009-07-15 16:19 UTC (permalink / raw)
  To: Derek Fawcus
  Cc: Junio C Hamano, Nicolas Sebrecht, git, Giuseppe Bilotta,
	Johannes Sixt

The 15/07/09, Derek Fawcus wrote:
> On Wed, Jul 15, 2009 at 12:27:05AM -0700, Junio C Hamano wrote:
>
> > It is unclear what you meant by "verbatim email".  A verbatim e-mail
> > in mbox begins with "From " header that is already covered in the existing
> > code long before support for stgit/hg was added.
> 
> I believe he is referring to the claimed support for maildir format boxes.

You're right. In a maildir each email is the file as is.

> > But I am not convinced that I understand what _problem_ you are trying to
> > solve in the first place.
> 
> Assuming it is maildir support,  then there is no 'header' as such in the
> file which can be detected.

True.

>                              One could try and detect that the contents
> are structured as an RFC822 message (but with local line ends),  or one
> could try and detect that the file is within a maildir folder.
> 
> It seems this patch is taking the former approach and trying to ensure
> the file consists of header fields.

You're perfectly right. I think it's the best approach because if the
files are moved to another folder (the repo?), they are still valid
patches.


-- 
Nicolas Sebrecht

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-07-15 16:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-14  6:40 [PATCH] git-am: less strong format "mbox" detection Nicolas Sebrecht
2009-07-14  7:16 ` Giuseppe Bilotta
2009-07-14  8:20   ` [PATCH] " Nicolas Sebrecht
2009-07-14  8:35     ` Johannes Sixt
2009-07-14  8:42     ` Junio C Hamano
2009-07-14 12:23       ` Nicolas Sebrecht
2009-07-15  5:52         ` [PATCH v2] git-am: fix maildir support regression for unordered headers in emails Nicolas Sebrecht
2009-07-15  7:27           ` Junio C Hamano
2009-07-15 12:54             ` Derek Fawcus
2009-07-15 16:19               ` [PATCH v2] " Nicolas Sebrecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).