git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mailinfo: parse From header more strictly when run from rebase
@ 2010-06-07  4:45 Jay Soffian
  2010-06-15 16:14 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Jay Soffian @ 2010-06-07  4:45 UTC (permalink / raw)
  To: git; +Cc: Jay Soffian, Junio C Hamano

Given an email address of the form:

 author@example.com <author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D>

results in "author@example.com" being used as both the name and the
address. This is due to an assumption in handle_from() that the
first '@' denotes the address portion. The remainder of the string
is then assumed to be the name, which is rejected by
get_sane_name(), resulting in address being used for the name as
well.

I considered making handle_from() smarter by instead looking first
for an '@' contained by brackets. But get_sane_name() would still
reject "author@example.com" for the name, resulting in
"author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D" being used
for both the name and the address. Hardly an improvement.

In practice, I've only encountered this issue when rebasing commits
that were generated by git svn. So instead of adding more heuristics
to handle_from(), just make it stricter when run from rebase.

We use undocumented --no-inbody-headers to know when we're running
from rebase and then assume the address is of the form:

 name <email>

in which case email is extracted purely by brackets and name is
trimmed only for whitespace.

Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
---
 builtin/mailinfo.c                   |   15 ++++++++++++---
 t/t5100-mailinfo.sh                  |    2 +-
 t/t5100/0017                         |   17 +++++++++++++++++
 t/t5100/info0017                     |    5 +++++
 t/t5100/info0017--no-inbody-headers  |    5 +++++
 t/t5100/patch0017                    |   12 ++++++++++++
 t/t5100/patch0017--no-inbody-headers |   12 ++++++++++++
 t/t5100/sample.mbox                  |   17 +++++++++++++++++
 8 files changed, 81 insertions(+), 4 deletions(-)
 create mode 100644 t/t5100/0017
 create mode 100644 t/t5100/info0017
 create mode 100644 t/t5100/info0017--no-inbody-headers
 create mode 100644 t/t5100/msg0017
 create mode 100644 t/t5100/msg0017--no-inbody-headers
 create mode 100644 t/t5100/patch0017
 create mode 100644 t/t5100/patch0017--no-inbody-headers

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index 4a9729b..0cb5521 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -47,7 +47,7 @@ static void get_sane_name(struct strbuf *out, struct strbuf *name, struct strbuf
 	strbuf_addbuf(out, src);
 }
 
-static void parse_bogus_from(const struct strbuf *line)
+static void parse_bracketed_from(const struct strbuf *line, int strict)
 {
 	/* John Doe <johndoe> */
 
@@ -71,7 +71,8 @@ static void parse_bogus_from(const struct strbuf *line)
 	strbuf_reset(&name);
 	strbuf_add(&name, line->buf, bra - line->buf);
 	strbuf_trim(&name);
-	get_sane_name(&name, &name, &email);
+	if (!strict)
+		get_sane_name(&name, &name, &email);
 }
 
 static void handle_from(const struct strbuf *from)
@@ -80,12 +81,20 @@ static void handle_from(const struct strbuf *from)
 	size_t el;
 	struct strbuf f;
 
+	if (!use_inbody_headers) {
+		/* we're running from rebase, so we can assume the from header
+		 * is well-formed and use the stricter parse_bracketed_from
+		 */
+		parse_bracketed_from(from, 1);
+		return;
+	}
+
 	strbuf_init(&f, from->len);
 	strbuf_addbuf(&f, from);
 
 	at = strchr(f.buf, '@');
 	if (!at) {
-		parse_bogus_from(from);
+		parse_bracketed_from(from, 0);
 		return;
 	}
 
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index ebc36c1..d8bbad8 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -11,7 +11,7 @@ test_expect_success 'split sample box' \
 	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
 	last=`cat last` &&
 	echo total is $last &&
-	test `cat last` = 16'
+	test `cat last` = 17'
 
 check_mailinfo () {
 	mail=$1 opt=$2
diff --git a/t/t5100/0017 b/t/t5100/0017
new file mode 100644
index 0000000..085d414
--- /dev/null
+++ b/t/t5100/0017
@@ -0,0 +1,17 @@
+From 9251ac1cdbf624c31ec0a9996f0b38dc7bcdcd60 Mon Sep 17 00:00:00 2001
+From: author@example.com <author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D>
+Date: Sun, 6 Jun 2010 22:14:37 -0400
+Subject: initial
+
+---
+ hello |    1 +
+ 1 files changed, 1 insertions(+), 0 deletions(-)
+ create mode 100644 hello
+
+diff --git a/hello b/hello
+new file mode 100644
+index 0000000..ce01362
+--- /dev/null
++++ b/hello
+@@ -0,0 +1 @@
++hello
diff --git a/t/t5100/info0017 b/t/t5100/info0017
new file mode 100644
index 0000000..992cfe0
--- /dev/null
+++ b/t/t5100/info0017
@@ -0,0 +1,5 @@
+Author: author@example.com
+Email: author@example.com
+Subject: initial
+Date: Sun, 6 Jun 2010 22:14:37 -0400
+
diff --git a/t/t5100/info0017--no-inbody-headers b/t/t5100/info0017--no-inbody-headers
new file mode 100644
index 0000000..9d74dce
--- /dev/null
+++ b/t/t5100/info0017--no-inbody-headers
@@ -0,0 +1,5 @@
+Author: author@example.com
+Email: author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D
+Subject: initial
+Date: Sun, 6 Jun 2010 22:14:37 -0400
+
diff --git a/t/t5100/msg0017 b/t/t5100/msg0017
new file mode 100644
index 0000000..e69de29
diff --git a/t/t5100/msg0017--no-inbody-headers b/t/t5100/msg0017--no-inbody-headers
new file mode 100644
index 0000000..e69de29
diff --git a/t/t5100/patch0017 b/t/t5100/patch0017
new file mode 100644
index 0000000..d5640e6
--- /dev/null
+++ b/t/t5100/patch0017
@@ -0,0 +1,12 @@
+---
+ hello |    1 +
+ 1 files changed, 1 insertions(+), 0 deletions(-)
+ create mode 100644 hello
+
+diff --git a/hello b/hello
+new file mode 100644
+index 0000000..ce01362
+--- /dev/null
++++ b/hello
+@@ -0,0 +1 @@
++hello
diff --git a/t/t5100/patch0017--no-inbody-headers b/t/t5100/patch0017--no-inbody-headers
new file mode 100644
index 0000000..d5640e6
--- /dev/null
+++ b/t/t5100/patch0017--no-inbody-headers
@@ -0,0 +1,12 @@
+---
+ hello |    1 +
+ 1 files changed, 1 insertions(+), 0 deletions(-)
+ create mode 100644 hello
+
+diff --git a/hello b/hello
+new file mode 100644
+index 0000000..ce01362
+--- /dev/null
++++ b/hello
+@@ -0,0 +1 @@
++hello
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index de10312..50f71ea 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -683,3 +683,20 @@ index e69de29..d95f3ad 100644
 @@ -0,0 +1 @@
 +content
 
+From 9251ac1cdbf624c31ec0a9996f0b38dc7bcdcd60 Mon Sep 17 00:00:00 2001
+From: author@example.com <author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D>
+Date: Sun, 6 Jun 2010 22:14:37 -0400
+Subject: initial
+
+---
+ hello |    1 +
+ 1 files changed, 1 insertions(+), 0 deletions(-)
+ create mode 100644 hello
+
+diff --git a/hello b/hello
+new file mode 100644
+index 0000000..ce01362
+--- /dev/null
++++ b/hello
+@@ -0,0 +1 @@
++hello
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailinfo: parse From header more strictly when run from rebase
  2010-06-07  4:45 [PATCH] mailinfo: parse From header more strictly when run from rebase Jay Soffian
@ 2010-06-15 16:14 ` Junio C Hamano
  2010-06-15 16:41   ` Jay Soffian
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2010-06-15 16:14 UTC (permalink / raw)
  To: Jay Soffian; +Cc: git

Jay Soffian <jaysoffian@gmail.com> writes:

> Given an email address of the form:
>
>  author@example.com <author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D>
>
> results in "author@example.com" being used as both the name and the
> address. This is due to an assumption in handle_from() that the
> first '@' denotes the address portion. The remainder of the string
> is then assumed to be the name, which is rejected by
> get_sane_name(), resulting in address being used for the name as
> well.

Sorry, but it is unclear from your message what problem you are trying to
solve, what your desired outcome is, and why you think that outcome is
desired.  The bracketed one shown above does _not_ look like a valid email
address, so using

    name = "author@example.com"
    email = "author@example.com@11B4..."

does not sound like a good thing to do to begin with.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailinfo: parse From header more strictly when run from  rebase
  2010-06-15 16:14 ` Junio C Hamano
@ 2010-06-15 16:41   ` Jay Soffian
  2010-06-15 17:01     ` Jay Soffian
  0 siblings, 1 reply; 4+ messages in thread
From: Jay Soffian @ 2010-06-15 16:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jun 15, 2010 at 12:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jay Soffian <jaysoffian@gmail.com> writes:
>
>> Given an email address of the form:
>>
>>  author@example.com <author@example.com@11B4E7C6-762E-4CF5-B2EB-3F7BD596981D>
>>
>> results in "author@example.com" being used as both the name and the
>> address. This is due to an assumption in handle_from() that the
>> first '@' denotes the address portion. The remainder of the string
>> is then assumed to be the name, which is rejected by
>> get_sane_name(), resulting in address being used for the name as
>> well.
>
> Sorry, but it is unclear from your message what problem you are trying to
> solve, what your desired outcome is, and why you think that outcome is
> desired.  The bracketed one shown above does _not_ look like a valid email
> address, so using
>
>    name = "author@example.com"
>    email = "author@example.com@11B4..."
>
> does not sound like a good thing to do to begin with.

You left out this part of my commit message:

"In practice, I've only encountered this issue when rebasing commits
that were generated by git svn."

Which is exactly what I was doing. I'm rebasing commits from this repo:

http://src.chromium.org/cgi-bin/gitweb.cgi?p=chromium.git

The rebase operation is munging the address and transforming an author of

  user@chromium.org <user@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>

into

  user@chromium.org <user@chromium.org>

Regardless of whether the address is valid or not, rebase shouldn't munge it.

j.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mailinfo: parse From header more strictly when run from  rebase
  2010-06-15 16:41   ` Jay Soffian
@ 2010-06-15 17:01     ` Jay Soffian
  0 siblings, 0 replies; 4+ messages in thread
From: Jay Soffian @ 2010-06-15 17:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Jun 15, 2010 at 12:41 PM, Jay Soffian <jaysoffian@gmail.com> wrote:
> On Tue, Jun 15, 2010 at 12:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Sorry, but it is unclear from your message what problem you are trying to
>> solve, what your desired outcome is, and why you think that outcome is
>> desired.  The bracketed one shown above does _not_ look like a valid email
>> address, so using
>>
>>    name = "author@example.com"
>>    email = "author@example.com@11B4..."
>>
>> does not sound like a good thing to do to begin with.

In fact, that is an invalid email address (I thought it might be valid
per historical source-routed syntax, but I just checked the RFCs and
it's not even valid for that). Nonetheless, my goal is that I can
rebase a commit without the address (even an invalid one) being
munged.

j.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-06-15 17:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-07  4:45 [PATCH] mailinfo: parse From header more strictly when run from rebase Jay Soffian
2010-06-15 16:14 ` Junio C Hamano
2010-06-15 16:41   ` Jay Soffian
2010-06-15 17:01     ` Jay Soffian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).