git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Cleanup git-send-email.perl:extract_valid_email
@ 2006-06-03 17:11 Horst H. von Brand
  2006-06-03 22:49 ` Eric Wong
  0 siblings, 1 reply; 11+ messages in thread
From: Horst H. von Brand @ 2006-06-03 17:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Horst H. von Brand

- Fix the regular expressions for local addresses
- Fix the fallback regexp for non-local addresses, simplify the logic

Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
---
 git-send-email.perl |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index ed1d89b..a7a7797 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -314,18 +314,15 @@ sub extract_valid_address {
 	my $address = shift;
 
 	# check for a local address:
-	return $address if ($address =~ /^([\w\-]+)$/);
+	return $address if ($address =~ /^([\w\-.]+)$/);
 
 	if ($have_email_valid) {
 		return Email::Valid->address($address);
 	} else {
 		# less robust/correct than the monster regexp in Email::Valid,
 		# but still does a 99% job, and one less dependency
-		my $cleaned_address;
-		if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) {
-			$cleaned_address = $1;
-		}
-		return $cleaned_address;
+		$address =~ /([\w\-.]+@[\w\-.]+)/;
+		return $1;
 	}
 }
 
-- 
1.3.3.g86f7

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-03 17:11 [PATCH] Cleanup git-send-email.perl:extract_valid_email Horst H. von Brand
@ 2006-06-03 22:49 ` Eric Wong
  2006-06-04  0:10   ` Horst von Brand
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Wong @ 2006-06-03 22:49 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: git, Junio C Hamano, Ryan Anderson

"Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote:
> - Fix the regular expressions for local addresses
> - Fix the fallback regexp for non-local addresses, simplify the logic
> 
> Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
> ---
>  git-send-email.perl |    9 +++------
>  1 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index ed1d89b..a7a7797 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -314,18 +314,15 @@ sub extract_valid_address {
>  	my $address = shift;
>  
>  	# check for a local address:
> -	return $address if ($address =~ /^([\w\-]+)$/);
> +	return $address if ($address =~ /^([\w\-.]+)$/);

I keep forgetting this, '+' is a valid (and useful) setup, too.

>  
>  	if ($have_email_valid) {
>  		return Email::Valid->address($address);
>  	} else {
>  		# less robust/correct than the monster regexp in Email::Valid,
>  		# but still does a 99% job, and one less dependency
> -		my $cleaned_address;
> -		if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) {
> -			$cleaned_address = $1;
> -		}
> -		return $cleaned_address;
> +		$address =~ /([\w\-.]+@[\w\-.]+)/;
> +		return $1;

Actually, I'm retracting my earlier ack on this.  This is way too
restrictive.  I'd rather allow an occasional invalid email address than
to reject valid ones.  I generally trust git users to know what they're
doing when entering email addresses[1].

*, $, ^, +, = are all valid characters in the username portion (not sure
about local accounts, though), and I'm sure there are more that I don't
know about.

[1] - of course, without address book support in git-send-email, I think
I would've left the 'k' out of Junio's email address by now.  Of course
there's also the issue of where I work and having several people using
variations of rob/robert in their email address.  I'm likely to make
errors like these far more than entering addresses that email systems
consider invalid, and I suspect I'm not the only one.

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-03 22:49 ` Eric Wong
@ 2006-06-04  0:10   ` Horst von Brand
  2006-06-06  6:59     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Horst von Brand @ 2006-06-04  0:10 UTC (permalink / raw)
  To: Eric Wong; +Cc: Horst H. von Brand, git, Junio C Hamano, Ryan Anderson

Eric Wong <normalperson@yhbt.net> wrote:
> "Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote:
> > - Fix the regular expressions for local addresses
> > - Fix the fallback regexp for non-local addresses, simplify the logic
> > 
> > Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
> > ---
> >  git-send-email.perl |    9 +++------
> >  1 files changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index ed1d89b..a7a7797 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -314,18 +314,15 @@ sub extract_valid_address {
> >  	my $address = shift;
> >  
> >  	# check for a local address:
> > -	return $address if ($address =~ /^([\w\-]+)$/);
> > +	return $address if ($address =~ /^([\w\-.]+)$/);
> 
> I keep forgetting this, '+' is a valid (and useful) setup, too.

Oops...

> >  	if ($have_email_valid) {
> >  		return Email::Valid->address($address);
> >  	} else {
> >  		# less robust/correct than the monster regexp in Email::Valid,
> >  		# but still does a 99% job, and one less dependency
> > -		my $cleaned_address;
> > -		if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) {
> > -			$cleaned_address = $1;
> > -		}
> > -		return $cleaned_address;
> > +		$address =~ /([\w\-.]+@[\w\-.]+)/;
> > +		return $1;
> 
> Actually, I'm retracting my earlier ack on this.  This is way too
> restrictive.  I'd rather allow an occasional invalid email address than
> to reject valid ones.  I generally trust git users to know what they're
> doing when entering email addresses[1].
> 
> *, $, ^, +, = are all valid characters in the username portion (not sure
> about local accounts, though), and I'm sure there are more that I don't
> know about.

As a general principle, I prefer to check what is legal instead of trying
to filter out what isn't.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-04  0:10   ` Horst von Brand
@ 2006-06-06  6:59     ` Junio C Hamano
  2006-06-06 15:42       ` Horst von Brand
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-06-06  6:59 UTC (permalink / raw)
  To: Horst von Brand, Ryan Anderson, Eric Wong; +Cc: git

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

>> >  	# check for a local address:
>> > -	return $address if ($address =~ /^([\w\-]+)$/);
>> > +	return $address if ($address =~ /^([\w\-.]+)$/);
>> 
>> I keep forgetting this, '+' is a valid (and useful) setup, too.
>
> Oops...
>> 
>> Actually, I'm retracting my earlier ack on this.  This is way too
>> restrictive.  I'd rather allow an occasional invalid email address than
>> to reject valid ones.  I generally trust git users to know what they're
>> doing when entering email addresses[1].
>> 
>> *, $, ^, +, = are all valid characters in the username portion (not sure
>> about local accounts, though), and I'm sure there are more that I don't
>> know about.
>
> As a general principle, I prefer to check what is legal instead of trying
> to filter out what isn't.

If we start doing addr-spec in RFC2822 (page 17) ourselves, we
should rather be using Email::Valid.  A permissive sanity check
to catch obvious mistakes would be more appropriate here than
being RFC police.

I think something like the attached, on top of your patch, would
be appropriate for upcoming 1.4.0.

-- >8 --
send-email: be more lenient and just catch obvious mistakes.

This cleans up the pattern matching subroutine by introducing
two variables to hold regexp to approximately match local-part
and domain in the e-mail address.  It is meant to catch obvious
mistakes with a cheap check.

The patch also moves "scalar" to force Email::Valid->address()
to work in !wantarray environment to extract_valid_address;
earlier it was in the caller of the subroutine, which was way
too error prone.

---
diff --git a/git-send-email.perl b/git-send-email.perl
index a7a7797..700d0c3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, 
 
 sub extract_valid_address {
 	my $address = shift;
+	my $local_part_regexp = '[^<>"\s@]+';
+	my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+';
 
 	# check for a local address:
-	return $address if ($address =~ /^([\w\-.]+)$/);
+	return $address if ($address =~ /^($local_part_regexp)$/);
 
 	if ($have_email_valid) {
-		return Email::Valid->address($address);
+		return scalar Email::Valid->address($address);
 	} else {
 		# less robust/correct than the monster regexp in Email::Valid,
 		# but still does a 99% job, and one less dependency
-		$address =~ /([\w\-.]+@[\w\-.]+)/;
+		$address =~ /($local_part_regexp\@$domain_regexp)/;
 		return $1;
 	}
 }
@@ -384,7 +386,7 @@ X-Mailer: git-send-email $gitversion
 		defined $pid or die $!;
 		if (!$pid) {
 			exec($smtp_server,'-i',
-			     map { scalar extract_valid_address($_) }
+			     map { extract_valid_address($_) }
 			     @recipients) or die $!;
 		}
 		print $sm "$header\n$message";

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06  6:59     ` Junio C Hamano
@ 2006-06-06 15:42       ` Horst von Brand
  2006-06-06 15:54         ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Horst von Brand @ 2006-06-06 15:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Horst von Brand, Ryan Anderson, Eric Wong, git

Junio C Hamano <junkio@cox.net> wrote:
> Horst von Brand <vonbrand@inf.utfsm.cl> writes:
> 
> >> >  	# check for a local address:
> >> > -	return $address if ($address =~ /^([\w\-]+)$/);
> >> > +	return $address if ($address =~ /^([\w\-.]+)$/);
> >> 
> >> I keep forgetting this, '+' is a valid (and useful) setup, too.
> >
> > Oops...
> >> 
> >> Actually, I'm retracting my earlier ack on this.  This is way too
> >> restrictive.  I'd rather allow an occasional invalid email address than
> >> to reject valid ones.  I generally trust git users to know what they're
> >> doing when entering email addresses[1].
> >> 
> >> *, $, ^, +, = are all valid characters in the username portion (not sure
> >> about local accounts, though), and I'm sure there are more that I don't
> >> know about.
> >
> > As a general principle, I prefer to check what is legal instead of trying
> > to filter out what isn't.

> If we start doing addr-spec in RFC2822 (page 17) ourselves, we
> should rather be using Email::Valid.  A permissive sanity check
> to catch obvious mistakes would be more appropriate here than
> being RFC police.

OK.

> I think something like the attached, on top of your patch, would
> be appropriate for upcoming 1.4.0.
> 
> -- >8 --
> send-email: be more lenient and just catch obvious mistakes.
> 
> This cleans up the pattern matching subroutine by introducing
> two variables to hold regexp to approximately match local-part
> and domain in the e-mail address.  It is meant to catch obvious
> mistakes with a cheap check.
> 
> The patch also moves "scalar" to force Email::Valid->address()
> to work in !wantarray environment to extract_valid_address;
> earlier it was in the caller of the subroutine, which was way
> too error prone.

Right.

> ---
> diff --git a/git-send-email.perl b/git-send-email.perl
> index a7a7797..700d0c3 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, 
>  
>  sub extract_valid_address {
>  	my $address = shift;
> +	my $local_part_regexp = '[^<>"\s@]+';
> +	my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+';

This forces a '.' in the domain, while vonbrand@localhost is perfectly
reasonable. Plus it doesn't disallow adyacent '.'s. What about:

        my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*';

(but this is probably nitpicking...)
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 15:42       ` Horst von Brand
@ 2006-06-06 15:54         ` Junio C Hamano
  2006-06-06 16:05           ` Horst von Brand
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-06-06 15:54 UTC (permalink / raw)
  To: Horst von Brand; +Cc: git

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

>> diff --git a/git-send-email.perl b/git-send-email.perl
>> index a7a7797..700d0c3 100755
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, 
>>  
>>  sub extract_valid_address {
>>  	my $address = shift;
>> +	my $local_part_regexp = '[^<>"\s@]+';
>> +	my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+';
>
> This forces a '.' in the domain, while vonbrand@localhost is perfectly
> reasonable. Plus it doesn't disallow adyacent '.'s. What about:
>
>         my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*';
>
> (but this is probably nitpicking...)

I do not have preference either way about allowing an address
like tld-administrator@net myself, but Email::Valid->address
does not seem to allow it, and I just copied that behaviour for
consistency between two alternative implementations.

I think you meant to say:

>         my $domain_regexp = '[^.<>"\s@]+(\.[^.<>"\s@]+)*';

(i.e. exclude dot from the latter character class), but I am
inclined to do this instead:

	my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+';

(i.e. still require at least two levels).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 15:54         ` Junio C Hamano
@ 2006-06-06 16:05           ` Horst von Brand
  2006-06-06 16:58             ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Horst von Brand @ 2006-06-06 16:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Horst von Brand, git

Junio C Hamano <junkio@cox.net> wrote:
> Horst von Brand <vonbrand@inf.utfsm.cl> writes:
> 
> >> diff --git a/git-send-email.perl b/git-send-email.perl
> >> index a7a7797..700d0c3 100755
> >> --- a/git-send-email.perl
> >> +++ b/git-send-email.perl
> >> @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, 
> >>  
> >>  sub extract_valid_address {
> >>  	my $address = shift;
> >> +	my $local_part_regexp = '[^<>"\s@]+';
> >> +	my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+';
> >
> > This forces a '.' in the domain, while vonbrand@localhost is perfectly
> > reasonable. Plus it doesn't disallow adyacent '.'s. What about:
> >
> >         my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*';
> >
> > (but this is probably nitpicking...)
> 
> I do not have preference either way about allowing an address
> like tld-administrator@net myself, but Email::Valid->address
> does not seem to allow it, and I just copied that behaviour for
> consistency between two alternative implementations.

Reasonable.

> I think you meant to say:
> 
> >         my $domain_regexp = '[^.<>"\s@]+(\.[^.<>"\s@]+)*';
> 
> (i.e. exclude dot from the latter character class),

Right, my bad.

>                                                     but I am
> inclined to do this instead:
> 
> 	my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+';
> 
> (i.e. still require at least two levels).

OK, but be careful as this (?:...) is an extended regexp (needs /x on
match). I'd just leave it plain (the performance impact shouldn't be
noticeable). I don't see any use except for $1, so the extra parenthesis
should be safe.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 16:05           ` Horst von Brand
@ 2006-06-06 16:58             ` Junio C Hamano
  2006-06-06 21:24               ` Horst von Brand
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-06-06 16:58 UTC (permalink / raw)
  To: Horst von Brand; +Cc: git

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>>                                                     but I am
>> inclined to do this instead:
>> 
>> 	my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+';
>> 
>> (i.e. still require at least two levels).
>
> OK, but be careful as this (?:...) is an extended regexp (needs /x on
> match).

Are you sure about /x?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 16:58             ` Junio C Hamano
@ 2006-06-06 21:24               ` Horst von Brand
  2006-06-06 21:39                 ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Horst von Brand @ 2006-06-06 21:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Horst von Brand, git

Junio C Hamano <junkio@cox.net> wrote:
> Horst von Brand <vonbrand@inf.utfsm.cl> writes:
> > Junio C Hamano <junkio@cox.net> wrote:
> >>                                                     but I am
> >> inclined to do this instead:
> >> 
> >> 	my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+';
> >> 
> >> (i.e. still require at least two levels).
> >
> > OK, but be careful as this (?:...) is an extended regexp (needs /x on
> > match).

> Are you sure about /x?

The manual (perlop(1)) says you need /x to match extended regexps, and
(?...) is the marker for such (perlre(1)). But my perl here (5.5.8-6 on
Fedora rawhide) doesn't care...
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 21:24               ` Horst von Brand
@ 2006-06-06 21:39                 ` Junio C Hamano
  2006-06-06 22:48                   ` Horst von Brand
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-06-06 21:39 UTC (permalink / raw)
  To: Horst von Brand; +Cc: git

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

>> > OK, but be careful as this (?:...) is an extended regexp (needs /x on
>> > match).
>
>> Are you sure about /x?
>
> The manual (perlop(1)) says you need /x to match extended regexps, and
> (?...) is the marker for such (perlre(1)).

I always had the impression that eXtended in the context to talk
about /x was about ignoring whitespaces and forcing people to
write \s (or perhaps \040) when they mean a whitespace and had
nothing to do with (?...) stuff.  Let me look up the fine
manual.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
  2006-06-06 21:39                 ` Junio C Hamano
@ 2006-06-06 22:48                   ` Horst von Brand
  0 siblings, 0 replies; 11+ messages in thread
From: Horst von Brand @ 2006-06-06 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Horst von Brand, git

Junio C Hamano <junkio@cox.net> wrote:
> Horst von Brand <vonbrand@inf.utfsm.cl> writes:
> >> > OK, but be careful as this (?:...) is an extended regexp (needs /x on
> >> > match).
> >
> >> Are you sure about /x?
> >
> > The manual (perlop(1)) says you need /x to match extended regexps, and
> > (?...) is the marker for such (perlre(1)).

> I always had the impression that eXtended in the context to talk
> about /x was about ignoring whitespaces and forcing people to
> write \s (or perhaps \040) when they mean a whitespace and had
> nothing to do with (?...) stuff.  Let me look up the fine
> manual.

You might be right... and it even sounds sensible; but both (?...) stuff
and the ignoring of space is described as extended here.

Note that \s is a space character (' ', '\t', ...), which is not the same
as \040 (and that one assumes ASCII...).
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-06-06 22:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-03 17:11 [PATCH] Cleanup git-send-email.perl:extract_valid_email Horst H. von Brand
2006-06-03 22:49 ` Eric Wong
2006-06-04  0:10   ` Horst von Brand
2006-06-06  6:59     ` Junio C Hamano
2006-06-06 15:42       ` Horst von Brand
2006-06-06 15:54         ` Junio C Hamano
2006-06-06 16:05           ` Horst von Brand
2006-06-06 16:58             ` Junio C Hamano
2006-06-06 21:24               ` Horst von Brand
2006-06-06 21:39                 ` Junio C Hamano
2006-06-06 22:48                   ` Horst von Brand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).