* [PATCH] Cleanup git-send-email.perl:extract_valid_email @ 2006-06-03 17:11 Horst H. von Brand 2006-06-03 22:49 ` Eric Wong 0 siblings, 1 reply; 11+ messages in thread From: Horst H. von Brand @ 2006-06-03 17:11 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Horst H. von Brand - Fix the regular expressions for local addresses - Fix the fallback regexp for non-local addresses, simplify the logic Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> --- git-send-email.perl | 9 +++------ 1 files changed, 3 insertions(+), 6 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index ed1d89b..a7a7797 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -314,18 +314,15 @@ sub extract_valid_address { my $address = shift; # check for a local address: - return $address if ($address =~ /^([\w\-]+)$/); + return $address if ($address =~ /^([\w\-.]+)$/); if ($have_email_valid) { return Email::Valid->address($address); } else { # less robust/correct than the monster regexp in Email::Valid, # but still does a 99% job, and one less dependency - my $cleaned_address; - if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) { - $cleaned_address = $1; - } - return $cleaned_address; + $address =~ /([\w\-.]+@[\w\-.]+)/; + return $1; } } -- 1.3.3.g86f7 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-03 17:11 [PATCH] Cleanup git-send-email.perl:extract_valid_email Horst H. von Brand @ 2006-06-03 22:49 ` Eric Wong 2006-06-04 0:10 ` Horst von Brand 0 siblings, 1 reply; 11+ messages in thread From: Eric Wong @ 2006-06-03 22:49 UTC (permalink / raw) To: Horst H. von Brand; +Cc: git, Junio C Hamano, Ryan Anderson "Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote: > - Fix the regular expressions for local addresses > - Fix the fallback regexp for non-local addresses, simplify the logic > > Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> > --- > git-send-email.perl | 9 +++------ > 1 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/git-send-email.perl b/git-send-email.perl > index ed1d89b..a7a7797 100755 > --- a/git-send-email.perl > +++ b/git-send-email.perl > @@ -314,18 +314,15 @@ sub extract_valid_address { > my $address = shift; > > # check for a local address: > - return $address if ($address =~ /^([\w\-]+)$/); > + return $address if ($address =~ /^([\w\-.]+)$/); I keep forgetting this, '+' is a valid (and useful) setup, too. > > if ($have_email_valid) { > return Email::Valid->address($address); > } else { > # less robust/correct than the monster regexp in Email::Valid, > # but still does a 99% job, and one less dependency > - my $cleaned_address; > - if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) { > - $cleaned_address = $1; > - } > - return $cleaned_address; > + $address =~ /([\w\-.]+@[\w\-.]+)/; > + return $1; Actually, I'm retracting my earlier ack on this. This is way too restrictive. I'd rather allow an occasional invalid email address than to reject valid ones. I generally trust git users to know what they're doing when entering email addresses[1]. *, $, ^, +, = are all valid characters in the username portion (not sure about local accounts, though), and I'm sure there are more that I don't know about. [1] - of course, without address book support in git-send-email, I think I would've left the 'k' out of Junio's email address by now. Of course there's also the issue of where I work and having several people using variations of rob/robert in their email address. I'm likely to make errors like these far more than entering addresses that email systems consider invalid, and I suspect I'm not the only one. -- Eric Wong ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-03 22:49 ` Eric Wong @ 2006-06-04 0:10 ` Horst von Brand 2006-06-06 6:59 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Horst von Brand @ 2006-06-04 0:10 UTC (permalink / raw) To: Eric Wong; +Cc: Horst H. von Brand, git, Junio C Hamano, Ryan Anderson Eric Wong <normalperson@yhbt.net> wrote: > "Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote: > > - Fix the regular expressions for local addresses > > - Fix the fallback regexp for non-local addresses, simplify the logic > > > > Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> > > --- > > git-send-email.perl | 9 +++------ > > 1 files changed, 3 insertions(+), 6 deletions(-) > > > > diff --git a/git-send-email.perl b/git-send-email.perl > > index ed1d89b..a7a7797 100755 > > --- a/git-send-email.perl > > +++ b/git-send-email.perl > > @@ -314,18 +314,15 @@ sub extract_valid_address { > > my $address = shift; > > > > # check for a local address: > > - return $address if ($address =~ /^([\w\-]+)$/); > > + return $address if ($address =~ /^([\w\-.]+)$/); > > I keep forgetting this, '+' is a valid (and useful) setup, too. Oops... > > if ($have_email_valid) { > > return Email::Valid->address($address); > > } else { > > # less robust/correct than the monster regexp in Email::Valid, > > # but still does a 99% job, and one less dependency > > - my $cleaned_address; > > - if ($address =~ /([^\"<>\s]+@[^<>\s]+)/) { > > - $cleaned_address = $1; > > - } > > - return $cleaned_address; > > + $address =~ /([\w\-.]+@[\w\-.]+)/; > > + return $1; > > Actually, I'm retracting my earlier ack on this. This is way too > restrictive. I'd rather allow an occasional invalid email address than > to reject valid ones. I generally trust git users to know what they're > doing when entering email addresses[1]. > > *, $, ^, +, = are all valid characters in the username portion (not sure > about local accounts, though), and I'm sure there are more that I don't > know about. As a general principle, I prefer to check what is legal instead of trying to filter out what isn't. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-04 0:10 ` Horst von Brand @ 2006-06-06 6:59 ` Junio C Hamano 2006-06-06 15:42 ` Horst von Brand 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2006-06-06 6:59 UTC (permalink / raw) To: Horst von Brand, Ryan Anderson, Eric Wong; +Cc: git Horst von Brand <vonbrand@inf.utfsm.cl> writes: >> > # check for a local address: >> > - return $address if ($address =~ /^([\w\-]+)$/); >> > + return $address if ($address =~ /^([\w\-.]+)$/); >> >> I keep forgetting this, '+' is a valid (and useful) setup, too. > > Oops... >> >> Actually, I'm retracting my earlier ack on this. This is way too >> restrictive. I'd rather allow an occasional invalid email address than >> to reject valid ones. I generally trust git users to know what they're >> doing when entering email addresses[1]. >> >> *, $, ^, +, = are all valid characters in the username portion (not sure >> about local accounts, though), and I'm sure there are more that I don't >> know about. > > As a general principle, I prefer to check what is legal instead of trying > to filter out what isn't. If we start doing addr-spec in RFC2822 (page 17) ourselves, we should rather be using Email::Valid. A permissive sanity check to catch obvious mistakes would be more appropriate here than being RFC police. I think something like the attached, on top of your patch, would be appropriate for upcoming 1.4.0. -- >8 -- send-email: be more lenient and just catch obvious mistakes. This cleans up the pattern matching subroutine by introducing two variables to hold regexp to approximately match local-part and domain in the e-mail address. It is meant to catch obvious mistakes with a cheap check. The patch also moves "scalar" to force Email::Valid->address() to work in !wantarray environment to extract_valid_address; earlier it was in the caller of the subroutine, which was way too error prone. --- diff --git a/git-send-email.perl b/git-send-email.perl index a7a7797..700d0c3 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, sub extract_valid_address { my $address = shift; + my $local_part_regexp = '[^<>"\s@]+'; + my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+'; # check for a local address: - return $address if ($address =~ /^([\w\-.]+)$/); + return $address if ($address =~ /^($local_part_regexp)$/); if ($have_email_valid) { - return Email::Valid->address($address); + return scalar Email::Valid->address($address); } else { # less robust/correct than the monster regexp in Email::Valid, # but still does a 99% job, and one less dependency - $address =~ /([\w\-.]+@[\w\-.]+)/; + $address =~ /($local_part_regexp\@$domain_regexp)/; return $1; } } @@ -384,7 +386,7 @@ X-Mailer: git-send-email $gitversion defined $pid or die $!; if (!$pid) { exec($smtp_server,'-i', - map { scalar extract_valid_address($_) } + map { extract_valid_address($_) } @recipients) or die $!; } print $sm "$header\n$message"; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 6:59 ` Junio C Hamano @ 2006-06-06 15:42 ` Horst von Brand 2006-06-06 15:54 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Horst von Brand @ 2006-06-06 15:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: Horst von Brand, Ryan Anderson, Eric Wong, git Junio C Hamano <junkio@cox.net> wrote: > Horst von Brand <vonbrand@inf.utfsm.cl> writes: > > >> > # check for a local address: > >> > - return $address if ($address =~ /^([\w\-]+)$/); > >> > + return $address if ($address =~ /^([\w\-.]+)$/); > >> > >> I keep forgetting this, '+' is a valid (and useful) setup, too. > > > > Oops... > >> > >> Actually, I'm retracting my earlier ack on this. This is way too > >> restrictive. I'd rather allow an occasional invalid email address than > >> to reject valid ones. I generally trust git users to know what they're > >> doing when entering email addresses[1]. > >> > >> *, $, ^, +, = are all valid characters in the username portion (not sure > >> about local accounts, though), and I'm sure there are more that I don't > >> know about. > > > > As a general principle, I prefer to check what is legal instead of trying > > to filter out what isn't. > If we start doing addr-spec in RFC2822 (page 17) ourselves, we > should rather be using Email::Valid. A permissive sanity check > to catch obvious mistakes would be more appropriate here than > being RFC police. OK. > I think something like the attached, on top of your patch, would > be appropriate for upcoming 1.4.0. > > -- >8 -- > send-email: be more lenient and just catch obvious mistakes. > > This cleans up the pattern matching subroutine by introducing > two variables to hold regexp to approximately match local-part > and domain in the e-mail address. It is meant to catch obvious > mistakes with a cheap check. > > The patch also moves "scalar" to force Email::Valid->address() > to work in !wantarray environment to extract_valid_address; > earlier it was in the caller of the subroutine, which was way > too error prone. Right. > --- > diff --git a/git-send-email.perl b/git-send-email.perl > index a7a7797..700d0c3 100755 > --- a/git-send-email.perl > +++ b/git-send-email.perl > @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, > > sub extract_valid_address { > my $address = shift; > + my $local_part_regexp = '[^<>"\s@]+'; > + my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+'; This forces a '.' in the domain, while vonbrand@localhost is perfectly reasonable. Plus it doesn't disallow adyacent '.'s. What about: my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*'; (but this is probably nitpicking...) -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 15:42 ` Horst von Brand @ 2006-06-06 15:54 ` Junio C Hamano 2006-06-06 16:05 ` Horst von Brand 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2006-06-06 15:54 UTC (permalink / raw) To: Horst von Brand; +Cc: git Horst von Brand <vonbrand@inf.utfsm.cl> writes: >> diff --git a/git-send-email.perl b/git-send-email.perl >> index a7a7797..700d0c3 100755 >> --- a/git-send-email.perl >> +++ b/git-send-email.perl >> @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, >> >> sub extract_valid_address { >> my $address = shift; >> + my $local_part_regexp = '[^<>"\s@]+'; >> + my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+'; > > This forces a '.' in the domain, while vonbrand@localhost is perfectly > reasonable. Plus it doesn't disallow adyacent '.'s. What about: > > my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*'; > > (but this is probably nitpicking...) I do not have preference either way about allowing an address like tld-administrator@net myself, but Email::Valid->address does not seem to allow it, and I just copied that behaviour for consistency between two alternative implementations. I think you meant to say: > my $domain_regexp = '[^.<>"\s@]+(\.[^.<>"\s@]+)*'; (i.e. exclude dot from the latter character class), but I am inclined to do this instead: my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+'; (i.e. still require at least two levels). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 15:54 ` Junio C Hamano @ 2006-06-06 16:05 ` Horst von Brand 2006-06-06 16:58 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Horst von Brand @ 2006-06-06 16:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Horst von Brand, git Junio C Hamano <junkio@cox.net> wrote: > Horst von Brand <vonbrand@inf.utfsm.cl> writes: > > >> diff --git a/git-send-email.perl b/git-send-email.perl > >> index a7a7797..700d0c3 100755 > >> --- a/git-send-email.perl > >> +++ b/git-send-email.perl > >> @@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, > >> > >> sub extract_valid_address { > >> my $address = shift; > >> + my $local_part_regexp = '[^<>"\s@]+'; > >> + my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+'; > > > > This forces a '.' in the domain, while vonbrand@localhost is perfectly > > reasonable. Plus it doesn't disallow adyacent '.'s. What about: > > > > my $domain_regexp = '[^.<>"\s@]+(\.[^<>"\s@]+)*'; > > > > (but this is probably nitpicking...) > > I do not have preference either way about allowing an address > like tld-administrator@net myself, but Email::Valid->address > does not seem to allow it, and I just copied that behaviour for > consistency between two alternative implementations. Reasonable. > I think you meant to say: > > > my $domain_regexp = '[^.<>"\s@]+(\.[^.<>"\s@]+)*'; > > (i.e. exclude dot from the latter character class), Right, my bad. > but I am > inclined to do this instead: > > my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+'; > > (i.e. still require at least two levels). OK, but be careful as this (?:...) is an extended regexp (needs /x on match). I'd just leave it plain (the performance impact shouldn't be noticeable). I don't see any use except for $1, so the extra parenthesis should be safe. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 16:05 ` Horst von Brand @ 2006-06-06 16:58 ` Junio C Hamano 2006-06-06 21:24 ` Horst von Brand 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2006-06-06 16:58 UTC (permalink / raw) To: Horst von Brand; +Cc: git Horst von Brand <vonbrand@inf.utfsm.cl> writes: > Junio C Hamano <junkio@cox.net> wrote: >> but I am >> inclined to do this instead: >> >> my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+'; >> >> (i.e. still require at least two levels). > > OK, but be careful as this (?:...) is an extended regexp (needs /x on > match). Are you sure about /x? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 16:58 ` Junio C Hamano @ 2006-06-06 21:24 ` Horst von Brand 2006-06-06 21:39 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Horst von Brand @ 2006-06-06 21:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Horst von Brand, git Junio C Hamano <junkio@cox.net> wrote: > Horst von Brand <vonbrand@inf.utfsm.cl> writes: > > Junio C Hamano <junkio@cox.net> wrote: > >> but I am > >> inclined to do this instead: > >> > >> my $domain_regexp = '[^.<>"\s@]+(?:\.[^.<>"\s@]+)+'; > >> > >> (i.e. still require at least two levels). > > > > OK, but be careful as this (?:...) is an extended regexp (needs /x on > > match). > Are you sure about /x? The manual (perlop(1)) says you need /x to match extended regexps, and (?...) is the marker for such (perlre(1)). But my perl here (5.5.8-6 on Fedora rawhide) doesn't care... -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 21:24 ` Horst von Brand @ 2006-06-06 21:39 ` Junio C Hamano 2006-06-06 22:48 ` Horst von Brand 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2006-06-06 21:39 UTC (permalink / raw) To: Horst von Brand; +Cc: git Horst von Brand <vonbrand@inf.utfsm.cl> writes: >> > OK, but be careful as this (?:...) is an extended regexp (needs /x on >> > match). > >> Are you sure about /x? > > The manual (perlop(1)) says you need /x to match extended regexps, and > (?...) is the marker for such (perlre(1)). I always had the impression that eXtended in the context to talk about /x was about ignoring whitespaces and forcing people to write \s (or perhaps \040) when they mean a whitespace and had nothing to do with (?...) stuff. Let me look up the fine manual. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email 2006-06-06 21:39 ` Junio C Hamano @ 2006-06-06 22:48 ` Horst von Brand 0 siblings, 0 replies; 11+ messages in thread From: Horst von Brand @ 2006-06-06 22:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Horst von Brand, git Junio C Hamano <junkio@cox.net> wrote: > Horst von Brand <vonbrand@inf.utfsm.cl> writes: > >> > OK, but be careful as this (?:...) is an extended regexp (needs /x on > >> > match). > > > >> Are you sure about /x? > > > > The manual (perlop(1)) says you need /x to match extended regexps, and > > (?...) is the marker for such (perlre(1)). > I always had the impression that eXtended in the context to talk > about /x was about ignoring whitespaces and forcing people to > write \s (or perhaps \040) when they mean a whitespace and had > nothing to do with (?...) stuff. Let me look up the fine > manual. You might be right... and it even sounds sensible; but both (?...) stuff and the ignoring of space is described as extended here. Note that \s is a space character (' ', '\t', ...), which is not the same as \040 (and that one assumes ASCII...). -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-06-06 22:48 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-06-03 17:11 [PATCH] Cleanup git-send-email.perl:extract_valid_email Horst H. von Brand 2006-06-03 22:49 ` Eric Wong 2006-06-04 0:10 ` Horst von Brand 2006-06-06 6:59 ` Junio C Hamano 2006-06-06 15:42 ` Horst von Brand 2006-06-06 15:54 ` Junio C Hamano 2006-06-06 16:05 ` Horst von Brand 2006-06-06 16:58 ` Junio C Hamano 2006-06-06 21:24 ` Horst von Brand 2006-06-06 21:39 ` Junio C Hamano 2006-06-06 22:48 ` Horst von Brand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).