* [RFC] git-send-email: Cache generated message-ids, use them when prompting @ 2013-08-17 0:58 Rasmus Villemoes 2013-08-18 21:08 ` Junio C Hamano 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes 0 siblings, 2 replies; 7+ messages in thread From: Rasmus Villemoes @ 2013-08-17 0:58 UTC (permalink / raw) To: gitster; +Cc: git, Rasmus Villemoes This is mostly a proof of concept/RFC patch. The idea is for git-send-email to store the message-ids it generates, along with the Subject and Date headers of the message. When prompting for which Message-ID should be used in In-Reply-To, display a list of recent emails (identifed using the Date/Subject pairs; the message-ids themselves are not for human consumption). Choosing from that list will then use the corresponding message-id; otherwise, the behaviour is as usual. When composing v2 or v3 of a patch or patch series, this avoids the need to get one's MUA to display the Message-ID of the earlier email (which is cumbersome in some MUAs) and then copy-paste that. If this idea is accepted, I'm certain I'll get to use the feature immediately, since the patch is not quite ready for inclusion :-) Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> --- git-send-email.perl | 101 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 96 insertions(+), 5 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index f608d9b..2e3685c 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -26,6 +26,7 @@ use Data::Dumper; use Term::ANSIColor; use File::Temp qw/ tempdir tempfile /; use File::Spec::Functions qw(catfile); +use Date::Parse; use Error qw(:try); use Git; @@ -203,6 +204,7 @@ my ($validate, $confirm); my (@suppress_cc); my ($auto_8bit_encoding); my ($compose_encoding); +my ($msgid_cache_file, $msgid_maxprompt); my ($debug_net_smtp) = 0; # Net::SMTP, see send_message() @@ -214,7 +216,7 @@ my %config_bool_settings = ( "signedoffcc" => [\$signed_off_by_cc, undef], # Deprecated "validate" => [\$validate, 1], "multiedit" => [\$multiedit, undef], - "annotate" => [\$annotate, undef] + "annotate" => [\$annotate, undef], ); my %config_settings = ( @@ -237,6 +239,7 @@ my %config_settings = ( "from" => \$sender, "assume8bitencoding" => \$auto_8bit_encoding, "composeencoding" => \$compose_encoding, + "msgidcachefile" => \$msgid_cache_file, ); my %config_path_settings = ( @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help, "8bit-encoding=s" => \$auto_8bit_encoding, "compose-encoding=s" => \$compose_encoding, "force" => \$force, + "msgid-cache-file=s" => \$msgid_cache_file, ); usage() if $help; @@ -784,10 +788,31 @@ sub expand_one_alias { @bcclist = validate_address_list(sanitize_address_list(@bcclist)); if ($thread && !defined $initial_reply_to) { - $initial_reply_to = ask( - "Message-ID to be used as In-Reply-To for the first email (if any)? ", - default => "", - valid_re => qr/\@.*\./, confirm_only => 1); + my @choices; + if ($msgid_cache_file) { + @choices = msgid_cache_getmatches(); + } + if (@choices) { + my $prompt = ''; + my $i = 0; + $prompt .= sprintf "(%d) [%s] %s\n", $i++, $_->{date}, $_->{subject} + for (@choices); + $prompt .= sprintf "Answer 0-%d to use the Message-ID of one of the above\n", $#choices; + $prompt .= "Message-ID to be used as In-Reply-To for the first email (if any)? "; + $initial_reply_to = + ask($prompt, + default => "", + valid_re => qr/\@.*\.|^[0-9]+$/, confirm_only => 1); + if ($initial_reply_to =~ /^[0-9]+$/ && $initial_reply_to < @choices) { + $initial_reply_to = $choices[$initial_reply_to]{id}; + } + } + else { + $initial_reply_to = + ask("Message-ID to be used as In-Reply-To for the first email (if any)? ", + default => "", + valid_re => qr/\@.*\./, confirm_only => 1); + } } if (defined $initial_reply_to) { $initial_reply_to =~ s/^\s*<?//; @@ -1282,6 +1307,8 @@ X-Mailer: git-send-email $gitversion } } + msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run); + return 1; } @@ -1508,6 +1535,8 @@ sub cleanup_compose_files { $smtp->quit if $smtp; +msgid_cache_write() if $msgid_cache_file; + sub unique_email_list { my %seen; my @emails; @@ -1556,3 +1585,65 @@ sub body_or_subject_has_nonascii { } return 0; } + +my @msgid_new_entries; + +# For now, use a simple tab-separated format: +# +# $id\t$date\t$subject\n +sub msgid_cache_read { + my $fh; + my $line; + my @entries; + if (not open ($fh, '<', $msgid_cache_file)) { + # A non-existing cache file is ok, but should we warn if errno != ENOENT? + return (); + } + while ($line = <$fh>) { + chomp($line); + my ($id, $date, $subject) = split /\t/, $line; + my $epoch = str2time($date); + push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject}; + } + close($fh); + return @entries; +} +sub msgid_cache_write { + my $fh; + if (not open($fh, '>>', $msgid_cache_file)) { + warn "cannot open $msgid_cache_file for appending: $!"; + return; + } + printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries); + close($fh); +} +# Return an array of cached message-ids, ordered by "relevance". It +# might make sense to take the Subject of the new mail as an extra +# argument and do some kind of fuzzy matching against the old +# subjects, but for now "more relevant" simply means "newer". +sub msgid_cache_getmatches { + my ($maxentries) = @_; + $maxentries //= 10; + my @list = msgid_cache_read(); + @list = sort {$b->{epoch} <=> $a->{epoch}} @list; + @list = @list[0 .. $maxentries-1] if (@list > $maxentries); + return @list; +} + +sub msgid_cache_this { + my $msgid = shift; + my $subject = shift; + my $date = shift; + # Make sure there are no tabs which will confuse us, and save + # some valuable horizontal real-estate by removing redundant + # whitespace. + if ($subject) { + $subject =~ s/^\s+|\s+$//g; + $subject =~ s/\s+/ /g; + } + # Replace undef or the empty string by an actual string. Nobody uses "0" as the subject... + $subject ||= '(none)'; + $date //= format_2822_time(time()); + $date =~ s/\s+/ /g; + push @msgid_new_entries, {id => $msgid, subject => $subject, date => $date}; +} -- 1.8.4.rc3.2.gb900fc8 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC] git-send-email: Cache generated message-ids, use them when prompting 2013-08-17 0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes @ 2013-08-18 21:08 ` Junio C Hamano 2013-08-18 22:24 ` brian m. carlson 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2013-08-18 21:08 UTC (permalink / raw) To: Rasmus Villemoes; +Cc: git Rasmus Villemoes <rv@rasmusvillemoes.dk> writes: > This is mostly a proof of concept/RFC patch. The idea is for > git-send-email to store the message-ids it generates, along with the > Subject and Date headers of the message. When prompting for which > Message-ID should be used in In-Reply-To, display a list of recent > emails (identifed using the Date/Subject pairs; the message-ids > themselves are not for human consumption). Choosing from that list > will then use the corresponding message-id; otherwise, the behaviour > is as usual. > > When composing v2 or v3 of a patch or patch series, this avoids the > need to get one's MUA to display the Message-ID of the earlier email > (which is cumbersome in some MUAs) and then copy-paste that. > > If this idea is accepted, I'm certain I'll get to use the feature > immediately, since the patch is not quite ready for inclusion :-) > > Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> > --- > git-send-email.perl | 101 +++++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 96 insertions(+), 5 deletions(-) > > diff --git a/git-send-email.perl b/git-send-email.perl > index f608d9b..2e3685c 100755 > --- a/git-send-email.perl > +++ b/git-send-email.perl > @@ -26,6 +26,7 @@ use Data::Dumper; > use Term::ANSIColor; > use File::Temp qw/ tempdir tempfile /; > use File::Spec::Functions qw(catfile); > +use Date::Parse; Hmm, is this part of core that we do not have to worry about some people not having it? It appears that Git/SVN/Log.pm explicitly avoids using it. > @@ -203,6 +204,7 @@ my ($validate, $confirm); > my (@suppress_cc); > my ($auto_8bit_encoding); > my ($compose_encoding); > +my ($msgid_cache_file, $msgid_maxprompt); I do not see $msgid_maxprompt used anywhere in the new code. > > my ($debug_net_smtp) = 0; # Net::SMTP, see send_message() > > @@ -214,7 +216,7 @@ my %config_bool_settings = ( > "signedoffcc" => [\$signed_off_by_cc, undef], # Deprecated > "validate" => [\$validate, 1], > "multiedit" => [\$multiedit, undef], > - "annotate" => [\$annotate, undef] > + "annotate" => [\$annotate, undef], Is this related to this patch in any way? > ); > > my %config_settings = ( > @@ -237,6 +239,7 @@ my %config_settings = ( > "from" => \$sender, > "assume8bitencoding" => \$auto_8bit_encoding, > "composeencoding" => \$compose_encoding, > + "msgidcachefile" => \$msgid_cache_file, > ); > > my %config_path_settings = ( > @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help, > "8bit-encoding=s" => \$auto_8bit_encoding, > "compose-encoding=s" => \$compose_encoding, > "force" => \$force, > + "msgid-cache-file=s" => \$msgid_cache_file, > ); Is there a standard, recommended location we suggest users to store this? > usage() if $help; > @@ -784,10 +788,31 @@ sub expand_one_alias { > @bcclist = validate_address_list(sanitize_address_list(@bcclist)); > > if ($thread && !defined $initial_reply_to) { > - $initial_reply_to = ask( > - "Message-ID to be used as In-Reply-To for the first email (if any)? ", > - default => "", > - valid_re => qr/\@.*\./, confirm_only => 1); > + my @choices; > + if ($msgid_cache_file) { > + @choices = msgid_cache_getmatches(); It is a bit strange that "filename is specified => we will find the latest 10" before seeing if we even check to see if that file exists. I would have expected that a two-step "filename is given => try to read it" and "if we did read something => give choices" process would be used. Also "getmatches" that does not take any clue from what the caller knows (the title of the series, for example) seems much less useful than ideal. The callee is not getting anything to work with to get sensible "matches". > @@ -1282,6 +1307,8 @@ X-Mailer: git-send-email $gitversion > } > } > > + msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run); Is this caching each and every one, even for things like "[PATCH 23/47]"? > @@ -1508,6 +1535,8 @@ sub cleanup_compose_files { > > $smtp->quit if $smtp; > > +msgid_cache_write() if $msgid_cache_file; Is this done under --dry-run? > @@ -1556,3 +1585,65 @@ sub body_or_subject_has_nonascii { > } > return 0; > } > + > +my @msgid_new_entries; > + > +# For now, use a simple tab-separated format: > +# > +# $id\t$date\t$subject\n > +sub msgid_cache_read { > + my $fh; > + my $line; > + my @entries; > + if (not open ($fh, '<', $msgid_cache_file)) { > + # A non-existing cache file is ok, but should we warn if errno != ENOENT? It should not be a warning but an informational message, "creating a new cachefile", when bootstrapping, no? > + while ($line = <$fh>) { > + chomp($line); > + my ($id, $date, $subject) = split /\t/, $line; > + my $epoch = str2time($date); > + push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject}; > + } > + close($fh); > + return @entries; So all the old ones are read, without dropping ancient and possibly useless ones here... > +sub msgid_cache_write { > + my $fh; > + if (not open($fh, '>>', $msgid_cache_file)) { > + warn "cannot open $msgid_cache_file for appending: $!"; > + return; > + } > + printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries); > + close($fh); And the new ones are appended to the end without expiring anything. When does the cache shrink, and who is responsible for doing so? > +# Return an array of cached message-ids, ordered by "relevance". It > +# might make sense to take the Subject of the new mail as an extra > +# argument and do some kind of fuzzy matching against the old > +# subjects, but for now "more relevant" simply means "newer". > +sub msgid_cache_getmatches { > + my ($maxentries) = @_; > + $maxentries //= 10; The //= operator is mentioned in perl581delta.pod, it seems, and none of our Perl scripted Porcelains seems to use it yet. Is it safe to assume that everybody has it? > + my @list = msgid_cache_read(); > + @list = sort {$b->{epoch} <=> $a->{epoch}} @list; > + @list = @list[0 .. $maxentries-1] if (@list > $maxentries); > + return @list; > +} > + > +sub msgid_cache_this { > + my $msgid = shift; > + my $subject = shift; > + my $date = shift; > + # Make sure there are no tabs which will confuse us, and save > + # some valuable horizontal real-estate by removing redundant > + # whitespace. > + if ($subject) { > + $subject =~ s/^\s+|\s+$//g; > + $subject =~ s/\s+/ /g; > + } > + # Replace undef or the empty string by an actual string. Nobody uses "0" as the subject... That's a strange sloppiness for somebody who uses //=, no? > + $subject ||= '(none)'; > + $date //= format_2822_time(time()); > + $date =~ s/\s+/ /g; > + push @msgid_new_entries, {id => $msgid, subject => $subject, date => $date}; > +} I am personally not very interested, but that only means I will not be the one who will be enthusiastically rooting for inclusion; it does not mean I reject the general notion outright. But the behaviour the posted patch seems to implement seems to fall far short of how useful the general notion "save message ID of what we sent, so that we can reply to them" could be. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC] git-send-email: Cache generated message-ids, use them when prompting 2013-08-18 21:08 ` Junio C Hamano @ 2013-08-18 22:24 ` brian m. carlson 0 siblings, 0 replies; 7+ messages in thread From: brian m. carlson @ 2013-08-18 22:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: Rasmus Villemoes, git [-- Attachment #1: Type: text/plain, Size: 1032 bytes --] On Sun, Aug 18, 2013 at 02:08:00PM -0700, Junio C Hamano wrote: > Rasmus Villemoes <rv@rasmusvillemoes.dk> writes: > > +# Return an array of cached message-ids, ordered by "relevance". It > > +# might make sense to take the Subject of the new mail as an extra > > +# argument and do some kind of fuzzy matching against the old > > +# subjects, but for now "more relevant" simply means "newer". > > +sub msgid_cache_getmatches { > > + my ($maxentries) = @_; > > + $maxentries //= 10; > > The //= operator is mentioned in perl581delta.pod, it seems, and > none of our Perl scripted Porcelains seems to use it yet. Is it > safe to assume that everybody has it? This operator is new in Perl 5.10. If you want the code to work on RHEL/CentOS 5, you need to avoid it, since they only have 5.8 available. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 0/2] git-send-email: Message-ID caching 2013-08-17 0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes 2013-08-18 21:08 ` Junio C Hamano @ 2013-08-21 19:04 ` Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes ` (2 more replies) 1 sibling, 3 replies; 7+ messages in thread From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw) To: gitster, sandals, git; +Cc: Rasmus Villemoes This is a more polished attempt at implementing the message-id caching. I apologize for the sloppiness (unrelated changes, unused variables etc.) in the first version. I have split the patch in two. The first half moves the choice-logic inside ask(): If ask() decides to return the default immediately (because the $term->{IN,OUT} checks fail), there's no reason to print all the choices. In my first attempt, I handled this by passing a huge prompt-string, but that made everything underlined (the prompt may be emphasized in some other way on other terminals). Passing a different valid_re and handling a numeric return is also slightly more cumbersome for the caller than making ask() take care of it. There might be other future uses of this 'choices' ability, but if the message-id-caching is ultimately rejected, [1/2] can of course also be dropped. [2/2] is the new implementation. The two main changes are that old entries are expunged when the file grows larger than, by default, 100 kB, and the old emails are scored according to a simple scheme (which might need improvement). The input to the scoring is {first subject in new batch, old subject, was the old email first in a batch?}. [*] I also now simply store the unix time stamp instead of storing the contents of the date header. This reduces processing time (no need to parse the date header when reading the file), and eliminates the Date::Parse dependency. The minor downside is that if the user has changed time zone since the old email was sent, the date in the prompt will not entirely match the Date:-header in the email that was sent. I had to rearrange the existing code a little to make certain global variables ($time, @files) contain the right thing at the right time, but hopefully I haven't broken anything in the process. [*] Suggestions for improving that heuristic are welcome. One thing that might be worthwhile is to give words ending in a colon higher weight. Rasmus Villemoes (2): git-send-email: add optional 'choices' parameter to the ask sub git-send-email: Cache generated message-ids, use them when prompting git-send-email.perl | 144 ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 138 insertions(+), 6 deletions(-) Thanks, Junio, for the comments. A few replies: Junio C Hamano <gitster@pobox.com> writes: > Rasmus Villemoes <rv@rasmusvillemoes.dk> writes: >> my %config_path_settings = ( >> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help, >> "8bit-encoding=s" => \$auto_8bit_encoding, >> "compose-encoding=s" => \$compose_encoding, >> "force" => \$force, >> + "msgid-cache-file=s" => \$msgid_cache_file, >> ); > > Is there a standard, recommended location we suggest users to store > this? I don't know. It is obviously a local, per-repository, thing. I don't know enough about git's internals to know if something breaks if one puts it in .git (say, ".git/msgid.cache"). If that's a bad idea, then I think it should be in the root of the repository, and one would then probably want to add it to .git/info/exclude. If storing it under .git is possible, one could consider making the option a boolean ('msgid-use-cache' ?) and always use ".git/msgid.cache". >> + my @choices; >> + if ($msgid_cache_file) { >> + @choices = msgid_cache_getmatches(); > > It is a bit strange that "filename is specified => we will find the > latest 10" before seeing if we even check to see if that file > exists. I would have expected that a two-step "filename is given => > try to read it" and "if we did read something => give choices" > process would be used. I'm not sure I understand how this is different from what I was or am doing. Sure, I don't do the test for existence before calling msgid_cache_getmatches(), but that will just return an empty list if the file does not exist. That will end up doing the same thing as if no cache file was given. > Also "getmatches" that does not take any clue from what the caller > knows (the title of the series, for example) seems much less useful > than ideal. Fixed in the new version. >> + msgid_cache_this($message_id, $subject, $date) if ($msgid_cache_file && !$dry_run); > > Is this caching each and every one, even for things like "[PATCH 23/47]"? Yes, but now I remember whether it was the first or not. >> +msgid_cache_write() if $msgid_cache_file; > > Is this done under --dry-run? Well, it was, but the msgid_cache_this() was not, so there was an empty list of new entries. Of course, better to be explicit and safe, so fixed. >> + if (not open ($fh, '<', $msgid_cache_file)) { >> + # A non-existing cache file is ok, but should we warn if errno != ENOENT? > > It should not be a warning but an informational message, "creating a > new cachefile", when bootstrapping, no? If so, it should probably be when writing the new file. What I'm trying to say is that errno == ENOENT is ok (and expected), but errno == EPERM or anything else might be a condition we should warn or even die on. >> + while ($line = <$fh>) { >> + chomp($line); >> + my ($id, $date, $subject) = split /\t/, $line; >> + my $epoch = str2time($date); >> + push @entries, {id=>$id, date=>$date, epoch=>$epoch, subject=>$subject}; >> + } >> + close($fh); >> + return @entries; > > So all the old ones are read, without dropping ancient and possibly > useless ones here... > >> +sub msgid_cache_write { >> + my $fh; >> + if (not open($fh, '>>', $msgid_cache_file)) { >> + warn "cannot open $msgid_cache_file for appending: $!"; >> + return; >> + } >> + printf $fh "%s\t%s\t%s\n", $_->{id}, $_->{date}, $_->{subject} for (@msgid_new_entries); >> + close($fh); > > And the new ones are appended to the end without expiring anything. > > When does the cache shrink, and who is responsible for doing so? Fixed. Whether it should be "cap when larger than $size" or "remove entries older than time()-$maxage" I don't know. > The //= operator is mentioned in perl581delta.pod, it seems, and > none of our Perl scripted Porcelains seems to use it yet. Is it > safe to assume that everybody has it? No, fixed. >> + if ($subject) { >> + $subject =~ s/^\s+|\s+$//g; >> + $subject =~ s/\s+/ /g; >> + } >> + # Replace undef or the empty string by an actual string. Nobody uses "0" as the subject... > > That's a strange sloppiness for somebody who uses //=, no? Fixed. -- 1.7.9.5 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes @ 2013-08-21 19:04 ` Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes 2013-08-22 0:20 ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano 2 siblings, 0 replies; 7+ messages in thread From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw) To: gitster, sandals, git; +Cc: Rasmus Villemoes Make it possible for callers of ask() to provide a list of choices. Entering an appropriate integer chooses from that list, otherwise the input is treated as usual. Each choice can either be a single string, which is used both for the prompt and for the return value, or a two-element array ref, where the zeroth element is the choice and the first is used for the prompt. Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> --- git-send-email.perl | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/git-send-email.perl b/git-send-email.perl index 2162478..ac3b02d 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -680,11 +680,18 @@ sub ask { my $valid_re = $arg{valid_re}; my $default = $arg{default}; my $confirm_only = $arg{confirm_only}; + my $choices = $arg{choices} || []; my $resp; my $i = 0; return defined $default ? $default : undef unless defined $term->IN and defined fileno($term->IN) and defined $term->OUT and defined fileno($term->OUT); + for (@$choices) { + printf "(%d) %s\n", $i++, ref($_) eq 'ARRAY' ? $_->[1] : $_; + } + printf "Enter 0-%d to choose from the above list\n", $i-1 + if (@$choices); + $i = 0; while ($i++ < 10) { $resp = $term->readline($prompt); if (!defined $resp) { # EOF @@ -694,6 +701,10 @@ sub ask { if ($resp eq '' and defined $default) { return $default; } + if (@$choices && $resp =~ m/^[0-9]+$/ && $resp < @$choices) { + my $c = $choices->[$resp]; + return ref($c) eq 'ARRAY' ? $c->[0] : $c; + } if (!defined $valid_re or $resp =~ /$valid_re/) { return $resp; } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes @ 2013-08-21 19:04 ` Rasmus Villemoes 2013-08-22 0:20 ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano 2 siblings, 0 replies; 7+ messages in thread From: Rasmus Villemoes @ 2013-08-21 19:04 UTC (permalink / raw) To: gitster, sandals, git; +Cc: Rasmus Villemoes Allow the user to specify a file (sendemail.msgidcachefile) in which to store the message-ids generated by git-send-email, along with time and subject information. When prompting for a Message-ID to be used in In-Reply-To, that file can be used to generate a list of options. When composing v2 or v3 of a patch or patch series, this avoids the need to get one's MUA to display the Message-ID of the earlier email (which is cumbersome in some MUAs) and then copy-paste that. Listing all previously sent emails is useless, so currently only the 10 most "relevant" emails. "Relevant" is based on a simple scoring, which might need to be revised: Count the words in the old subject which also appear in the subject of the first email to be sent; add a bonus if the old email was first in a batch (that is, [00/74] is more likely to be relevant than [43/74]). Resort to comparing timestamps (newer is more relevant) when the scores tie. To limit disk usage, the oldest half of the cached entries are expunged when the cache file exceeds sendemail.msgidcachemaxsize (default 100kB). This also ensures that we will never have to read, score, and sort 1000s of entries on each invocation of git-send-email. Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> --- git-send-email.perl | 133 ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 127 insertions(+), 6 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index ac3b02d..5094267 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -203,6 +203,7 @@ my ($validate, $confirm); my (@suppress_cc); my ($auto_8bit_encoding); my ($compose_encoding); +my ($msgid_cache_file, $msgid_cache_maxsize); my ($debug_net_smtp) = 0; # Net::SMTP, see send_message() @@ -237,6 +238,8 @@ my %config_settings = ( "from" => \$sender, "assume8bitencoding" => \$auto_8bit_encoding, "composeencoding" => \$compose_encoding, + "msgidcachefile" => \$msgid_cache_file, + "msgidcachemaxsize" => \$msgid_cache_maxsize, ); my %config_path_settings = ( @@ -796,11 +799,23 @@ sub expand_one_alias { @bcclist = expand_aliases(@bcclist); @bcclist = validate_address_list(sanitize_address_list(@bcclist)); +if ($compose && $compose > 0) { + @files = ($compose_filename . ".final", @files); +} + if ($thread && !defined $initial_reply_to && $prompting) { + my @choices = (); + if ($msgid_cache_file) { + my $first_subject = get_patch_subject($files[0]); + $first_subject =~ s/^GIT: //; + @choices = msgid_cache_getmatches($first_subject, 10); + @choices = map {[$_->{id}, sprintf "[%s] %s", format_2822_time($_->{epoch}), $_->{subject}]} @choices; + } $initial_reply_to = ask( "Message-ID to be used as In-Reply-To for the first email (if any)? ", default => "", - valid_re => qr/\@.*\./, confirm_only => 1); + valid_re => qr/\@.*\./, confirm_only => 1, + choices => \@choices); } if (defined $initial_reply_to) { $initial_reply_to =~ s/^\s*<?//; @@ -818,10 +833,6 @@ if (!defined $smtp_server) { $smtp_server ||= 'localhost'; # could be 127.0.0.1, too... *shrug* } -if ($compose && $compose > 0) { - @files = ($compose_filename . ".final", @files); -} - # Variables we set as part of the loop over files our ($message_id, %mail, $subject, $reply_to, $references, $message, $needs_confirm, $message_num, $ask_default); @@ -1136,7 +1147,7 @@ sub send_message { my $to = join (",\n\t", @recipients); @recipients = unique_email_list(@recipients,@cc,@bcclist); @recipients = (map { extract_valid_address_or_die($_) } @recipients); - my $date = format_2822_time($time++); + my $date = format_2822_time($time); my $gitversion = '@@GIT_VERSION@@'; if ($gitversion =~ m/..GIT_VERSION../) { $gitversion = Git::version(); @@ -1477,6 +1488,11 @@ foreach my $t (@files) { my $message_was_sent = send_message(); + if ($message_was_sent && $msgid_cache_file && !$dry_run) { + msgid_cache_this($message_id, $message_num == 1 ? 1 : 0, , $time, $subject); + } + $time++; + # set up for the next message if ($thread && $message_was_sent && ($chain_reply_to || !defined $reply_to || length($reply_to) == 0 || @@ -1521,6 +1537,8 @@ sub cleanup_compose_files { $smtp->quit if $smtp; +msgid_cache_write() if $msgid_cache_file && !$dry_run; + sub unique_email_list { my %seen; my @emails; @@ -1569,3 +1587,106 @@ sub body_or_subject_has_nonascii { } return 0; } + +my @msgid_new_entries; +sub msgid_cache_this { + my $msgid = shift; + my $first = shift; + my $epoch = shift; + my $subject = shift; + # Make sure there are no tabs which will confuse us, and save + # some valuable horizontal real-estate by removing redundant + # whitespace. + if ($subject) { + $subject =~ s/^\s+|\s+$//g; + $subject =~ s/\s+/ /g; + } + # Replace undef or the empty string by an actual string. + $subject = '(none)' if (!defined $subject || $subject eq ''); + + push @msgid_new_entries, {id => $msgid, first => $first, subject => $subject, epoch => $epoch}; +} + + +# For now, use a simple tab-separated format: +# +# $id\t$wasfirst\t$unixtime\t$subject\n +sub msgid_cache_read { + my $fh; + my $line; + my @entries; + if (not open ($fh, '<', $msgid_cache_file)) { + # A non-existing cache file is ok, but should we warn if errno != ENOENT? + return (); + } + while ($line = <$fh>) { + chomp($line); + my ($id, $first, $epoch, $subject) = split /\t/, $line; + push @entries, {id=>$id, first=>$first, epoch=>$epoch, subject=>$subject}; + } + close($fh); + return @entries; +} + +sub msgid_cache_getmatches { + my ($first_subject, $maxentries) = @_; + my @list = msgid_cache_read(); + + # We need to find the message-ids which are most likely to be + # useful. There are probably better ways to do this, but for + # now we simply count how many words in the old subject also + # appear in $first_subject. + my %words = map {$_ => 1} msgid_subject_words($first_subject); + for my $item (@list) { + # Emails which were first in a batch are more likely + # to be used for followups (cf. the example in "man + # git-send-email"), so give those a head start. + my $score = $item->{first} ? 3 : 0; + for (msgid_subject_words($item->{subject})) { + $score++ if exists $words{$_}; + } + $item->{score} = $score; + } + @list = sort {$b->{score} <=> $a->{score} || + $b->{epoch} <=> $a->{epoch}} @list; + @list = @list[0 .. $maxentries-1] if (@list > $maxentries); + return @list; +} + +sub msgid_subject_words { + my $subject = shift; + # Ignore initial "[PATCH 02/47]" + $subject =~ s/^\s*\[.*?\]//; + my @words = split /\s+/, $subject; + # Ignore short words. + @words = grep { length > 3 } @words; + return @words; +} + +sub msgid_cache_write { + msgid_cache_do_write(1, \@msgid_new_entries); + + if (defined $msgid_cache_maxsize && $msgid_cache_maxsize =~ m/^\s*([0-9]+)\s*([kKmMgG]?)$/) { + my %SI = ('' => 1, 'k' => 1e3, 'm' => 1e6, 'g' => 1e9); + $msgid_cache_maxsize = $1 * $SI{lc($2)}; + } + else { + $msgid_cache_maxsize = 100000; + } + if (-s $msgid_cache_file > $msgid_cache_maxsize) { + my @entries = msgid_cache_read(); + splice @entries, 0, int(@entries/2); + msgid_cache_do_write(0, \@entries); + } +} + +sub msgid_cache_do_write { + my $append = shift; + my $entries = shift; + my $fh; + if (not open($fh, $append ? '>>' : '>', $msgid_cache_file)) { + die "cannot open $msgid_cache_file for writing: $!"; + } + printf $fh "%s\t%d\t%s\t%s\n", $_->{id}, $_->{first}, $_->{epoch}, $_->{subject} for (@$entries); + close($fh); +} -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 0/2] git-send-email: Message-ID caching 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes @ 2013-08-22 0:20 ` Junio C Hamano 2 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2013-08-22 0:20 UTC (permalink / raw) To: Rasmus Villemoes; +Cc: sandals, git Rasmus Villemoes <rv@rasmusvillemoes.dk> writes: >> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes: >>> my %config_path_settings = ( >>> @@ -311,6 +314,7 @@ my $rc = GetOptions("h" => \$help, >>> "8bit-encoding=s" => \$auto_8bit_encoding, >>> "compose-encoding=s" => \$compose_encoding, >>> "force" => \$force, >>> + "msgid-cache-file=s" => \$msgid_cache_file, >>> ); >> >> Is there a standard, recommended location we suggest users to store >> this? > > I don't know. It is obviously a local, per-repository, thing. I don't > know enough about git's internals to know if something breaks if one > puts it in .git (say, ".git/msgid.cache"). I think $GIT_DIR is OK, when we _know_ we are in a Git controlled directory. "git send-email" can however be invoked in a random directory that is _not_ a Git controlled directory, though. In any case, if we were to store it inside $GIT_DIR, I'd prefer to have "send-email" somewhere in the name of the file, as there are other Git programs that deal with things that have "msgid" (notably, "am") that will not have anything to do with this file. > If storing it under .git is possible, one could consider making the > option a boolean ('msgid-use-cache' ?) and always use > ".git/msgid.cache". Another possibility is to have it in the output directory specified via the "format-patch -o $dir" option. When you are rerolling a series multiple times, you will only look at the message ID from the previous round; you do not even need to look at old messages in an unrelated topic. I could imagine that git send-email $dir/0*.txt can notice that these input files are all in the same $dir directory, check to see if $dir/message-id file exists, read it to offer it as the suggested initial-reply-to. Similarly, when sending the _first_ message in such an invocation, it can just write the generated message-id to that file. Then we need no choices. It is sufficient to just keep a single message-id of the first message in the previous round and offer it as a possible initial-reply-to in a Yes/No question. Just a random thought. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-08-22 0:20 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-08-17 0:58 [RFC] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes 2013-08-18 21:08 ` Junio C Hamano 2013-08-18 22:24 ` brian m. carlson 2013-08-21 19:04 ` [PATCH v2 0/2] git-send-email: Message-ID caching Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 1/2] git-send-email: add optional 'choices' parameter to the ask sub Rasmus Villemoes 2013-08-21 19:04 ` [PATCH 2/2] git-send-email: Cache generated message-ids, use them when prompting Rasmus Villemoes 2013-08-22 0:20 ` [PATCH v2 0/2] git-send-email: Message-ID caching Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).