public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] send-email: UTF-8 encoding in subject line
@ 2026-02-20 14:50 Shreyansh Paliwal
  2026-02-21  2:28 ` Ben Knoble
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
  0 siblings, 2 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-20 14:50 UTC (permalink / raw)
  To: git

Hi,

While using git send-email I ran into some confusion around the prompt that
appears when any 8-bit (non-ASCII) content is detected.

When prompted with,

  Which 8bit encoding should I declare [UTF-8]? y
  Are you sure you want to use <y> [y/N]? y

I initially assumed this was a yes/no style confirmation and answered "y",
and ignored the 'which' part (this was due to my oversight). This resulted
in the charset being set to "y", which later produced a subject line like,

  =?y?q?...?=

Mail clients like Gmail still displayed the message correctly, but the
mailing list archive showed the raw encoded form[1].

Afterwards, I realized the prompt expects a charset name (e.g., "UTF-8")
rather than a yes/no answer, and pressing enter would have selected the
default (which is UTF-8).

I had also encountered this earlier when the non-ASCII character was in the
message body rather than the subject, in that case the result appeared to
work fine even with the mistaken input, which made the issue less obvious
to me at first.

This made me wonder whether the current UX around the prompts or input
validation could be improved in any way to reduce the chance of accidental
input being interpreted as a charset name.

Best,
Shreyansh

[1]- https://lore.kernel.org/git/20260219181154.66814-1-shreyanshpaliwalcmsmn@gmail.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-20 14:50 [RFC] send-email: UTF-8 encoding in subject line Shreyansh Paliwal
@ 2026-02-21  2:28 ` Ben Knoble
  2026-02-21 13:38   ` Shreyansh Paliwal
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
  1 sibling, 1 reply; 27+ messages in thread
From: Ben Knoble @ 2026-02-21  2:28 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git


> Le 20 févr. 2026 à 09:51, Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> a écrit :
> 
> Hi,
> 
> While using git send-email I ran into some confusion around the prompt that
> appears when any 8-bit (non-ASCII) content is detected.
> 
> When prompted with,
> 
>  Which 8bit encoding should I declare [UTF-8]? y
>  Are you sure you want to use <y> [y/N]? y

Yeah, that was a bit confusing for me until I got used to it. Maybe saying “[default: UTF-8]” would be a small and definite improvement?

> I initially assumed this was a yes/no style confirmation and answered "y",
> and ignored the 'which' part (this was due to my oversight). This resulted
> in the charset being set to "y", which later produced a subject line like,
> 
>  =?y?q?...?=
> 
> Mail clients like Gmail still displayed the message correctly, but the
> mailing list archive showed the raw encoded form[1].
> 
> Afterwards, I realized the prompt expects a charset name (e.g., "UTF-8")
> rather than a yes/no answer, and pressing enter would have selected the
> default (which is UTF-8).
> 
> I had also encountered this earlier when the non-ASCII character was in the
> message body rather than the subject, in that case the result appeared to
> work fine even with the mistaken input, which made the issue less obvious
> to me at first.
> 
> This made me wonder whether the current UX around the prompts or input
> validation could be improved in any way to reduce the chance of accidental
> input being interpreted as a charset name.
> 
> Best,
> Shreyansh
> 
> [1]- https://lore.kernel.org/git/20260219181154.66814-1-shreyanshpaliwalcmsmn@gmail.com/

Thanks for thinking on this; better that I never needed to get used to the oddity ;)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-21  2:28 ` Ben Knoble
@ 2026-02-21 13:38   ` Shreyansh Paliwal
  2026-02-21 17:30     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-21 13:38 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, Shreyansh Paliwal

> > Hi,
> >
> > While using git send-email I ran into some confusion around the prompt that
> > appears when any 8-bit (non-ASCII) content is detected.
> >
> > When prompted with,
> >
> >  Which 8bit encoding should I declare [UTF-8]? y
> >  Are you sure you want to use <y> [y/N]? y
>
> Yeah, that was a bit confusing for me until I got used to it. Maybe
> saying “[default: UTF-8]” would be a small and definite improvement?

That makes sense, I tried it below.
I also wondered whether, in addition to this, it might be helpful to warn on
an invalid charset, and/or possibly fall back to UTF-8.

Let me know what you think.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
 git-send-email.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..12d0e7e6c9 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1044,7 +1044,7 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
+	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
 				  valid_re => qr/.{4}/, confirm_only => 1,
 				  default => "UTF-8");
 }
--
2.53.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-21 13:38   ` Shreyansh Paliwal
@ 2026-02-21 17:30     ` Junio C Hamano
  2026-02-22 14:03       ` Shreyansh Paliwal
  2026-02-22 14:53       ` [RFC] send-email: UTF-8 encoding in subject line D. Ben Knoble
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2026-02-21 17:30 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble

Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:

>> Yeah, that was a bit confusing for me until I got used to it. Maybe
>> saying “[default: UTF-8]” would be a small and definite improvement?

The current message can be mistaken, if the reader does not READ, if
it is asking a yes/no question, but with the "default" label, you
cannot imagine answering "yes", which is clearly not one of the
things in the same class as "UTF-8" that is given as the default,
which also serves as an example.

This is indeed a clever hack (not hack on computer code but hack on
the mind of human who is reading the message).  

> That makes sense, I tried it below.
> I also wondered whether, in addition to this, it might be helpful to warn on
> an invalid charset, and/or possibly fall back to UTF-8.

Agreed on the first half of the statement, if we have an easy and
portable way to tell if a given random string names a valid charset.
I do not recommend to "fall back" to anything, if we are asking an
input from the user.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-21 17:30     ` Junio C Hamano
@ 2026-02-22 14:03       ` Shreyansh Paliwal
  2026-02-22 14:53         ` Philip Oakley
  2026-02-22 15:00         ` D. Ben Knoble
  2026-02-22 14:53       ` [RFC] send-email: UTF-8 encoding in subject line D. Ben Knoble
  1 sibling, 2 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-22 14:03 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, Shreyansh Paliwal

> > That makes sense, I tried it below.
> > I also wondered whether, in addition to this, it might be helpful to warn on
> > an invalid charset, and/or possibly fall back to UTF-8.
>
> Agreed on the first half of the statement, if we have an easy and
> portable way to tell if a given random string names a valid charset.
> I do not recommend to "fall back" to anything, if we are asking an
> input from the user.

Following up on this, I tried adding a warning when the provided charset
does not appear to be valid. Current flow is,

  Which 8bit encoding should I declare [UTF-8]? y
  Are you sure you want to use <y> [y/N]? y

With the additional check, it becomes,

  Which 8bit encoding should I declare [default: UTF-8]? y
  warning: 'y' does not appear to be a valid charset name.
  Are you sure you want to use <y> [y/N]?

This uses find_encoding() from Perl’s Encode module to detect any
unrecognized charset names.

Let me know what you think.
Also, is there any new test that should be added for this change?

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
 git-send-email.perl | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..e62fa259ba 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);
 
 Getopt::Long::Configure qw/ pass_through /;
 
@@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-				  valid_re => qr/.{4}/, confirm_only => 1,
-				  default => "UTF-8");
+	while (1) {
+		my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
+			valid_re => qr/^\S+$/,
+			default  => "UTF-8");
+		next unless defined $encoding;
+		if (find_encoding($encoding)) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
+		my $yesno = ask(
+			sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
+			valid_re => qr/^(?:y|n)/i,
+			default  => 'n');
+		if (defined $yesno && $yesno =~ /^y/i) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+	}
 }
 
 if (!$force) {
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-21 17:30     ` Junio C Hamano
  2026-02-22 14:03       ` Shreyansh Paliwal
@ 2026-02-22 14:53       ` D. Ben Knoble
  1 sibling, 0 replies; 27+ messages in thread
From: D. Ben Knoble @ 2026-02-22 14:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shreyansh Paliwal, git

On Sat, Feb 21, 2026 at 12:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:
>
> >> Yeah, that was a bit confusing for me until I got used to it. Maybe
> >> saying “[default: UTF-8]” would be a small and definite improvement?
>
> The current message can be mistaken, if the reader does not READ, if
> it is asking a yes/no question, but with the "default" label, you
> cannot imagine answering "yes", which is clearly not one of the
> things in the same class as "UTF-8" that is given as the default,
> which also serves as an example.
>
> This is indeed a clever hack (not hack on computer code but hack on
> the mind of human who is reading the message).

Yep. I've seen it somewhere before; I thought it was from Portage's
emerge in "ask" mode, but checking that doesn't seem to be the case.

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-22 14:03       ` Shreyansh Paliwal
@ 2026-02-22 14:53         ` Philip Oakley
  2026-02-22 15:00         ` D. Ben Knoble
  1 sibling, 0 replies; 27+ messages in thread
From: Philip Oakley @ 2026-02-22 14:53 UTC (permalink / raw)
  To: Shreyansh Paliwal, git; +Cc: ben.knoble, gitster

On 22/02/2026 14:03, Shreyansh Paliwal wrote:
>>> That makes sense, I tried it below.
>>> I also wondered whether, in addition to this, it might be helpful to warn on
>>> an invalid charset, and/or possibly fall back to UTF-8.
>>
>> Agreed on the first half of the statement, if we have an easy and
>> portable way to tell if a given random string names a valid charset.
>> I do not recommend to "fall back" to anything, if we are asking an
>> input from the user.
> 
> Following up on this, I tried adding a warning when the provided charset
> does not appear to be valid. Current flow is,
> 
>   Which 8bit encoding should I declare [UTF-8]? y

Perhaps swap around the 'Which-declare' to "Declare which' to to get
away from the obviousness of 'Which' being the classic y/n binary
question. Action first?

	Declare which 8bit encoding to use [default:UTF-8]?

Checking validity of the encoding is a reasonable follow on.

Philip

>   Are you sure you want to use <y> [y/N]? y
> 
> With the additional check, it becomes,
> 
>   Which 8bit encoding should I declare [default: UTF-8]? y
>   warning: 'y' does not appear to be a valid charset name.
>   Are you sure you want to use <y> [y/N]?
> 
> This uses find_encoding() from Perl’s Encode module to detect any
> unrecognized charset names.
> 
> Let me know what you think.
> Also, is there any new test that should be added for this change?
> 
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
>  git-send-email.perl | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index cd4b316ddc..e62fa259ba 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -23,6 +23,7 @@
>  use Git::LoadCPAN::Error qw(:try);
>  use Git;
>  use Git::I18N;
> +use Encode qw(find_encoding);
>  
>  Getopt::Long::Configure qw/ pass_through /;
>  
> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
>  	foreach my $f (sort keys %broken_encoding) {
>  		print "    $f\n";
>  	}
> -	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> -				  valid_re => qr/.{4}/, confirm_only => 1,
> -				  default => "UTF-8");
> +	while (1) {
> +		my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> +			valid_re => qr/^\S+$/,
> +			default  => "UTF-8");
> +		next unless defined $encoding;
> +		if (find_encoding($encoding)) {
> +			$auto_8bit_encoding = $encoding;
> +			last;
> +		}
> +		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> +		my $yesno = ask(
> +			sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> +			valid_re => qr/^(?:y|n)/i,
> +			default  => 'n');
> +		if (defined $yesno && $yesno =~ /^y/i) {
> +			$auto_8bit_encoding = $encoding;
> +			last;
> +		}
> +	}
>  }
>  
>  if (!$force) {


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-22 14:03       ` Shreyansh Paliwal
  2026-02-22 14:53         ` Philip Oakley
@ 2026-02-22 15:00         ` D. Ben Knoble
  2026-02-22 15:52           ` Shreyansh Paliwal
  1 sibling, 1 reply; 27+ messages in thread
From: D. Ben Knoble @ 2026-02-22 15:00 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, gitster

On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
<shreyanshpaliwalcmsmn@gmail.com> wrote:
>
> > > That makes sense, I tried it below.
> > > I also wondered whether, in addition to this, it might be helpful to warn on
> > > an invalid charset, and/or possibly fall back to UTF-8.
> >
> > Agreed on the first half of the statement, if we have an easy and
> > portable way to tell if a given random string names a valid charset.
> > I do not recommend to "fall back" to anything, if we are asking an
> > input from the user.
>
> Following up on this, I tried adding a warning when the provided charset
> does not appear to be valid. Current flow is,
>
>   Which 8bit encoding should I declare [UTF-8]? y
>   Are you sure you want to use <y> [y/N]? y
>
> With the additional check, it becomes,
>
>   Which 8bit encoding should I declare [default: UTF-8]? y
>   warning: 'y' does not appear to be a valid charset name.
>   Are you sure you want to use <y> [y/N]?
>
> This uses find_encoding() from Perl’s Encode module to detect any
> unrecognized charset names.
>
> Let me know what you think.
> Also, is there any new test that should be added for this change?
>
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
>  git-send-email.perl | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index cd4b316ddc..e62fa259ba 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -23,6 +23,7 @@
>  use Git::LoadCPAN::Error qw(:try);
>  use Git;
>  use Git::I18N;
> +use Encode qw(find_encoding);
>
>  Getopt::Long::Configure qw/ pass_through /;
>
> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
>         foreach my $f (sort keys %broken_encoding) {
>                 print "    $f\n";
>         }
> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> -                                 valid_re => qr/.{4}/, confirm_only => 1,
> -                                 default => "UTF-8");
> +       while (1) {
> +               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> +                       valid_re => qr/^\S+$/,
> +                       default  => "UTF-8");

Here we change things, right?

- The original validation is "at least 4 characters", the new
validation is "at least one non-blank." I'm not sure why we'd prefer
one or the other, frankly. The original goes to 852a15d748
(send-email: ask confirmation if given encoding name is very short,
2015-02-13), which is motivated by the same problem we're discussing
here!
- We get rid of confirm_only, since we're about to roll our own
confirmation below:

> +               next unless defined $encoding;
> +               if (find_encoding($encoding)) {
> +                       $auto_8bit_encoding = $encoding;
> +                       last;
> +               }
> +               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> +               my $yesno = ask(
> +                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> +                       valid_re => qr/^(?:y|n)/i,
> +                       default  => 'n');

…which might want refactored a bit so it can stay close to the original? idk.

> +               if (defined $yesno && $yesno =~ /^y/i) {
> +                       $auto_8bit_encoding = $encoding;
> +                       last;
> +               }
> +       }
>  }
>
>  if (!$force) {
> --
> 2.53.0



-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-22 15:00         ` D. Ben Knoble
@ 2026-02-22 15:52           ` Shreyansh Paliwal
  2026-02-23 21:38             ` Ben Knoble
  0 siblings, 1 reply; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-22 15:52 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster

> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
> <shreyanshpaliwalcmsmn@gmail.com> wrote:
> >
> > > > That makes sense, I tried it below.
> > > > I also wondered whether, in addition to this, it might be helpful to warn on
> > > > an invalid charset, and/or possibly fall back to UTF-8.
> > >
> > > Agreed on the first half of the statement, if we have an easy and
> > > portable way to tell if a given random string names a valid charset.
> > > I do not recommend to "fall back" to anything, if we are asking an
> > > input from the user.
> >
> > Following up on this, I tried adding a warning when the provided charset
> > does not appear to be valid. Current flow is,
> >
> >   Which 8bit encoding should I declare [UTF-8]? y
> >   Are you sure you want to use <y> [y/N]? y
> >
> > With the additional check, it becomes,
> >
> >   Which 8bit encoding should I declare [default: UTF-8]? y
> >   warning: 'y' does not appear to be a valid charset name.
> >   Are you sure you want to use <y> [y/N]?
> >
> > This uses find_encoding() from Perl’s Encode module to detect any
> > unrecognized charset names.
> >
> > Let me know what you think.
> > Also, is there any new test that should be added for this change?
> >
> > Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> > ---
> >  git-send-email.perl | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index cd4b316ddc..e62fa259ba 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -23,6 +23,7 @@
> >  use Git::LoadCPAN::Error qw(:try);
> >  use Git;
> >  use Git::I18N;
> > +use Encode qw(find_encoding);
> >
> >  Getopt::Long::Configure qw/ pass_through /;
> >
> > @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
> >         foreach my $f (sort keys %broken_encoding) {
> >                 print "    $f\n";
> >         }
> > -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> > -                                 valid_re => qr/.{4}/, confirm_only => 1,
> > -                                 default => "UTF-8");
> > +       while (1) {
> > +               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> > +                       valid_re => qr/^\S+$/,
> > +                       default  => "UTF-8");
>
> Here we change things, right?
>
> - The original validation is "at least 4 characters", the new
> validation is "at least one non-blank." I'm not sure why we'd prefer
> one or the other, frankly. The original goes to 852a15d748
> (send-email: ask confirmation if given encoding name is very short,
> 2015-02-13), which is motivated by the same problem we're discussing
> here!

I see.
My understanding of the earlier change (852a15d748) is that the
length check was intended as a heuristic check to catch obviously invalid
inputs like "y" and trigger an extra confirmation based on the fact that
charset names would be at least 4 letters.

With the additional find_encoding() check, the validation becomes semantic
rather than length-based, recognized charset names are accepted directly,
while unrecognized ones trigger a warning and still require explicit
confirmation. The relaxed regex (at least one non-blank) is only meant to
ensure we receive some non-empty input before passing it to find_encoding().

> - We get rid of confirm_only, since we're about to roll our own
> confirmation below:
>
> > +               next unless defined $encoding;
> > +               if (find_encoding($encoding)) {
> > +                       $auto_8bit_encoding = $encoding;
> > +                       last;
> > +               }
> > +               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> > +               my $yesno = ask(
> > +                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> > +                       valid_re => qr/^(?:y|n)/i,
> > +                       default  => 'n');
>
> …which might want refactored a bit so it can stay close to the original? idk.
>

Actually the flow needed to change slightly to insert the validity warning
before the final confirmation step. Since ask() handles confirmation internally
using confrim_only and is used in multiple places, it seemed simpler to keep the
additional confirmation local here rather than modifying ask() itself.

Let me know what you think.

Best,
Shreyansh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] send-email: UTF-8 encoding in subject line
  2026-02-22 15:52           ` Shreyansh Paliwal
@ 2026-02-23 21:38             ` Ben Knoble
  2026-02-24  7:55               ` [GSOC] Discuss: Refactoring in order to reduce global state Shreyansh Paliwal
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Knoble @ 2026-02-23 21:38 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, gitster


> Le 22 févr. 2026 à 10:56, Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> a écrit :
> 
> 
>> 
>>> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
>>> <shreyanshpaliwalcmsmn@gmail.com> wrote:
>>> 
>>>>> That makes sense, I tried it below.
>>>>> I also wondered whether, in addition to this, it might be helpful to warn on
>>>>> an invalid charset, and/or possibly fall back to UTF-8.
>>>> 
>>>> Agreed on the first half of the statement, if we have an easy and
>>>> portable way to tell if a given random string names a valid charset.
>>>> I do not recommend to "fall back" to anything, if we are asking an
>>>> input from the user.
>>> 
>>> Following up on this, I tried adding a warning when the provided charset
>>> does not appear to be valid. Current flow is,
>>> 
>>>  Which 8bit encoding should I declare [UTF-8]? y
>>>  Are you sure you want to use <y> [y/N]? y
>>> 
>>> With the additional check, it becomes,
>>> 
>>>  Which 8bit encoding should I declare [default: UTF-8]? y
>>>  warning: 'y' does not appear to be a valid charset name.
>>>  Are you sure you want to use <y> [y/N]?
>>> 
>>> This uses find_encoding() from Perl’s Encode module to detect any
>>> unrecognized charset names.
>>> 
>>> Let me know what you think.
>>> Also, is there any new test that should be added for this change?
>>> 
>>> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
>>> ---
>>> git-send-email.perl | 23 ++++++++++++++++++++---
>>> 1 file changed, 20 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/git-send-email.perl b/git-send-email.perl
>>> index cd4b316ddc..e62fa259ba 100755
>>> --- a/git-send-email.perl
>>> +++ b/git-send-email.perl
>>> @@ -23,6 +23,7 @@
>>> use Git::LoadCPAN::Error qw(:try);
>>> use Git;
>>> use Git::I18N;
>>> +use Encode qw(find_encoding);
>>> 
>>> Getopt::Long::Configure qw/ pass_through /;
>>> 
>>> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
>>>        foreach my $f (sort keys %broken_encoding) {
>>>                print "    $f\n";
>>>        }
>>> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
>>> -                                 valid_re => qr/.{4}/, confirm_only => 1,
>>> -                                 default => "UTF-8");
>>> +       while (1) {
>>> +               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
>>> +                       valid_re => qr/^\S+$/,
>>> +                       default  => "UTF-8");
>> 
>> Here we change things, right?
>> 
>> - The original validation is "at least 4 characters", the new
>> validation is "at least one non-blank." I'm not sure why we'd prefer
>> one or the other, frankly. The original goes to 852a15d748
>> (send-email: ask confirmation if given encoding name is very short,
>> 2015-02-13), which is motivated by the same problem we're discussing
>> here!
> 
> I see.
> My understanding of the earlier change (852a15d748) is that the
> length check was intended as a heuristic check to catch obviously invalid
> inputs like "y" and trigger an extra confirmation based on the fact that
> charset names would be at least 4 letters.
> 
> With the additional find_encoding() check, the validation becomes semantic
> rather than length-based, recognized charset names are accepted directly,
> while unrecognized ones trigger a warning and still require explicit
> confirmation. The relaxed regex (at least one non-blank) is only meant to
> ensure we receive some non-empty input before passing it to find_encoding().
> 
>> - We get rid of confirm_only, since we're about to roll our own
>> confirmation below:
>> 
>>> +               next unless defined $encoding;
>>> +               if (find_encoding($encoding)) {
>>> +                       $auto_8bit_encoding = $encoding;
>>> +                       last;
>>> +               }
>>> +               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
>>> +               my $yesno = ask(
>>> +                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
>>> +                       valid_re => qr/^(?:y|n)/i,
>>> +                       default  => 'n');
>> 
>> …which might want refactored a bit so it can stay close to the original? idk.
>> 
> 
> Actually the flow needed to change slightly to insert the validity warning
> before the final confirmation step. Since ask() handles confirmation internally
> using confrim_only and is used in multiple places, it seemed simpler to keep the
> additional confirmation local here rather than modifying ask() itself.
> 
> Let me know what you think.
> 
> Best,
> Shreyansh

Ah, my mistake for being ambiguous. I meant:

The code is similar enough to the original that perhaps a helper can be introduced, or at least we should keep the equivalent strings together to help those who change one. 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GSOC] Discuss: Refactoring in order to reduce global state
  2026-02-23 21:38             ` Ben Knoble
@ 2026-02-24  7:55               ` Shreyansh Paliwal
  0 siblings, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-24  7:55 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster

> > Le 22 févr. 2026 à 10:56, Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> a écrit :
> >
> > 
> >>
> >>> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
> >>> <shreyanshpaliwalcmsmn@gmail.com> wrote:
> >>>
> >>>>> That makes sense, I tried it below.
> >>>>> I also wondered whether, in addition to this, it might be helpful to warn on
> >>>>> an invalid charset, and/or possibly fall back to UTF-8.
> >>>>
> >>>> Agreed on the first half of the statement, if we have an easy and
> >>>> portable way to tell if a given random string names a valid charset.
> >>>> I do not recommend to "fall back" to anything, if we are asking an
> >>>> input from the user.
> >>>
> >>> Following up on this, I tried adding a warning when the provided charset
> >>> does not appear to be valid. Current flow is,
> >>>
> >>>  Which 8bit encoding should I declare [UTF-8]? y
> >>>  Are you sure you want to use <y> [y/N]? y
> >>>
> >>> With the additional check, it becomes,
> >>>
> >>>  Which 8bit encoding should I declare [default: UTF-8]? y
> >>>  warning: 'y' does not appear to be a valid charset name.
> >>>  Are you sure you want to use <y> [y/N]?
> >>>
> >>> This uses find_encoding() from Perl’s Encode module to detect any
> >>> unrecognized charset names.
> >>>
> >>> Let me know what you think.
> >>> Also, is there any new test that should be added for this change?
> >>>
> >>> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> >>> ---
> >>> git-send-email.perl | 23 ++++++++++++++++++++---
> >>> 1 file changed, 20 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/git-send-email.perl b/git-send-email.perl
> >>> index cd4b316ddc..e62fa259ba 100755
> >>> --- a/git-send-email.perl
> >>> +++ b/git-send-email.perl
> >>> @@ -23,6 +23,7 @@
> >>> use Git::LoadCPAN::Error qw(:try);
> >>> use Git;
> >>> use Git::I18N;
> >>> +use Encode qw(find_encoding);
> >>>
> >>> Getopt::Long::Configure qw/ pass_through /;
> >>>
> >>> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
> >>>        foreach my $f (sort keys %broken_encoding) {
> >>>                print "    $f\n";
> >>>        }
> >>> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> >>> -                                 valid_re => qr/.{4}/, confirm_only => 1,
> >>> -                                 default => "UTF-8");
> >>> +       while (1) {
> >>> +               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> >>> +                       valid_re => qr/^\S+$/,
> >>> +                       default  => "UTF-8");
> >>
> >> Here we change things, right?
> >>
> >> - The original validation is "at least 4 characters", the new
> >> validation is "at least one non-blank." I'm not sure why we'd prefer
> >> one or the other, frankly. The original goes to 852a15d748
> >> (send-email: ask confirmation if given encoding name is very short,
> >> 2015-02-13), which is motivated by the same problem we're discussing
> >> here!
> >
> > I see.
> > My understanding of the earlier change (852a15d748) is that the
> > length check was intended as a heuristic check to catch obviously invalid
> > inputs like "y" and trigger an extra confirmation based on the fact that
> > charset names would be at least 4 letters.
> >
> > With the additional find_encoding() check, the validation becomes semantic
> > rather than length-based, recognized charset names are accepted directly,
> > while unrecognized ones trigger a warning and still require explicit
> > confirmation. The relaxed regex (at least one non-blank) is only meant to
> > ensure we receive some non-empty input before passing it to find_encoding().
> >
> >> - We get rid of confirm_only, since we're about to roll our own
> >> confirmation below:
> >>
> >>> +               next unless defined $encoding;
> >>> +               if (find_encoding($encoding)) {
> >>> +                       $auto_8bit_encoding = $encoding;
> >>> +                       last;
> >>> +               }
> >>> +               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> >>> +               my $yesno = ask(
> >>> +                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> >>> +                       valid_re => qr/^(?:y|n)/i,
> >>> +                       default  => 'n');
> >>
> >> …which might want refactored a bit so it can stay close to the original? idk.
> >>
> >
> > Actually the flow needed to change slightly to insert the validity warning
> > before the final confirmation step. Since ask() handles confirmation internally
> > using confrim_only and is used in multiple places, it seemed simpler to keep the
> > additional confirmation local here rather than modifying ask() itself.
> >
> > Let me know what you think.
> >
> > Best,
> > Shreyansh
>
> Ah, my mistake for being ambiguous. I meant:
>
> The code is similar enough to the original that perhaps a helper can be
> introduced, or at least we should keep the equivalent strings together to
> help those who change one.

Thanks for clarifying, that makes sense.
I'll refactor and send a revised patch on this.

Best,
Shreyansh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] send-email: validate charset name in 8bit encoding prompt
  2026-02-20 14:50 [RFC] send-email: UTF-8 encoding in subject line Shreyansh Paliwal
  2026-02-21  2:28 ` Ben Knoble
@ 2026-02-24 14:33 ` Shreyansh Paliwal
  2026-02-24 21:11   ` Junio C Hamano
                     ` (3 more replies)
  1 sibling, 4 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-24 14:33 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley, Shreyansh Paliwal

When a non-ASCII character is detected in the body or subject of the email
the user is prompted with,

  Which 8bit encoding should I declare [UTF-8]? foo

After this the input string is validated by the regex, based on the fact
that the charset string will be minimum 4 characters [1]. If the string is
more than 4 letters the email is sent, if not then a second prompt to
confirm is asked to the user,

  Are you sure you want to use <foo> [y/N]? y

This relies on a length based regex heuristic check to validate the user
input, and can allow invalid charset names to pass if the input is greater
than 4 characters.

Add a semantic validation of the charset name using the
Encode::find_encoding() module of perl. If the encoding is not recognized,
warn the user and ask for confirmation before proceeding. After this
validation the lenght based validation becomes redundant and also breaks
flow, so change the regex of valid input to any non blank string.

Additionally, the wording of the first prompt can confuse the user if not
read properly or under any default assumptions for a yes/no prompt. Change
the wording to make it explicitly clear to the user that the prompt needs a
string input, UTF-8 being the default.

The intended flow is,

  Declare which 8bit encoding to use [default: UTF-8]? foobar
  warning: 'foobar' does not appear to be a valid charset name.
  Are you sure you want to use <foobar> [y/N]?

[1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
 git-send-email.perl   | 15 ++++++++++++---
 t/t9001-send-email.sh |  2 +-
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..dc4e5418d3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);
 
 Getopt::Long::Configure qw/ pass_through /;
 
@@ -987,6 +988,7 @@ sub get_patch_subject {
 sub ask {
 	my ($prompt, %arg) = @_;
 	my $valid_re = $arg{valid_re};
+	my $warn_invalid = $arg{warn_invalid};
 	my $default = $arg{default};
 	my $confirm_only = $arg{confirm_only};
 	my $resp;
@@ -1005,7 +1007,13 @@ sub ask {
 			return $default;
 		}
 		if (!defined $valid_re or $resp =~ /$valid_re/) {
-			return $resp;
+			if ($warn_invalid) {
+				if (find_encoding($resp))
+					return $resp;
+				else
+					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
+			} else
+				return $resp;
 		}
 		if ($confirm_only) {
 			my $yesno = $term->readline(
@@ -1044,8 +1052,9 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-				  valid_re => qr/.{4}/, confirm_only => 1,
+	$auto_8bit_encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
+				  valid_re => qr/^\S+$/, confirm_only => 1,
+				  warn_invalid => 1,
 				  default => "UTF-8");
 }
 
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e56e0c8d77..24f6c76aee 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
 			email-using-8bit >stdout &&
 	grep "do not declare a Content-Transfer-Encoding" stdout &&
 	grep email-using-8bit stdout &&
-	grep "Which 8bit encoding" stdout &&
+	grep "Declare which 8bit encoding to use" stdout &&
 	grep -E "Content|MIME" msgtxt1 >actual &&
 	test_cmp content-type-decl actual
 '
-- 
2.53.0.155.g35e93594f7.dirty

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
@ 2026-02-24 21:11   ` Junio C Hamano
  2026-02-24 21:37   ` [PATCH v2] " Shreyansh Paliwal
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2026-02-24 21:11 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble, philipoakley

Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:

> +			if ($warn_invalid) {
> +				if (find_encoding($resp))
> +					return $resp;
> +				else
> +					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
> +			} else
> +				return $resp;

This is not C but Perl.

 git-send-email.perl | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git i/git-send-email.perl w/git-send-email.perl
index dc4e5418d3..15387ac377 100755
--- i/git-send-email.perl
+++ w/git-send-email.perl
@@ -1008,12 +1008,14 @@ sub ask {
 		}
 		if (!defined $valid_re or $resp =~ /$valid_re/) {
 			if ($warn_invalid) {
-				if (find_encoding($resp))
+				if (find_encoding($resp)) {
 					return $resp;
-				else
+				} else {
 					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
-			} else
+				}
+			} else {
 				return $resp;
+			}
 		}
 		if ($confirm_only) {
 			my $yesno = $term->readline(

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
  2026-02-24 21:11   ` Junio C Hamano
@ 2026-02-24 21:37   ` Shreyansh Paliwal
  2026-02-24 22:06     ` Junio C Hamano
  2026-02-25 16:37     ` D. Ben Knoble
  2026-02-26 16:16   ` [PATCH v3] " Shreyansh Paliwal
  2026-02-28 11:20   ` [PATCH v4] " Shreyansh Paliwal
  3 siblings, 2 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-24 21:37 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley, Shreyansh Paliwal

When a non-ASCII character is detected in the body or subject of the email
the user is prompted with,

  Which 8bit encoding should I declare [UTF-8]? foo

After this the input string is validated by the regex, based on the fact
that the charset string will be minimum 4 characters [1]. If the string is
more than 4 letters the email is sent, if not then a second prompt to
confirm is asked to the user,

  Are you sure you want to use <foo> [y/N]? y

This relies on a length based regex heuristic check to validate the user
input, and can allow clearly invalid charset names to pass if the input is
greater than 4 characters.

Add a semantic validation of the charset name using the
Encode::find_encoding() module of perl. If the encoding is not recognized,
warn the user and ask for confirmation before proceeding. After this
validation the lenght based validation becomes redundant and also breaks
flow, so change the regex of valid input to any non blank string.

Additionally, the wording of the first prompt can confuse the user if not
read properly or under any default assumptions for a yes/no prompt. Change
the wording to make it explicitly clear to the user that the prompt needs a
string input, UTF-8 being the default.

The intended flow is,

  Declare which 8bit encoding to use [default: UTF-8]? foobar
  warning: 'foobar' does not appear to be a valid charset name.
  Are you sure you want to use <foobar> [y/N]?

[1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
Changes in v2:
 - Added braces in if-else block.

 git-send-email.perl   | 17 ++++++++++++++---
 t/t9001-send-email.sh |  2 +-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..15387ac377 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);

 Getopt::Long::Configure qw/ pass_through /;

@@ -987,6 +988,7 @@ sub get_patch_subject {
 sub ask {
 	my ($prompt, %arg) = @_;
 	my $valid_re = $arg{valid_re};
+	my $warn_invalid = $arg{warn_invalid};
 	my $default = $arg{default};
 	my $confirm_only = $arg{confirm_only};
 	my $resp;
@@ -1005,7 +1007,15 @@ sub ask {
 			return $default;
 		}
 		if (!defined $valid_re or $resp =~ /$valid_re/) {
-			return $resp;
+			if ($warn_invalid) {
+				if (find_encoding($resp)) {
+					return $resp;
+				} else {
+					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
+				}
+			} else {
+				return $resp;
+			}
 		}
 		if ($confirm_only) {
 			my $yesno = $term->readline(
@@ -1044,8 +1054,9 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-				  valid_re => qr/.{4}/, confirm_only => 1,
+	$auto_8bit_encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
+				  valid_re => qr/^\S+$/, confirm_only => 1,
+				  warn_invalid => 1,
 				  default => "UTF-8");
 }

diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e56e0c8d77..24f6c76aee 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
 			email-using-8bit >stdout &&
 	grep "do not declare a Content-Transfer-Encoding" stdout &&
 	grep email-using-8bit stdout &&
-	grep "Which 8bit encoding" stdout &&
+	grep "Declare which 8bit encoding to use" stdout &&
 	grep -E "Content|MIME" msgtxt1 >actual &&
 	test_cmp content-type-decl actual
 '

Range-diff against v1:
1:  70fa4d2899 ! 1:  954c1dae9f send-email: validate charset name in 8bit encoding prompt
    @@ git-send-email.perl: sub ask {
      		if (!defined $valid_re or $resp =~ /$valid_re/) {
     -			return $resp;
     +			if ($warn_invalid) {
    -+				if (find_encoding($resp))
    ++				if (find_encoding($resp)) {
     +					return $resp;
    -+				else
    ++				} else {
     +					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
    -+			} else
    ++				}
    ++			} else {
     +				return $resp;
    ++			}
      		}
      		if ($confirm_only) {
      			my $yesno = $term->readline(
--
2.53.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 21:37   ` [PATCH v2] " Shreyansh Paliwal
@ 2026-02-24 22:06     ` Junio C Hamano
  2026-02-24 22:20       ` Shreyansh Paliwal
  2026-02-25 16:37     ` D. Ben Knoble
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2026-02-24 22:06 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble, philipoakley

Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:

> When a non-ASCII character is detected in the body or subject of the email
> the user is prompted with,
>
>   Which 8bit encoding should I declare [UTF-8]? foo
>
> After this the input string is validated by the regex, based on the fact
> that the charset string will be minimum 4 characters [1]. If the string is
> more than 4 letters the email is sent, if not then a second prompt to
> confirm is asked to the user,
>
>   Are you sure you want to use <foo> [y/N]? y
>
> This relies on a length based regex heuristic check to validate the user
> input, and can allow clearly invalid charset names to pass if the input is
> greater than 4 characters.
>
> Add a semantic validation of the charset name using the
> Encode::find_encoding() module of perl. If the encoding is not recognized,
> warn the user and ask for confirmation before proceeding. After this
> validation the lenght based validation becomes redundant and also breaks
> flow, so change the regex of valid input to any non blank string.
>
> Additionally, the wording of the first prompt can confuse the user if not
> read properly or under any default assumptions for a yes/no prompt. Change
> the wording to make it explicitly clear to the user that the prompt needs a
> string input, UTF-8 being the default.
>
> The intended flow is,
>
>   Declare which 8bit encoding to use [default: UTF-8]? foobar
>   warning: 'foobar' does not appear to be a valid charset name.
>   Are you sure you want to use <foobar> [y/N]?
>
> [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
>
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
> Changes in v2:
>  - Added braces in if-else block.
>
>  git-send-email.perl   | 17 ++++++++++++++---
>  t/t9001-send-email.sh |  2 +-
>  2 files changed, 15 insertions(+), 4 deletions(-)

Curious.  This change to t9001 was there even in the previous
iteration that did not even work.  How did you test it?

Will replace.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 22:06     ` Junio C Hamano
@ 2026-02-24 22:20       ` Shreyansh Paliwal
  0 siblings, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-24 22:20 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley

> Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:
>
> > When a non-ASCII character is detected in the body or subject of the email
> > the user is prompted with,
> >
> >   Which 8bit encoding should I declare [UTF-8]? foo
> >
> > After this the input string is validated by the regex, based on the fact
> > that the charset string will be minimum 4 characters [1]. If the string is
> > more than 4 letters the email is sent, if not then a second prompt to
> > confirm is asked to the user,
> >
> >   Are you sure you want to use <foo> [y/N]? y
> >
> > This relies on a length based regex heuristic check to validate the user
> > input, and can allow clearly invalid charset names to pass if the input is
> > greater than 4 characters.
> >
> > Add a semantic validation of the charset name using the
> > Encode::find_encoding() module of perl. If the encoding is not recognized,
> > warn the user and ask for confirmation before proceeding. After this
> > validation the lenght based validation becomes redundant and also breaks
> > flow, so change the regex of valid input to any non blank string.
> >
> > Additionally, the wording of the first prompt can confuse the user if not
> > read properly or under any default assumptions for a yes/no prompt. Change
> > the wording to make it explicitly clear to the user that the prompt needs a
> > string input, UTF-8 being the default.
> >
> > The intended flow is,
> >
> >   Declare which 8bit encoding to use [default: UTF-8]? foobar
> >   warning: 'foobar' does not appear to be a valid charset name.
> >   Are you sure you want to use <foobar> [y/N]?
> >
> > [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
> >
> > Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> > ---
> > Changes in v2:
> >  - Added braces in if-else block.
> >
> >  git-send-email.perl   | 17 ++++++++++++++---
> >  t/t9001-send-email.sh |  2 +-
> >  2 files changed, 15 insertions(+), 4 deletions(-)
>
> Curious.  This change to t9001 was there even in the previous
> iteration that did not even work.  How did you test it?

Initially I had the braces in place, when I made the wording change to the failing
test, so all the tests passed. But just before sending the patch, I saw that the
indentation looked a bit off in the annotated git send-email, so I fixed that and
also removed what I thought were unnecessary braces, but I neglected to rerun the
tests after that. My bad.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 21:37   ` [PATCH v2] " Shreyansh Paliwal
  2026-02-24 22:06     ` Junio C Hamano
@ 2026-02-25 16:37     ` D. Ben Knoble
  2026-02-26 17:32       ` Shreyansh Paliwal
  1 sibling, 1 reply; 27+ messages in thread
From: D. Ben Knoble @ 2026-02-25 16:37 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, gitster, philipoakley

On Tue, Feb 24, 2026 at 4:39 PM Shreyansh Paliwal
<shreyanshpaliwalcmsmn@gmail.com> wrote:
>
> When a non-ASCII character is detected in the body or subject of the email
> the user is prompted with,
>
>   Which 8bit encoding should I declare [UTF-8]? foo
>
> After this the input string is validated by the regex, based on the fact
> that the charset string will be minimum 4 characters [1]. If the string is
> more than 4 letters the email is sent, if not then a second prompt to
> confirm is asked to the user,
>
>   Are you sure you want to use <foo> [y/N]? y
>
> This relies on a length based regex heuristic check to validate the user
> input, and can allow clearly invalid charset names to pass if the input is
> greater than 4 characters.
>
> Add a semantic validation of the charset name using the
> Encode::find_encoding() module of perl. If the encoding is not recognized,
> warn the user and ask for confirmation before proceeding. After this
> validation the lenght based validation becomes redundant and also breaks
> flow, so change the regex of valid input to any non blank string.
>
> Additionally, the wording of the first prompt can confuse the user if not
> read properly or under any default assumptions for a yes/no prompt. Change
> the wording to make it explicitly clear to the user that the prompt needs a
> string input, UTF-8 being the default.
>
> The intended flow is,
>
>   Declare which 8bit encoding to use [default: UTF-8]? foobar
>   warning: 'foobar' does not appear to be a valid charset name.
>   Are you sure you want to use <foobar> [y/N]?
>
> [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
>
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
> Changes in v2:
>  - Added braces in if-else block.
>
>  git-send-email.perl   | 17 ++++++++++++++---
>  t/t9001-send-email.sh |  2 +-
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index cd4b316ddc..15387ac377 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -23,6 +23,7 @@
>  use Git::LoadCPAN::Error qw(:try);
>  use Git;
>  use Git::I18N;
> +use Encode qw(find_encoding);
>
>  Getopt::Long::Configure qw/ pass_through /;
>
> @@ -987,6 +988,7 @@ sub get_patch_subject {
>  sub ask {
>         my ($prompt, %arg) = @_;
>         my $valid_re = $arg{valid_re};
> +       my $warn_invalid = $arg{warn_invalid};
>         my $default = $arg{default};
>         my $confirm_only = $arg{confirm_only};
>         my $resp;
> @@ -1005,7 +1007,15 @@ sub ask {
>                         return $default;
>                 }
>                 if (!defined $valid_re or $resp =~ /$valid_re/) {
> -                       return $resp;
> +                       if ($warn_invalid) {
> +                               if (find_encoding($resp)) {
> +                                       return $resp;
> +                               } else {
> +                                       printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
> +                               }
> +                       } else {
> +                               return $resp;
> +                       }

I think this is asking "ask" to do too much, since only encoding
askers can use warn_invalid.

What I rather meant was to extract relevant helper procedures so that
open-coding ask around the encoding question would be easier to
maintain.

>                 }
>                 if ($confirm_only) {
>                         my $yesno = $term->readline(
> @@ -1044,8 +1054,9 @@ sub file_declares_8bit_cte {
>         foreach my $f (sort keys %broken_encoding) {
>                 print "    $f\n";
>         }
> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> -                                 valid_re => qr/.{4}/, confirm_only => 1,
> +       $auto_8bit_encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
> +                                 valid_re => qr/^\S+$/, confirm_only => 1,
> +                                 warn_invalid => 1,
>                                   default => "UTF-8");
>  }
>
> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index e56e0c8d77..24f6c76aee 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
>                         email-using-8bit >stdout &&
>         grep "do not declare a Content-Transfer-Encoding" stdout &&
>         grep email-using-8bit stdout &&
> -       grep "Which 8bit encoding" stdout &&
> +       grep "Declare which 8bit encoding to use" stdout &&
>         grep -E "Content|MIME" msgtxt1 >actual &&
>         test_cmp content-type-decl actual
>  '
>
> Range-diff against v1:
> 1:  70fa4d2899 ! 1:  954c1dae9f send-email: validate charset name in 8bit encoding prompt
>     @@ git-send-email.perl: sub ask {
>                 if (!defined $valid_re or $resp =~ /$valid_re/) {
>      -                  return $resp;
>      +                  if ($warn_invalid) {
>     -+                          if (find_encoding($resp))
>     ++                          if (find_encoding($resp)) {
>      +                                  return $resp;
>     -+                          else
>     ++                          } else {
>      +                                  printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
>     -+                  } else
>     ++                          }
>     ++                  } else {
>      +                          return $resp;
>     ++                  }
>                 }
>                 if ($confirm_only) {
>                         my $yesno = $term->readline(
> --
> 2.53.0



-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
  2026-02-24 21:11   ` Junio C Hamano
  2026-02-24 21:37   ` [PATCH v2] " Shreyansh Paliwal
@ 2026-02-26 16:16   ` Shreyansh Paliwal
  2026-02-26 18:45     ` Junio C Hamano
  2026-02-28 11:20   ` [PATCH v4] " Shreyansh Paliwal
  3 siblings, 1 reply; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-26 16:16 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley, Shreyansh Paliwal

When a non-ASCII character is detected in the body or subject of the email
the user is prompted with,

        Which 8bit encoding should I declare [UTF-8]? foo

After this the input string is validated by the regex, based on the fact
that the charset string will be minimum 4 characters [1]. If the string is
more than 4 letters the email is sent, if not then a second prompt to
confirm is asked to the user,

        Are you sure you want to use <foo> [y/N]? y

This relies on a length based regex heuristic check to validate the user
input, and can allow clearly invalid charset names to pass if the input is
greater than 4 characters.

Add a semantic validation of the charset name using the
Encode::find_encoding() module of perl. If the encoding is not recognized,
warn the user and ask for confirmation before proceeding. After this
validation the lenght based validation becomes redundant and also breaks
flow, so change the regex of valid input to any non blank string.

Introduce a dedicated helper for confirmation handling that can be reused
both by ask() and the custom 8bit prompt flow. Make the encoding warning
logic specific to the 8bit prompt, this reduces the load on ask(), and
improves maintainability.

Additionally, the wording of the first prompt can confuse the user if not
read properly or under any default assumptions for a yes/no prompt. Change
the wording to make it explicitly clear to the user that the prompt needs a
string input, UTF-8 being the default.

The intended flow is,

        Declare which 8bit encoding to use [default: UTF-8]? foobar
        warning: 'foobar' does not appear to be a valid charset name.
        Are you sure you want to use <foobar> [y/N]?

[1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
Changes in v2:
 - Added a helper function confirm_ask() to handle yes/no confirmation prompts.
 - Added the validation and warning logic in the 8bit prompt instead of ask().

 git-send-email.perl   | 36 +++++++++++++++++++++++++++++-------
 t/t9001-send-email.sh |  2 +-
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..3230b80701 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);

 Getopt::Long::Configure qw/ pass_through /;

@@ -984,6 +985,18 @@ sub get_patch_subject {
 	}
 }

+sub confirm_ask {
+	my ($resp) = @_;
+	my $term = term();
+	return 0
+		unless defined $term->IN and defined fileno($term->IN) and
+		       defined $term->OUT and defined fileno($term->OUT);
+	my $yesno = $term->readline(
+		# TRANSLATORS: please keep [y/N] as is.
+		sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
+	return defined $yesno && $yesno =~ /y/i;
+}
+
 sub ask {
 	my ($prompt, %arg) = @_;
 	my $valid_re = $arg{valid_re};
@@ -1008,10 +1021,7 @@ sub ask {
 			return $resp;
 		}
 		if ($confirm_only) {
-			my $yesno = $term->readline(
-				# TRANSLATORS: please keep [y/N] as is.
-				sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
-			if (defined $yesno && $yesno =~ /y/i) {
+			if (confirm_ask($resp)) {
 				return $resp;
 			}
 		}
@@ -1044,9 +1054,21 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-				  valid_re => qr/.{4}/, confirm_only => 1,
-				  default => "UTF-8");
+	while(1) {
+		my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
+		valid_re => qr/^\S+$/,
+		default  => "UTF-8");
+		next unless defined $encoding;
+		if (find_encoding($encoding)) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
+		if (confirm_ask($encoding)) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+	}
 }

 if (!$force) {
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e56e0c8d77..24f6c76aee 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
 			email-using-8bit >stdout &&
 	grep "do not declare a Content-Transfer-Encoding" stdout &&
 	grep email-using-8bit stdout &&
-	grep "Which 8bit encoding" stdout &&
+	grep "Declare which 8bit encoding to use" stdout &&
 	grep -E "Content|MIME" msgtxt1 >actual &&
 	test_cmp content-type-decl actual
 '

Range-diff against v2:
1:  954c1dae9f ! 1:  748bb03a00 send-email: validate charset name in 8bit encoding prompt
    @@ Commit message
         When a non-ASCII character is detected in the body or subject of the email
         the user is prompted with,

    -      Which 8bit encoding should I declare [UTF-8]? foo
    +            Which 8bit encoding should I declare [UTF-8]? foo

         After this the input string is validated by the regex, based on the fact
         that the charset string will be minimum 4 characters [1]. If the string is
         more than 4 letters the email is sent, if not then a second prompt to
         confirm is asked to the user,

    -      Are you sure you want to use <foo> [y/N]? y
    +            Are you sure you want to use <foo> [y/N]? y

         This relies on a length based regex heuristic check to validate the user
         input, and can allow clearly invalid charset names to pass if the input is
    @@ Commit message
         validation the lenght based validation becomes redundant and also breaks
         flow, so change the regex of valid input to any non blank string.

    +    Introduce a dedicated helper for confirmation handling that can be reused
    +    both by ask() and the custom 8bit prompt flow. This makes the encoding
    +    warning logic specific to the 8bit prompt, reduces the load on ask(), and
    +    improves maintainability.
    +
         Additionally, the wording of the first prompt can confuse the user if not
         read properly or under any default assumptions for a yes/no prompt. Change
         the wording to make it explicitly clear to the user that the prompt needs a
         string input, UTF-8 being the default.

    +    The intended flow is,
    +
    +            Declare which 8bit encoding to use [default: UTF-8]? foobar
    +            warning: 'foobar' does not appear to be a valid charset name.
    +            Are you sure you want to use <foobar> [y/N]?
    +
    +    [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
    +
         Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>

      ## git-send-email.perl ##
    @@ git-send-email.perl
      Getopt::Long::Configure qw/ pass_through /;

     @@ git-send-email.perl: sub get_patch_subject {
    + 	}
    + }
    +
    ++sub confirm_ask {
    ++	my ($resp) = @_;
    ++	my $term = term();
    ++	return 0
    ++		unless defined $term->IN and defined fileno($term->IN) and
    ++		       defined $term->OUT and defined fileno($term->OUT);
    ++	my $yesno = $term->readline(
    ++		# TRANSLATORS: please keep [y/N] as is.
    ++		sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
    ++	return defined $yesno && $yesno =~ /y/i;
    ++}
    ++
      sub ask {
      	my ($prompt, %arg) = @_;
      	my $valid_re = $arg{valid_re};
    -+	my $warn_invalid = $arg{warn_invalid};
    - 	my $default = $arg{default};
    - 	my $confirm_only = $arg{confirm_only};
    - 	my $resp;
     @@ git-send-email.perl: sub ask {
    - 			return $default;
    - 		}
    - 		if (!defined $valid_re or $resp =~ /$valid_re/) {
    --			return $resp;
    -+			if ($warn_invalid) {
    -+				if (find_encoding($resp)) {
    -+					return $resp;
    -+				} else {
    -+					printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
    -+				}
    -+			} else {
    -+				return $resp;
    -+			}
    + 			return $resp;
      		}
      		if ($confirm_only) {
    - 			my $yesno = $term->readline(
    +-			my $yesno = $term->readline(
    +-				# TRANSLATORS: please keep [y/N] as is.
    +-				sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
    +-			if (defined $yesno && $yesno =~ /y/i) {
    ++			if (confirm_ask($resp)) {
    + 				return $resp;
    + 			}
    + 		}
     @@ git-send-email.perl: sub file_declares_8bit_cte {
      	foreach my $f (sort keys %broken_encoding) {
      		print "    $f\n";
      	}
     -	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
     -				  valid_re => qr/.{4}/, confirm_only => 1,
    -+	$auto_8bit_encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
    -+				  valid_re => qr/^\S+$/, confirm_only => 1,
    -+				  warn_invalid => 1,
    - 				  default => "UTF-8");
    +-				  default => "UTF-8");
    ++	while(1) {
    ++		my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
    ++		valid_re => qr/^\S+$/,
    ++		default  => "UTF-8");
    ++		next unless defined $encoding;
    ++		if (find_encoding($encoding)) {
    ++			$auto_8bit_encoding = $encoding;
    ++			last;
    ++		}
    ++		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
    ++		if (confirm_ask($encoding)) {
    ++			$auto_8bit_encoding = $encoding;
    ++			last;
    ++		}
    ++	}
      }

    + if (!$force) {

      ## t/t9001-send-email.sh ##
     @@ t/t9001-send-email.sh: test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
--
2.53.0.154.g7c02d39fc2.dirty

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] send-email: validate charset name in 8bit encoding prompt
  2026-02-25 16:37     ` D. Ben Knoble
@ 2026-02-26 17:32       ` Shreyansh Paliwal
  0 siblings, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-26 17:32 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley

> On Tue, Feb 24, 2026 at 4:39 PM Shreyansh Paliwal
> <shreyanshpaliwalcmsmn@gmail.com> wrote:
> >
> > When a non-ASCII character is detected in the body or subject of the email
> > the user is prompted with,
> >
> >   Which 8bit encoding should I declare [UTF-8]? foo
> >
> > After this the input string is validated by the regex, based on the fact
> > that the charset string will be minimum 4 characters [1]. If the string is
> > more than 4 letters the email is sent, if not then a second prompt to
> > confirm is asked to the user,
> >
> >   Are you sure you want to use <foo> [y/N]? y
> >
> > This relies on a length based regex heuristic check to validate the user
> > input, and can allow clearly invalid charset names to pass if the input is
> > greater than 4 characters.
> >
> > Add a semantic validation of the charset name using the
> > Encode::find_encoding() module of perl. If the encoding is not recognized,
> > warn the user and ask for confirmation before proceeding. After this
> > validation the lenght based validation becomes redundant and also breaks
> > flow, so change the regex of valid input to any non blank string.
> >
> > Additionally, the wording of the first prompt can confuse the user if not
> > read properly or under any default assumptions for a yes/no prompt. Change
> > the wording to make it explicitly clear to the user that the prompt needs a
> > string input, UTF-8 being the default.
> >
> > The intended flow is,
> >
> >   Declare which 8bit encoding to use [default: UTF-8]? foobar
> >   warning: 'foobar' does not appear to be a valid charset name.
> >   Are you sure you want to use <foobar> [y/N]?
> >
> > [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
> >
> > Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> > ---
> > Changes in v2:
> >  - Added braces in if-else block.
> >
> >  git-send-email.perl   | 17 ++++++++++++++---
> >  t/t9001-send-email.sh |  2 +-
> >  2 files changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index cd4b316ddc..15387ac377 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -23,6 +23,7 @@
> >  use Git::LoadCPAN::Error qw(:try);
> >  use Git;
> >  use Git::I18N;
> > +use Encode qw(find_encoding);
> >
> >  Getopt::Long::Configure qw/ pass_through /;
> >
> > @@ -987,6 +988,7 @@ sub get_patch_subject {
> >  sub ask {
> >         my ($prompt, %arg) = @_;
> >         my $valid_re = $arg{valid_re};
> > +       my $warn_invalid = $arg{warn_invalid};
> >         my $default = $arg{default};
> >         my $confirm_only = $arg{confirm_only};
> >         my $resp;
> > @@ -1005,7 +1007,15 @@ sub ask {
> >                         return $default;
> >                 }
> >                 if (!defined $valid_re or $resp =~ /$valid_re/) {
> > -                       return $resp;
> > +                       if ($warn_invalid) {
> > +                               if (find_encoding($resp)) {
> > +                                       return $resp;
> > +                               } else {
> > +                                       printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $resp;
> > +                               }
> > +                       } else {
> > +                               return $resp;
> > +                       }
>
> I think this is asking "ask" to do too much, since only encoding
> askers can use warn_invalid.
>
> What I rather meant was to extract relevant helper procedures so that
> open-coding ask around the encoding question would be easier to
> maintain.

Hi,

I have sent a v3 on this, in which I introduced a helper for the confirmation
prompt and also made validation logic specific to the 8bit prompt.
Do you think it would also be better to move the encoding validation
into a separate helper, or does the current split look reasonable?

Best,
Shreyansh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] send-email: validate charset name in 8bit encoding prompt
  2026-02-26 16:16   ` [PATCH v3] " Shreyansh Paliwal
@ 2026-02-26 18:45     ` Junio C Hamano
  2026-02-26 19:06       ` Junio C Hamano
  2026-02-28  8:36       ` Shreyansh Paliwal
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2026-02-26 18:45 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble, philipoakley

Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:

> diff --git a/git-send-email.perl b/git-send-email.perl
> index cd4b316ddc..3230b80701 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -23,6 +23,7 @@
>  use Git::LoadCPAN::Error qw(:try);
>  use Git;
>  use Git::I18N;
> +use Encode qw(find_encoding);

I wonder how common is this module already installed on users'
systems (not asking "how widely available"---which is "can users
easily make it work?", but asking "would this work out of box with
what users already have?").

> +sub confirm_ask {
> +	my ($resp) = @_;
> +	my $term = term();
> +	return 0
> +		unless defined $term->IN and defined fileno($term->IN) and
> +		       defined $term->OUT and defined fileno($term->OUT);
> +	my $yesno = $term->readline(
> +		# TRANSLATORS: please keep [y/N] as is.
> +		sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
> +	return defined $yesno && $yesno =~ /y/i;
> +}

This is a bit incosistent with what "sub ask" (the only caller of
this sub) does, isn't it?  Before entering the loop that makes a
call into this, it does this:

        sub ask {
                my ($prompt, %arg) = @_;
                my $valid_re = $arg{valid_re};
                my $default = $arg{default};
                my $confirm_only = $arg{confirm_only};
                my $resp;
                my $i = 0;
                my $term = term();
                return defined $default ? $default : undef
                        unless defined $term->IN and defined fileno($term->IN) and
                               defined $term->OUT and defined fileno($term->OUT);

If $term is not usable for interactive prompt, it uses the default
setting.  But the new confirm_ask always says "no".

confirm_ask does its own "check term() to see it is usable" because
it is called from another code path which does not have its own
logic, but it may be a wrong abstraction to give uneven interface.
It would make it more clear what is going on if you just do the
interactive $term->readline() thing in "sub ask", instead of calling
"sub confirm_ask" that does tghe $term thing redundantly.

Can't the other confirm_ask() caller call a normal "sub ask"?  

I am not sure why we want to add a dedicated sub, just to ask "are
you sure you want to use X [y/N]? ".

> The intended flow is,
>
>         Declare which 8bit encoding to use [default: UTF-8]? foobar
>         warning: 'foobar' does not appear to be a valid charset name.
>         Are you sure you want to use <foobar> [y/N]?

It somehow looks uneven to have three lines, two of them
capitalizing their first word while the other one is all lowercase.
I wonder if this would be simpler?

    Declare which 8bit encoding to use [default: UTF-8]?  foobar<RET>
    Do you really mean 'foobar', not a valid charset name [y/N]?



So, taking all of the above together, perhaps:

 * Discard changes to "sub ask" and addition of "sub confirm_ask".

 * Tweak this part a bit to call ask().

> +	while(1) {

Style.  missing SP before "(".

> +		my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),

Overly long line.

> +		valid_re => qr/^\S+$/,
> +		default  => "UTF-8");
> +		next unless defined $encoding;
> +		if (find_encoding($encoding)) {
> +			$auto_8bit_encoding = $encoding;
> +			last;
> +		}

> +		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> +		if (confirm_ask($encoding)) {

Use ask() to ask 

    Do you really mean 'foobar', not a valid charset name [y/N]?

here, perhaps?

> +			$auto_8bit_encoding = $encoding;
> +			last;
> +		}
> +	}
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] send-email: validate charset name in 8bit encoding prompt
  2026-02-26 18:45     ` Junio C Hamano
@ 2026-02-26 19:06       ` Junio C Hamano
  2026-02-28  8:41         ` Shreyansh Paliwal
  2026-02-28  8:36       ` Shreyansh Paliwal
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2026-02-26 19:06 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble, philipoakley

Junio C Hamano <gitster@pobox.com> writes:

> Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:
>
>> diff --git a/git-send-email.perl b/git-send-email.perl
>> index cd4b316ddc..3230b80701 100755
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -23,6 +23,7 @@
>>  use Git::LoadCPAN::Error qw(:try);
>>  use Git;
>>  use Git::I18N;
>> +use Encode qw(find_encoding);
>
> I wonder how common is this module already installed on users'
> systems (not asking "how widely available"---which is "can users
> easily make it work?", but asking "would this work out of box with
> what users already have?").

Answering my own question: "yes".

We use Encode::find_encoding as well as Encode::{de,en}code in
gitweb and git-svn, so it is very likely that anybody who has a full
installation of Git would already have it on their system.  Also
Encode.pm is distributed as part of Perl itself, if I am not
mistaken.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] send-email: validate charset name in 8bit encoding prompt
  2026-02-26 18:45     ` Junio C Hamano
  2026-02-26 19:06       ` Junio C Hamano
@ 2026-02-28  8:36       ` Shreyansh Paliwal
  1 sibling, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-28  8:36 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley

[...]
> > +sub confirm_ask {
> > +	my ($resp) = @_;
> > +	my $term = term();
> > +	return 0
> > +		unless defined $term->IN and defined fileno($term->IN) and
> > +		       defined $term->OUT and defined fileno($term->OUT);
> > +	my $yesno = $term->readline(
> > +		# TRANSLATORS: please keep [y/N] as is.
> > +		sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
> > +	return defined $yesno && $yesno =~ /y/i;
> > +}
>
> This is a bit incosistent with what "sub ask" (the only caller of
> this sub) does, isn't it?  Before entering the loop that makes a
> call into this, it does this:
>
>         sub ask {
>                 my ($prompt, %arg) = @_;
>                 my $valid_re = $arg{valid_re};
>                 my $default = $arg{default};
>                 my $confirm_only = $arg{confirm_only};
>                 my $resp;
>                 my $i = 0;
>                 my $term = term();
>                 return defined $default ? $default : undef
>                         unless defined $term->IN and defined fileno($term->IN) and
>                                defined $term->OUT and defined fileno($term->OUT);
>
> If $term is not usable for interactive prompt, it uses the default
> setting.  But the new confirm_ask always says "no".
>
> confirm_ask does its own "check term() to see it is usable" because
> it is called from another code path which does not have its own
> logic, but it may be a wrong abstraction to give uneven interface.
> It would make it more clear what is going on if you just do the
> interactive $term->readline() thing in "sub ask", instead of calling
> "sub confirm_ask" that does tghe $term thing redundantly.
>
> Can't the other confirm_ask() caller call a normal "sub ask"?
>
> I am not sure why we want to add a dedicated sub, just to ask "are
> you sure you want to use X [y/N]? ".
>
> > The intended flow is,
> >
> >         Declare which 8bit encoding to use [default: UTF-8]? foobar
> >         warning: 'foobar' does not appear to be a valid charset name.
> >         Are you sure you want to use <foobar> [y/N]?
>
> It somehow looks uneven to have three lines, two of them
> capitalizing their first word while the other one is all lowercase.
> I wonder if this would be simpler?
>
>     Declare which 8bit encoding to use [default: UTF-8]?  foobar<RET>
>     Do you really mean 'foobar', not a valid charset name [y/N]?
>

Actually that makes sense, because if we need to add a special warning
in between the two prompts (what I was aiming for), either we need to
modify ask() to add the warning into the flow, or we had to seperate the
confirm_ask because we have to change the flow in any case, but if we
drop the additional warning, and instead warn/confirm together in the
second prompt we dont need this abstraction.

>
>
> So, taking all of the above together, perhaps:
>
>  * Discard changes to "sub ask" and addition of "sub confirm_ask".
>
>  * Tweak this part a bit to call ask().
>
> > +	while(1) {
>
> Style.  missing SP before "(".
>
> > +		my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
>
> Overly long line.
>

my bad. will fix.

> > +		valid_re => qr/^\S+$/,
> > +		default  => "UTF-8");
> > +		next unless defined $encoding;
> > +		if (find_encoding($encoding)) {
> > +			$auto_8bit_encoding = $encoding;
> > +			last;
> > +		}
>
> > +		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> > +		if (confirm_ask($encoding)) {
>
> Use ask() to ask
>
>     Do you really mean 'foobar', not a valid charset name [y/N]?
>
> here, perhaps?
>

Understood. I am hoping now this doesn't need any additional
abstraction as Ben suggested.
Sorry for the delay in response to the review.
I will send a reroll.

Thanks,
Shreyansh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] send-email: validate charset name in 8bit encoding prompt
  2026-02-26 19:06       ` Junio C Hamano
@ 2026-02-28  8:41         ` Shreyansh Paliwal
  0 siblings, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-28  8:41 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:
> >
> >> diff --git a/git-send-email.perl b/git-send-email.perl
> >> index cd4b316ddc..3230b80701 100755
> >> --- a/git-send-email.perl
> >> +++ b/git-send-email.perl
> >> @@ -23,6 +23,7 @@
> >>  use Git::LoadCPAN::Error qw(:try);
> >>  use Git;
> >>  use Git::I18N;
> >> +use Encode qw(find_encoding);
> >
> > I wonder how common is this module already installed on users'
> > systems (not asking "how widely available"---which is "can users
> > easily make it work?", but asking "would this work out of box with
> > what users already have?").
>
> Answering my own question: "yes".
>
> We use Encode::find_encoding as well as Encode::{de,en}code in
> gitweb and git-svn, so it is very likely that anybody who has a full
> installation of Git would already have it on their system.  Also
> Encode.pm is distributed as part of Perl itself, if I am not
> mistaken.

That's right, Encode is bundled with Perl, so users do not need to
install anything extra, other than what is already required for
building Git.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4] send-email: validate charset name in 8bit encoding prompt
  2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
                     ` (2 preceding siblings ...)
  2026-02-26 16:16   ` [PATCH v3] " Shreyansh Paliwal
@ 2026-02-28 11:20   ` Shreyansh Paliwal
  2026-02-28 21:16     ` D. Ben Knoble
  2026-03-02 16:10     ` Junio C Hamano
  3 siblings, 2 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-02-28 11:20 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley, Shreyansh Paliwal

When a non-ASCII character is detected in the body or subject of the email
the user is prompted with,

        Which 8bit encoding should I declare [UTF-8]? foo

After this the input string is validated by the regex, based on the fact
that the charset string will be minimum 4 characters [1]. If the string is
more than 4 letters the email is sent, if not then a second prompt to
confirm is asked to the user,

        Are you sure you want to use <foo> [y/N]? y

This relies on a length based regex heuristic check to validate the user
input, and can allow clearly invalid charset names to pass if the input is
greater than 4 characters.

Add a semantic validation of the charset name using the
Encode::find_encoding() which is a bundled module of perl. If the encoding
is not recognized, warn the user and ask for confirmation before proceeding.
After this validation the lenght based validation becomes redundant and also
breaks flow, so change the regex of valid input to any non blank string.

Make the encoding warning logic specific to the 8bit prompt, also add a
unique confirmation prompt which  reduces the load on ask(), and improves
maintainability.

Additionally, the wording of the first prompt can confuse the user if not
read properly or under any default assumptions for a yes/no prompt. Change
the wording to make it explicitly clear to the user that the prompt needs a
string input, UTF-8 being the default.

The intended flow is,

        Declare which 8bit encoding to use [default: UTF-8]? foobar
        <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?

[1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
---
Changes in v4:
 - removed the confirm_ask() helper and changes to ask().
 - make a new warning/confirmation prompt specific to the 8bit encoding flow.

 git-send-email.perl   | 25 ++++++++++++++++++++++---
 t/t9001-send-email.sh |  2 +-
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index cd4b316ddc..3186104709 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -23,6 +23,7 @@
 use Git::LoadCPAN::Error qw(:try);
 use Git;
 use Git::I18N;
+use Encode qw(find_encoding);

 Getopt::Long::Configure qw/ pass_through /;

@@ -1044,9 +1045,27 @@ sub file_declares_8bit_cte {
 	foreach my $f (sort keys %broken_encoding) {
 		print "    $f\n";
 	}
-	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
-				  valid_re => qr/.{4}/, confirm_only => 1,
-				  default => "UTF-8");
+	while (1) {
+		my $encoding = ask(
+			__("Declare which 8bit encoding to use [default: UTF-8]? "),
+			valid_re => qr/^\S+$/,
+			default  => "UTF-8");
+		next unless defined $encoding;
+		if (find_encoding($encoding)) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+		my $yesno = ask(
+			sprintf(
+			__("'%s' does not appear to be a valid charset name. Use it anyway [y/N]? "),
+			$encoding),
+			valid_re => qr/^(?:y|n)/i,
+			default => "n");
+		if (defined $yesno && $yesno =~ /^y/i) {
+			$auto_8bit_encoding = $encoding;
+			last;
+		}
+	}
 }

 if (!$force) {
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e56e0c8d77..24f6c76aee 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
 			email-using-8bit >stdout &&
 	grep "do not declare a Content-Transfer-Encoding" stdout &&
 	grep email-using-8bit stdout &&
-	grep "Which 8bit encoding" stdout &&
+	grep "Declare which 8bit encoding to use" stdout &&
 	grep -E "Content|MIME" msgtxt1 >actual &&
 	test_cmp content-type-decl actual
 '

Range-diff against v3:
1:  748bb03a00 ! 1:  37e17eac68 send-email: validate charset name in 8bit encoding prompt
    @@ Commit message
         validation the lenght based validation becomes redundant and also breaks
         flow, so change the regex of valid input to any non blank string.

    -    Introduce a dedicated helper for confirmation handling that can be reused
    -    both by ask() and the custom 8bit prompt flow. This makes the encoding
    -    warning logic specific to the 8bit prompt, reduces the load on ask(), and
    -    improves maintainability.
    +    Make the encoding warning logic specific to the 8bit prompt, also add a
    +    unique confirmation prompt which  reduces the load on ask(), and improves
    +    maintainability.

         Additionally, the wording of the first prompt can confuse the user if not
         read properly or under any default assumptions for a yes/no prompt. Change
    @@ Commit message
         The intended flow is,

                 Declare which 8bit encoding to use [default: UTF-8]? foobar
    -            warning: 'foobar' does not appear to be a valid charset name.
    -            Are you sure you want to use <foobar> [y/N]?
    +            <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?

         [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b

    @@ git-send-email.perl

      Getopt::Long::Configure qw/ pass_through /;

    -@@ git-send-email.perl: sub get_patch_subject {
    - 	}
    - }
    -
    -+sub confirm_ask {
    -+	my ($resp) = @_;
    -+	my $term = term();
    -+	return 0
    -+		unless defined $term->IN and defined fileno($term->IN) and
    -+		       defined $term->OUT and defined fileno($term->OUT);
    -+	my $yesno = $term->readline(
    -+		# TRANSLATORS: please keep [y/N] as is.
    -+		sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
    -+	return defined $yesno && $yesno =~ /y/i;
    -+}
    -+
    - sub ask {
    - 	my ($prompt, %arg) = @_;
    - 	my $valid_re = $arg{valid_re};
    -@@ git-send-email.perl: sub ask {
    - 			return $resp;
    - 		}
    - 		if ($confirm_only) {
    --			my $yesno = $term->readline(
    --				# TRANSLATORS: please keep [y/N] as is.
    --				sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
    --			if (defined $yesno && $yesno =~ /y/i) {
    -+			if (confirm_ask($resp)) {
    - 				return $resp;
    - 			}
    - 		}
     @@ git-send-email.perl: sub file_declares_8bit_cte {
      	foreach my $f (sort keys %broken_encoding) {
      		print "    $f\n";
    @@ git-send-email.perl: sub file_declares_8bit_cte {
     -	$auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
     -				  valid_re => qr/.{4}/, confirm_only => 1,
     -				  default => "UTF-8");
    -+	while(1) {
    -+		my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
    -+		valid_re => qr/^\S+$/,
    -+		default  => "UTF-8");
    ++	while (1) {
    ++		my $encoding = ask(
    ++			__("Declare which 8bit encoding to use [default: UTF-8]? "),
    ++			valid_re => qr/^\S+$/,
    ++			default  => "UTF-8");
     +		next unless defined $encoding;
     +		if (find_encoding($encoding)) {
     +			$auto_8bit_encoding = $encoding;
     +			last;
     +		}
    -+		printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
    -+		if (confirm_ask($encoding)) {
    ++		my $yesno = ask(
    ++			sprintf(
    ++			__("'%s' does not appear to be a valid charset name. Use it anyway [y/N]? "),
    ++			$encoding),
    ++			valid_re => qr/^(?:y|n)/i,
    ++			default => "n");
    ++		if (defined $yesno && $yesno =~ /^y/i) {
     +			$auto_8bit_encoding = $encoding;
     +			last;
     +		}
--
2.53.0.155.g748bb03a00.dirty

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] send-email: validate charset name in 8bit encoding prompt
  2026-02-28 11:20   ` [PATCH v4] " Shreyansh Paliwal
@ 2026-02-28 21:16     ` D. Ben Knoble
  2026-03-02 16:10     ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: D. Ben Knoble @ 2026-02-28 21:16 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, gitster, philipoakley

On Sat, Feb 28, 2026 at 6:22 AM Shreyansh Paliwal
<shreyanshpaliwalcmsmn@gmail.com> wrote:
>
> When a non-ASCII character is detected in the body or subject of the email
> the user is prompted with,
>
>         Which 8bit encoding should I declare [UTF-8]? foo
>
> After this the input string is validated by the regex, based on the fact
> that the charset string will be minimum 4 characters [1]. If the string is
> more than 4 letters the email is sent, if not then a second prompt to
> confirm is asked to the user,
>
>         Are you sure you want to use <foo> [y/N]? y
>
> This relies on a length based regex heuristic check to validate the user
> input, and can allow clearly invalid charset names to pass if the input is
> greater than 4 characters.
>
> Add a semantic validation of the charset name using the
> Encode::find_encoding() which is a bundled module of perl. If the encoding
> is not recognized, warn the user and ask for confirmation before proceeding.
> After this validation the lenght based validation becomes redundant and also
> breaks flow, so change the regex of valid input to any non blank string.
>
> Make the encoding warning logic specific to the 8bit prompt, also add a
> unique confirmation prompt which  reduces the load on ask(), and improves
> maintainability.
>
> Additionally, the wording of the first prompt can confuse the user if not
> read properly or under any default assumptions for a yes/no prompt. Change
> the wording to make it explicitly clear to the user that the prompt needs a
> string input, UTF-8 being the default.
>
> The intended flow is,
>
>         Declare which 8bit encoding to use [default: UTF-8]? foobar
>         <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?
>
> [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
>
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
> Changes in v4:
>  - removed the confirm_ask() helper and changes to ask().
>  - make a new warning/confirmation prompt specific to the 8bit encoding flow.
>
>  git-send-email.perl   | 25 ++++++++++++++++++++++---
>  t/t9001-send-email.sh |  2 +-
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index cd4b316ddc..3186104709 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -23,6 +23,7 @@
>  use Git::LoadCPAN::Error qw(:try);
>  use Git;
>  use Git::I18N;
> +use Encode qw(find_encoding);
>
>  Getopt::Long::Configure qw/ pass_through /;
>
> @@ -1044,9 +1045,27 @@ sub file_declares_8bit_cte {
>         foreach my $f (sort keys %broken_encoding) {
>                 print "    $f\n";
>         }
> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> -                                 valid_re => qr/.{4}/, confirm_only => 1,
> -                                 default => "UTF-8");
> +       while (1) {
> +               my $encoding = ask(
> +                       __("Declare which 8bit encoding to use [default: UTF-8]? "),
> +                       valid_re => qr/^\S+$/,
> +                       default  => "UTF-8");
> +               next unless defined $encoding;
> +               if (find_encoding($encoding)) {
> +                       $auto_8bit_encoding = $encoding;
> +                       last;
> +               }
> +               my $yesno = ask(
> +                       sprintf(
> +                       __("'%s' does not appear to be a valid charset name. Use it anyway [y/N]? "),
> +                       $encoding),
> +                       valid_re => qr/^(?:y|n)/i,
> +                       default => "n");
> +               if (defined $yesno && $yesno =~ /^y/i) {
> +                       $auto_8bit_encoding = $encoding;
> +                       last;
> +               }
> +       }
>  }
>
>  if (!$force) {
> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index e56e0c8d77..24f6c76aee 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -1691,7 +1691,7 @@ test_expect_success $PREREQ 'asks about and fixes 8bit encodings' '
>                         email-using-8bit >stdout &&
>         grep "do not declare a Content-Transfer-Encoding" stdout &&
>         grep email-using-8bit stdout &&
> -       grep "Which 8bit encoding" stdout &&
> +       grep "Declare which 8bit encoding to use" stdout &&
>         grep -E "Content|MIME" msgtxt1 >actual &&
>         test_cmp content-type-decl actual
>  '
>
> Range-diff against v3:
> 1:  748bb03a00 ! 1:  37e17eac68 send-email: validate charset name in 8bit encoding prompt
>     @@ Commit message
>          validation the lenght based validation becomes redundant and also breaks
>          flow, so change the regex of valid input to any non blank string.
>
>     -    Introduce a dedicated helper for confirmation handling that can be reused
>     -    both by ask() and the custom 8bit prompt flow. This makes the encoding
>     -    warning logic specific to the 8bit prompt, reduces the load on ask(), and
>     -    improves maintainability.
>     +    Make the encoding warning logic specific to the 8bit prompt, also add a
>     +    unique confirmation prompt which  reduces the load on ask(), and improves
>     +    maintainability.
>
>          Additionally, the wording of the first prompt can confuse the user if not
>          read properly or under any default assumptions for a yes/no prompt. Change
>     @@ Commit message
>          The intended flow is,
>
>                  Declare which 8bit encoding to use [default: UTF-8]? foobar
>     -            warning: 'foobar' does not appear to be a valid charset name.
>     -            Are you sure you want to use <foobar> [y/N]?
>     +            <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?
>
>          [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
>
>     @@ git-send-email.perl
>
>       Getopt::Long::Configure qw/ pass_through /;
>
>     -@@ git-send-email.perl: sub get_patch_subject {
>     -   }
>     - }
>     -
>     -+sub confirm_ask {
>     -+  my ($resp) = @_;
>     -+  my $term = term();
>     -+  return 0
>     -+          unless defined $term->IN and defined fileno($term->IN) and
>     -+                 defined $term->OUT and defined fileno($term->OUT);
>     -+  my $yesno = $term->readline(
>     -+          # TRANSLATORS: please keep [y/N] as is.
>     -+          sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
>     -+  return defined $yesno && $yesno =~ /y/i;
>     -+}
>     -+
>     - sub ask {
>     -   my ($prompt, %arg) = @_;
>     -   my $valid_re = $arg{valid_re};
>     -@@ git-send-email.perl: sub ask {
>     -                   return $resp;
>     -           }
>     -           if ($confirm_only) {
>     --                  my $yesno = $term->readline(
>     --                          # TRANSLATORS: please keep [y/N] as is.
>     --                          sprintf(__("Are you sure you want to use <%s> [y/N]? "), $resp));
>     --                  if (defined $yesno && $yesno =~ /y/i) {
>     -+                  if (confirm_ask($resp)) {
>     -                           return $resp;
>     -                   }
>     -           }
>      @@ git-send-email.perl: sub file_declares_8bit_cte {
>         foreach my $f (sort keys %broken_encoding) {
>                 print "    $f\n";
>     @@ git-send-email.perl: sub file_declares_8bit_cte {
>      -  $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
>      -                            valid_re => qr/.{4}/, confirm_only => 1,
>      -                            default => "UTF-8");
>     -+  while(1) {
>     -+          my $encoding = ask(__("Declare which 8bit encoding to use [default: UTF-8]? "),
>     -+          valid_re => qr/^\S+$/,
>     -+          default  => "UTF-8");
>     ++  while (1) {
>     ++          my $encoding = ask(
>     ++                  __("Declare which 8bit encoding to use [default: UTF-8]? "),
>     ++                  valid_re => qr/^\S+$/,
>     ++                  default  => "UTF-8");
>      +          next unless defined $encoding;
>      +          if (find_encoding($encoding)) {
>      +                  $auto_8bit_encoding = $encoding;
>      +                  last;
>      +          }
>     -+          printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
>     -+          if (confirm_ask($encoding)) {
>     ++          my $yesno = ask(
>     ++                  sprintf(
>     ++                  __("'%s' does not appear to be a valid charset name. Use it anyway [y/N]? "),
>     ++                  $encoding),
>     ++                  valid_re => qr/^(?:y|n)/i,
>     ++                  default => "n");
>     ++          if (defined $yesno && $yesno =~ /^y/i) {
>      +                  $auto_8bit_encoding = $encoding;
>      +                  last;
>      +          }
> --
> 2.53.0.155.g748bb03a00.dirty

This version looks nice to my eyes. Thanks!

-- 
D. Ben Knoble

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] send-email: validate charset name in 8bit encoding prompt
  2026-02-28 11:20   ` [PATCH v4] " Shreyansh Paliwal
  2026-02-28 21:16     ` D. Ben Knoble
@ 2026-03-02 16:10     ` Junio C Hamano
  2026-03-03 19:06       ` Shreyansh Paliwal
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2026-03-02 16:10 UTC (permalink / raw)
  To: Shreyansh Paliwal; +Cc: git, ben.knoble, philipoakley

Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> writes:

> Additionally, the wording of the first prompt can confuse the user if not
> read properly or under any default assumptions for a yes/no prompt. Change
> the wording to make it explicitly clear to the user that the prompt needs a
> string input, UTF-8 being the default.
>
> The intended flow is,
>
>         Declare which 8bit encoding to use [default: UTF-8]? foobar
>         <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?
>
> [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
>
> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> ---
> Changes in v4:
>  - removed the confirm_ask() helper and changes to ask().
>  - make a new warning/confirmation prompt specific to the 8bit encoding flow.

Looking quite straight-forward.  Will replace.

Shall we declare victory and mark the topic for 'next'?

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4] send-email: validate charset name in 8bit encoding prompt
  2026-03-02 16:10     ` Junio C Hamano
@ 2026-03-03 19:06       ` Shreyansh Paliwal
  0 siblings, 0 replies; 27+ messages in thread
From: Shreyansh Paliwal @ 2026-03-03 19:06 UTC (permalink / raw)
  To: git; +Cc: ben.knoble, gitster, philipoakley

> > Additionally, the wording of the first prompt can confuse the user if not
> > read properly or under any default assumptions for a yes/no prompt. Change
> > the wording to make it explicitly clear to the user that the prompt needs a
> > string input, UTF-8 being the default.
> >
> > The intended flow is,
> >
> >         Declare which 8bit encoding to use [default: UTF-8]? foobar
> >         <foobar> does not appear to be a valid charset name. Use it anyway [y/N]?
> >
> > [1]- https://github.com/git/git/commit/852a15d748034eec87adbee73a72689c8936fb8b
> >
> > Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
> > ---
> > Changes in v4:
> >  - removed the confirm_ask() helper and changes to ask().
> >  - make a new warning/confirmation prompt specific to the 8bit encoding flow.
>
> Looking quite straight-forward.  Will replace.
>
> Shall we declare victory and mark the topic for 'next'?
>
> Thanks.

Yup, it is good to go from my side, presumably from Ben too.

Best,
Shreyansh

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2026-03-03 19:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-20 14:50 [RFC] send-email: UTF-8 encoding in subject line Shreyansh Paliwal
2026-02-21  2:28 ` Ben Knoble
2026-02-21 13:38   ` Shreyansh Paliwal
2026-02-21 17:30     ` Junio C Hamano
2026-02-22 14:03       ` Shreyansh Paliwal
2026-02-22 14:53         ` Philip Oakley
2026-02-22 15:00         ` D. Ben Knoble
2026-02-22 15:52           ` Shreyansh Paliwal
2026-02-23 21:38             ` Ben Knoble
2026-02-24  7:55               ` [GSOC] Discuss: Refactoring in order to reduce global state Shreyansh Paliwal
2026-02-22 14:53       ` [RFC] send-email: UTF-8 encoding in subject line D. Ben Knoble
2026-02-24 14:33 ` [PATCH] send-email: validate charset name in 8bit encoding prompt Shreyansh Paliwal
2026-02-24 21:11   ` Junio C Hamano
2026-02-24 21:37   ` [PATCH v2] " Shreyansh Paliwal
2026-02-24 22:06     ` Junio C Hamano
2026-02-24 22:20       ` Shreyansh Paliwal
2026-02-25 16:37     ` D. Ben Knoble
2026-02-26 17:32       ` Shreyansh Paliwal
2026-02-26 16:16   ` [PATCH v3] " Shreyansh Paliwal
2026-02-26 18:45     ` Junio C Hamano
2026-02-26 19:06       ` Junio C Hamano
2026-02-28  8:41         ` Shreyansh Paliwal
2026-02-28  8:36       ` Shreyansh Paliwal
2026-02-28 11:20   ` [PATCH v4] " Shreyansh Paliwal
2026-02-28 21:16     ` D. Ben Knoble
2026-03-02 16:10     ` Junio C Hamano
2026-03-03 19:06       ` Shreyansh Paliwal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox