public inbox for git@vger.kernel.org
From: Ben Knoble <ben.knoble@gmail.com>
To: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [RFC] send-email: UTF-8 encoding in subject line
Date: Mon, 23 Feb 2026 16:38:31 -0500
Message-ID: <43DCEEB9-33C4-4EE2-9FF3-49DCB9B837E0@gmail.com>
In-Reply-To: <20260222155559.1777883-1-shreyanshpaliwalcmsmn@gmail.com>


> On Feb 22, 2026, at 10:56, Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> wrote:
> 
> 
>> 
>>> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
>>> <shreyanshpaliwalcmsmn@gmail.com> wrote:
>>> 
>>>>> That makes sense, I tried it below.
>>>>> I also wondered whether, in addition to this, it might be helpful to warn on
>>>>> an invalid charset, and/or possibly fall back to UTF-8.
>>>> 
>>>> Agreed on the first half of the statement, if we have an easy and
>>>> portable way to tell if a given random string names a valid charset.
>>>> I do not recommend to "fall back" to anything, if we are asking an
>>>> input from the user.
>>> 
>>> Following up on this, I tried adding a warning when the provided charset
>>> does not appear to be valid. Current flow is,
>>> 
>>>  Which 8bit encoding should I declare [UTF-8]? y
>>>  Are you sure you want to use <y> [y/N]? y
>>> 
>>> With the additional check, it becomes,
>>> 
>>>  Which 8bit encoding should I declare [default: UTF-8]? y
>>>  warning: 'y' does not appear to be a valid charset name.
>>>  Are you sure you want to use <y> [y/N]?
>>> 
>>> This uses find_encoding() from Perl’s Encode module to detect any
>>> unrecognized charset names.
>>> 
>>> Let me know what you think.
>>> Also, is there any new test that should be added for this change?
>>> 
>>> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
>>> ---
>>> git-send-email.perl | 23 ++++++++++++++++++++---
>>> 1 file changed, 20 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/git-send-email.perl b/git-send-email.perl
>>> index cd4b316ddc..e62fa259ba 100755
>>> --- a/git-send-email.perl
>>> +++ b/git-send-email.perl
>>> @@ -23,6 +23,7 @@
>>> use Git::LoadCPAN::Error qw(:try);
>>> use Git;
>>> use Git::I18N;
>>> +use Encode qw(find_encoding);
>>> 
>>> Getopt::Long::Configure qw/ pass_through /;
>>> 
>>> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
>>>        foreach my $f (sort keys %broken_encoding) {
>>>                print "    $f\n";
>>>        }
>>> -       $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
>>> -                                 valid_re => qr/.{4}/, confirm_only => 1,
>>> -                                 default => "UTF-8");
>>> +       while (1) {
>>> +               my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
>>> +                       valid_re => qr/^\S+$/,
>>> +                       default  => "UTF-8");
>> 
>> Here we change things, right?
>> 
>> - The original validation is "at least 4 characters", the new
>> validation is "at least one non-blank." I'm not sure why we'd prefer
>> one or the other, frankly. The original goes to 852a15d748
>> (send-email: ask confirmation if given encoding name is very short,
>> 2015-02-13), which is motivated by the same problem we're discussing
>> here!
> 
> I see.
> My understanding of the earlier change (852a15d748) is that the
> length check was intended as a heuristic check to catch obviously invalid
> inputs like "y" and trigger an extra confirmation based on the fact that
> charset names would be at least 4 letters.
> 
> With the additional find_encoding() check, the validation becomes semantic
> rather than length-based: recognized charset names are accepted directly,
> while unrecognized ones trigger a warning and still require explicit
> confirmation. The relaxed regex (at least one non-blank) is only meant to
> ensure we receive some non-empty input before passing it to find_encoding().
> 
>> - We get rid of confirm_only, since we're about to roll our own
>> confirmation below:
>> 
>>> +               next unless defined $encoding;
>>> +               if (find_encoding($encoding)) {
>>> +                       $auto_8bit_encoding = $encoding;
>>> +                       last;
>>> +               }
>>> +               printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
>>> +               my $yesno = ask(
>>> +                       sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
>>> +                       valid_re => qr/^(?:y|n)/i,
>>> +                       default  => 'n');
>> 
>> …which might want refactored a bit so it can stay close to the original? idk.
>> 
> 
> Actually the flow needed to change slightly to insert the validity warning
> before the final confirmation step. Since ask() handles confirmation internally
> using confirm_only and is used in multiple places, it seemed simpler to keep the
> additional confirmation local here rather than modifying ask() itself.
> 
> Let me know what you think.
> 
> Best,
> Shreyansh

Ah, my mistake for being ambiguous. I meant:

The code is similar enough to the original that perhaps a helper can be introduced, or at least we should keep the equivalent strings together to help those who change one. 
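
Something along these lines, as a rough sketch only and not a proposed patch. ask_charset() is a name I made up, and $ask is a stand-in for send-email's ask() so that the snippet runs on its own. The __() translation wrappers are left out for the same reason, and returning undef when no answer is available is my assumption rather than what the posted loop does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: keeps the prompt, the warning, and the confirmation
# strings next to each other so that whoever changes one sees the others.
# $ask is a code ref standing in for send-email's ask(); it takes a prompt
# string and returns the reply, or undef when no answer is available.
sub ask_charset {
	my ($ask, $default) = @_;
	while (1) {
		my $encoding = $ask->("Which 8bit encoding should I declare [default: $default]? ");
		return undef unless defined $encoding;
		$encoding = $default if $encoding eq '';

		# find_encoding() returns an encoding object for names that
		# Encode recognizes, and a false value otherwise.
		return $encoding if find_encoding($encoding);

		warn "warning: '$encoding' does not appear to be a valid charset name.\n";
		my $yesno = $ask->("Are you sure you want to use <$encoding> [y/N]? ");
		return $encoding if defined $yesno && $yesno =~ /^y/i;
	}
}

# Demo with canned answers instead of a terminal: "y" is questioned, the
# confirmation is declined, and the second attempt "UTF-8" is accepted.
my @canned = ('y', 'n', 'UTF-8');
print ask_charset(sub { shift @canned }, 'UTF-8'), "\n";
```

The call site would then shrink to a single assignment to $auto_8bit_encoding, and the three user-visible strings stay in one function.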

