git.vger.kernel.org archive mirror
From: Erik Faye-Lund <kusmabite@gmail.com>
To: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Cc: git@vger.kernel.org
Subject: Re: [PATH/RFC] parse-options: report invalid UTF-8 switches
Date: Mon, 11 Feb 2013 14:57:17 +0100
Message-ID: <CABPQNSbYCrdSP5rWbfLX==u--bJpQo6A6sNE46a1RuU-fMDiWg@mail.gmail.com>
In-Reply-To: <vpqobfr9da7.fsf@grenoble-inp.fr>

On Mon, Feb 11, 2013 at 2:43 PM, Matthieu Moy
<Matthieu.Moy@grenoble-inp.fr> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> --- a/parse-options.c
>> +++ b/parse-options.c
>> @@ -3,6 +3,7 @@
>>  #include "cache.h"
>>  #include "commit.h"
>>  #include "color.h"
>> +#include "utf8.h"
>>
>>  static int parse_options_usage(struct parse_opt_ctx_t *ctx,
>>                              const char * const *usagestr,
>> @@ -462,7 +463,9 @@ int parse_options(int argc, const char **argv, const char *prefix,
>>               if (ctx.argv[0][1] == '-') {
>>                       error("unknown option `%s'", ctx.argv[0] + 2);
>>               } else {
>> -                     error("unknown switch `%c'", *ctx.opt);
>> +                     const char *next = ctx.opt;
>> +                     utf8_width(&next, NULL);
>> +                     error("unknown switch `%.*s'", (int)(next - ctx.opt), ctx.opt);
>>               }
>>               usage_with_options(usagestr, options);
>>       }
>
> You should be careful with the case where the user has a non-UTF8
> environment, and entered a non-ascii sequence. I can see two cases:
>
> 1) The non-ascii sequence is valid UTF-8, then I guess your patch would
>    show two characters instead of one. Not really correct, but not really
>    serious either.

Hm. So we would end up trading one form of corruption for another.
Not the biggest problem in the world, but perhaps there's a way of
fixing it?

I'm not entirely sure how to reliably determine what encoding stdin is
supposed to be. On Windows, that's easy: it's UTF-16, which we
re-encode to UTF-8 on startup in Git for Windows. But on other
platforms, I have no clue.
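
For reference, on POSIX systems the closest thing to an answer is
probably the locale's codeset. The sketch below is only an
illustration of that idea, not something from the patch:
locale_codeset() is a hypothetical helper name, while setlocale() and
nl_langinfo(CODESET) are the standard POSIX calls. Note that this only
reports what the environment claims to use; the bytes the user
actually typed can still be in any encoding.

```c
#include <locale.h>
#include <langinfo.h>

/*
 * Hypothetical helper (not part of git): return the name of the
 * character encoding announced by the user's locale, e.g. "UTF-8".
 * The exact spelling of the name is platform-dependent.
 */
static const char *locale_codeset(void)
{
	/* Pick up LC_ALL / LC_CTYPE / LANG from the environment. */
	setlocale(LC_CTYPE, "");
	return nl_langinfo(CODESET);
}
```

A caller could compare the result against "UTF-8" before deciding
whether to treat a multi-byte switch as a single character.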

But isn't UTF-8 designed to be very unlikely to clash with existing
encodings? If so, I could add a case for input that is neither ASCII
nor valid UTF-8, and simply print the offending byte as a hex escape?

> 2) The non-ascii sequence is NOT valid UTF-8, then if I read correctly
>    (I didn't test) utf8_width would set next to NULL, and then you are
>    in big trouble.

Ouch. Yeah, you are right; this is not good at all :)

But I guess the solution above should fix this as well, no?
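
A rough sketch of that fallback, to make the idea concrete. This is an
assumption about how it could look, not the final patch: utf8_seq_len()
and format_switch() are hypothetical names, and the validator is a
standalone stand-in rather than git's utf8_width(). The point is that
the length is checked before it is used, so there is never any pointer
arithmetic on a NULL "next".

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the UTF-8 check: return the byte length of
 * the well-formed UTF-8 sequence starting at s, or 0 if s does not
 * start with one. ASCII bytes count as a one-byte sequence.
 */
static size_t utf8_seq_len(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	unsigned long cp;
	size_t len, i;

	if (p[0] < 0x80)
		return 1;
	if ((p[0] & 0xe0) == 0xc0) {
		len = 2;
		cp = p[0] & 0x1f;
	} else if ((p[0] & 0xf0) == 0xe0) {
		len = 3;
		cp = p[0] & 0x0f;
	} else if ((p[0] & 0xf8) == 0xf0) {
		len = 4;
		cp = p[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}

	for (i = 1; i < len; i++) {
		/* A NUL terminator fails this test, so we never overrun. */
		if ((p[i] & 0xc0) != 0x80)
			return 0;
		cp = (cp << 6) | (p[i] & 0x3f);
	}

	/* Reject overlong forms, surrogates and out-of-range values. */
	if ((len == 2 && cp < 0x80) ||
	    (len == 3 && cp < 0x800) ||
	    (len == 4 && cp < 0x10000))
		return 0;
	if ((cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
		return 0;
	return len;
}

/*
 * Format the "unknown switch" message into buf: print the whole
 * character when opt starts with ASCII or valid UTF-8, otherwise
 * print only the first byte as a hex escape.
 */
static int format_switch(char *buf, size_t size, const char *opt)
{
	size_t len = utf8_seq_len(opt);

	if (len)
		return snprintf(buf, size, "unknown switch `%.*s'",
				(int)len, opt);
	return snprintf(buf, size, "unknown switch `\\x%02x'",
			(unsigned char)*opt);
}
```

With this, "-x" and a valid multi-byte switch are reported as typed,
while a lone 0xe9 byte (Latin-1 "é") would be reported as `\xe9'. It
does not solve case 1: valid-looking UTF-8 in a non-UTF-8 environment
is still printed as-is.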


