All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] utf8.c: fix strbuf_utf8_replace copying the last NUL to dst string
Date: Wed, 30 Jul 2014 11:23:24 -0700	[thread overview]
Message-ID: <xmqqha1yzowz.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20140730102022.GA5653@lanh> (Duy Nguyen's message of "Wed, 30 Jul 2014 17:20:22 +0700")

Duy Nguyen <pclouds@gmail.com> writes:

>> > it returns 0 and steps 'src' by one. 
>> 
>> Here "it" refers to utf8_width()?  Who steps 'src' by one?
>
> utf8_width() steps 'src'.
>
>> 
>> Ahh, did you mean *src == NUL, i.e. "already at the end of the
>> string"?
>
> Yes.. I guess you have a better commit message prepared for me now :)

Heh.  At least you realized that the log message was uninformative ;-)

>> I think utf8_width() called with an empty string should not move the
>> pointer past that end-of-string NUL in the first place.  It makes me
>> wonder if it would be a better fix to make it not to do that (and
>> return 0), but if we declare it is the caller's fault, perhaps we
>> may want to add
>> 
>> 	if (!**start)
>>         	die("BUG: empty string to utf8_width()???");
>> 
>> at the very beginning of utf8_width(), even before it calls
>> pick-one-utf8-char.
>> 
>> Still puzzled...
>
> I don't know. I mean, in a UTF-8 byte sequence, NUL is a valid
> character (part of the ASCII subset), so advancing '*start' by one
> makes sense.

Dubious.  NUL may be valid but it is valid only in the sense that it
will mark the end of the string, i.e. the caller is expected to do:

        const char *string = ...;

        while (not reached the end of the string) {
                this_char = pick_one_char(&string);
                do something to this_char;
        }

and there are two ways to structure such a loop:

 (1) Make pick-one-char signal the termination condition.  i.e.

	while ((this_char = pick_one(&string)) != EOF)
		do something to this_char;

     That's a typical getchar()-style loop.  It would better not
     advance string when it returns EOF.

 (2) Make it the responsibility of the caller not to call beyond the
     end. i.e.

	while (string < end) {
        	this_char = pick_one_char(&string);
		do something to this char;
	}

and your patch takes the latter, which I think is in line with the
way how existing callchain works.  So the addition of "BUG" merely
is to make sure that everybody follows that same convention.

We cannot declare that it is caller's responsibility to feed
sensible input to the function only at one callsite and allow other
callsites feed garbage to the same function, after all, no?

  reply	other threads:[~2014-07-30 18:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-29 13:10 [PATCH] utf8.c: fix strbuf_utf8_replace copying the last NUL to dst string Nguyễn Thái Ngọc Duy
2014-07-29 19:56 ` Junio C Hamano
2014-07-30 10:20   ` Duy Nguyen
2014-07-30 18:23     ` Junio C Hamano [this message]
2014-08-10  7:05 ` [PATCH v2] utf8.c: fix strbuf_utf8_replace() consuming data beyond input string Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqha1yzowz.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.