From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Masahiro Yamada <masahiroy@kernel.org>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 4/5] wildmatch: use char instead of uchar
Date: Fri, 10 Feb 2023 14:09:34 +0100
Message-ID: <230210.86a61lwtq7.gmgdl@evledraar.gmail.com>
In-Reply-To: <20230210075939.44949-5-masahiroy@kernel.org>


On Fri, Feb 10 2023, Masahiro Yamada wrote:

> dowild() casts (char *) and (uchar *) back-and-forth, which is
> ugly.
>
> This file was imported from rsync, which started to use (unsigned char)
> since the following commit:
>
>  | commit e11c42511903adc6d27cf1671cc76fa711ea37e5
>  | Author: Wayne Davison <wayned@samba.org>
>  | Date:   Sun Jul 6 04:33:54 2003 +0000
>  |
>  |     - Added [:class:] handling to the character-class code.
>  |     - Use explicit unsigned characters for proper set checks.
>  |     - Made the character-class code honor backslash escapes.
>  |     - Accept '^' as a class-negation character in addition to '!'.
>
> Perhaps, it was needed because rsync relies on is*() from <ctype.h>.
>
> Git has its own implementations, so the behavior is clear.
>
> In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
> says one of the motivations is "we want the right signed behaviour".
>
> sane_istest() casts the given character to (unsigned char) anyway
> before sane_ctype[] table lookup, so dowild() can use 'char'.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  wildmatch.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index 93800b8eac..7dffd783cb 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -12,21 +12,19 @@
>  #include "cache.h"
>  #include "wildmatch.h"
>  
> -typedef unsigned char uchar;
> -
>  #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
>  				    && *(class) == *(litmatch) \
> -				    && strncmp((char*)class, litmatch, len) == 0)
> +				    && strncmp(class, litmatch, len) == 0)
>  
>  /* Match pattern "p" against "text" */
> -static int dowild(const uchar *p, const uchar *text, unsigned int flags)
> +static int dowild(const char *p, const char *text, unsigned int flags)
>  {
> -	uchar p_ch;
> -	const uchar *pattern = p;
> +	char p_ch;
> +	const char *pattern = p;
>  
>  	for ( ; (p_ch = *p) != '\0'; text++, p++) {
>  		int matched, match_slash, negated;
> -		uchar t_ch, prev_ch;
> +		char t_ch, prev_ch;
>  		if ((t_ch = *text) == '\0' && p_ch != '*')
>  			return WM_ABORT_ALL;
>  		if ((flags & WM_CASEFOLD) && isupper(t_ch))
> @@ -50,7 +48,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  			continue;
>  		case '*':
>  			if (*++p == '*') {
> -				const uchar *prev_p = p - 2;
> +				const char *prev_p = p - 2;
>  				while (*++p == '*') {}
>  				if (!(flags & WM_PATHNAME))
>  					/* without WM_PATHNAME, '*' == '**' */
> @@ -90,10 +88,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  				 * with WM_PATHNAME matches the next
>  				 * directory
>  				 */
> -				const char *slash = strchr((char*)text, '/');
> +				const char *slash = strchr(text, '/');
>  				if (!slash)
>  					return WM_NOMATCH;
> -				text = (const uchar*)slash;
> +				text = slash;
>  				/* the slash is consumed by the top-level for loop */
>  				break;
>  			}
> @@ -160,13 +158,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					if (t_ch <= p_ch && t_ch >= prev_ch)
>  						matched = 1;
>  					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
> -						uchar t_ch_upper = toupper(t_ch);
> +						char t_ch_upper = toupper(t_ch);
>  						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>  							matched = 1;
>  					}
>  					p_ch = 0; /* This makes "prev_ch" get set to 0. */
>  				} else if (p_ch == '[' && p[1] == ':') {
> -					const uchar *s;
> +					const char *s;
>  					int i;
>  					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
>  					if (!p_ch)
> @@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  /* Match the "pattern" against the "text" string. */
>  int wildmatch(const char *pattern, const char *text, unsigned int flags)
>  {
> -	return dowild((const uchar*)pattern, (const uchar*)text, flags);
> +	return dowild(pattern, text, flags);
>  }

This looks good to me. I independently wrote much the same a while ago
for another reason, in: https://github.com/avar/git/commit/079f555375a

I.e. this happens to be the only bit in-tree that's stopping us from
running the xlc compiler in the c99 mode.

My solution was different, but I like yours better. I had not done your
analysis to discover that we didn't need this to be unsigned in the
first place; I merely converted the "uchar" to an "unsigned char".
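
The commit message's point that a table-based is*() test does not care
about the signedness of its argument can be illustrated with a small
standalone sketch. The names demo_ctype, DEMO_DIGIT and demo_isdigit()
below are hypothetical stand-ins, not Git's actual sane_ctype[] or
sane_istest(); the sketch only assumes, as the commit message states,
that the value is cast to (unsigned char) before the table lookup.

```c
/* Hypothetical flag bit; Git's real table uses its own set of masks. */
#define DEMO_DIGIT 0x01

/*
 * Hypothetical 256-entry classification table. Entries not listed
 * here are zero-initialized, so only '0'..'9' carry DEMO_DIGIT.
 */
static const unsigned char demo_ctype[256] = {
	['0'] = DEMO_DIGIT, ['1'] = DEMO_DIGIT, ['2'] = DEMO_DIGIT,
	['3'] = DEMO_DIGIT, ['4'] = DEMO_DIGIT, ['5'] = DEMO_DIGIT,
	['6'] = DEMO_DIGIT, ['7'] = DEMO_DIGIT, ['8'] = DEMO_DIGIT,
	['9'] = DEMO_DIGIT,
};

/*
 * Takes a plain 'char', which may be negative where char is signed.
 * The cast maps it into 0..255 before indexing, so the lookup never
 * reads outside the table regardless of the platform's char type.
 */
static int demo_isdigit(char c)
{
	return (demo_ctype[(unsigned char)c] & DEMO_DIGIT) != 0;
}
```

With the cast inside the helper, a caller holding a plain "char" byte
such as (char)0xe9 gets the same answer as one holding an
"unsigned char" with the same bit pattern, so the caller does not need
to convert its pointers or variables first.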

