From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Masahiro Yamada <masahiroy@kernel.org>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 4/5] wildmatch: use char instead of uchar
Date: Fri, 10 Feb 2023 14:09:34 +0100
Message-ID: <230210.86a61lwtq7.gmgdl@evledraar.gmail.com>
In-Reply-To: <20230210075939.44949-5-masahiroy@kernel.org>


On Fri, Feb 10 2023, Masahiro Yamada wrote:

> dowild() casts (char *) and (uchar *) back-and-forth, which is
> ugly.
>
> This file was imported from rsync, which started to use (unsigned char)
> since the following commit:
>
>  | commit e11c42511903adc6d27cf1671cc76fa711ea37e5
>  | Author: Wayne Davison <wayned@samba.org>
>  | Date:   Sun Jul 6 04:33:54 2003 +0000
>  |
>  |     - Added [:class:] handling to the character-class code.
>  |     - Use explicit unsigned characters for proper set checks.
>  |     - Made the character-class code honor backslash escapes.
>  |     - Accept '^' as a class-negation character in addition to '!'.
>
> Perhaps, it was needed because rsync relies on is*() from <ctype.h>.
>
> Git has its own implementations, so the behavior is clear.
>
> In fact, commit 4546738b58a0 ("Unlocalized isspace and friends")
> says one of the motivations is "we want the right signed behaviour".
>
> sane_istest() casts the given character to (unsigned char) anyway
> before sane_ctype[] table lookup, so dowild() can use 'char'.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  wildmatch.c | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/wildmatch.c b/wildmatch.c
> index 93800b8eac..7dffd783cb 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -12,21 +12,19 @@
>  #include "cache.h"
>  #include "wildmatch.h"
>  
> -typedef unsigned char uchar;
> -
>  #define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
>  				    && *(class) == *(litmatch) \
> -				    && strncmp((char*)class, litmatch, len) == 0)
> +				    && strncmp(class, litmatch, len) == 0)
>  
>  /* Match pattern "p" against "text" */
> -static int dowild(const uchar *p, const uchar *text, unsigned int flags)
> +static int dowild(const char *p, const char *text, unsigned int flags)
>  {
> -	uchar p_ch;
> -	const uchar *pattern = p;
> +	char p_ch;
> +	const char *pattern = p;
>  
>  	for ( ; (p_ch = *p) != '\0'; text++, p++) {
>  		int matched, match_slash, negated;
> -		uchar t_ch, prev_ch;
> +		char t_ch, prev_ch;
>  		if ((t_ch = *text) == '\0' && p_ch != '*')
>  			return WM_ABORT_ALL;
>  		if ((flags & WM_CASEFOLD) && isupper(t_ch))
> @@ -50,7 +48,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  			continue;
>  		case '*':
>  			if (*++p == '*') {
> -				const uchar *prev_p = p - 2;
> +				const char *prev_p = p - 2;
>  				while (*++p == '*') {}
>  				if (!(flags & WM_PATHNAME))
>  					/* without WM_PATHNAME, '*' == '**' */
> @@ -90,10 +88,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  				 * with WM_PATHNAME matches the next
>  				 * directory
>  				 */
> -				const char *slash = strchr((char*)text, '/');
> +				const char *slash = strchr(text, '/');
>  				if (!slash)
>  					return WM_NOMATCH;
> -				text = (const uchar*)slash;
> +				text = slash;
>  				/* the slash is consumed by the top-level for loop */
>  				break;
>  			}
> @@ -160,13 +158,13 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  					if (t_ch <= p_ch && t_ch >= prev_ch)
>  						matched = 1;
>  					else if ((flags & WM_CASEFOLD) && islower(t_ch)) {
> -						uchar t_ch_upper = toupper(t_ch);
> +						char t_ch_upper = toupper(t_ch);
>  						if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>  							matched = 1;
>  					}
>  					p_ch = 0; /* This makes "prev_ch" get set to 0. */
>  				} else if (p_ch == '[' && p[1] == ':') {
> -					const uchar *s;
> +					const char *s;
>  					int i;
>  					for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
>  					if (!p_ch)
> @@ -237,5 +235,5 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>  /* Match the "pattern" against the "text" string. */
>  int wildmatch(const char *pattern, const char *text, unsigned int flags)
>  {
> -	return dowild((const uchar*)pattern, (const uchar*)text, flags);
> +	return dowild(pattern, text, flags);
>  }

This looks good to me. I independently wrote much the same a while ago
for another reason, in: https://github.com/avar/git/commit/079f555375a

I.e. this happens to be the only bit in-tree that's stopping us from
running the xlc compiler in the c99 mode.

My solution was different, but I like yours better. I had not done your
analysis to discover that we didn't need this to be unsigned in the
first place; I merely converted the "uchar" to an "unsigned char".
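
To illustrate why the cast inside the table lookup makes plain 'char'
safe, here is a minimal sketch. It assumes a hypothetical demo_ctype
table standing in for sane_ctype[]; the demo_* names and the
DEMO_UPPER mask are made up for illustration and are not git's actual
code:

```c
#include <stddef.h>

/* Hypothetical flag bit; git's real masks live in git-compat-util.h. */
#define DEMO_UPPER 0x01

/* Hypothetical stand-in for sane_ctype[]: marks only ASCII 'A'..'Z'. */
static const unsigned char *demo_table(void)
{
	static unsigned char table[256];
	static int ready;

	if (!ready) {
		int c;
		for (c = 'A'; c <= 'Z'; c++)
			table[c] = DEMO_UPPER;
		ready = 1;
	}
	return table;
}

/*
 * The argument is cast to unsigned char before indexing, so a byte
 * such as 0xC3 (negative where plain char is signed) still maps to
 * an index in 0..255 instead of reading before the table.
 */
static int demo_isupper(char ch)
{
	return (demo_table()[(unsigned char)ch] & DEMO_UPPER) != 0;
}
```

With this, demo_isupper('Q') is 1, and both demo_isupper('q') and
demo_isupper((char)0xC3) are 0, whether plain char is signed or
unsigned on the platform.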


