git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnav Bhate <bhatearnav@gmail.com>
To: Karthik Nayak <karthik.188@gmail.com>, git@vger.kernel.org
Subject: Re: [GSoC PATCH] pathspec: fix sign comparison warnings
Date: Thu, 13 Mar 2025 19:50:52 +0530	[thread overview]
Message-ID: <0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com> (raw)
In-Reply-To: <CAOLa=ZRNJD7NqceGQ4B8ox+50m2q1nU1t29x7O0roc=-_4ww0w@mail.gmail.com>

Karthik Nayak <karthik.188@gmail.com> writes:
> Arnav Bhate <bhatearnav@gmail.com> writes:
> 
>> There are multiple places, especially in loops, where a signed and an
>> unsigned data type are compared. Git uses a mix of signed and unsigned
>> types to store lengths of arrays. This sometimes leads to using a signed
>> index for an array whose length is stored in an unsigned variable or
>> vice versa.
>>
>> Replace signed data types with unsigned data types and vice versa
>> wherever necessary. In some cases, introduce a new variable, where both
>> signed and unsigned data types have been used to store lengths of arrays
>> in the same function, where previously only one variable was used to
>> iterate over both types. In cases where this is not possible, add
>> appropriate cast. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
>>
>> Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
>> ---
>>  pathspec.c | 26 +++++++++++++-------------
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/pathspec.c b/pathspec.c
>> index 89663645e1..fd7dfdfd84 100644
>> --- a/pathspec.c
>> +++ b/pathspec.c
>> @@ -1,5 +1,4 @@
>>  #define USE_THE_REPOSITORY_VARIABLE
>> -#define DISABLE_SIGN_COMPARE_WARNINGS
>>
>>  #include "git-compat-util.h"
>>  #include "abspath.h"
>> @@ -36,6 +35,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>  					enum ps_skip_worktree_action sw_action)
>>  {
>>  	int num_unmatched = 0, i;
>> +	unsigned int j;
>>
>>  	/*
>>  	 * Since we are walking the index as if we were walking the directory,
>> @@ -48,8 +48,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>  			num_unmatched++;
>>  	if (!num_unmatched)
>>  		return;
>> -	for (i = 0; i < istate->cache_nr; i++) {
>> -		const struct cache_entry *ce = istate->cache[i];
>> +	for (j = 0; j < istate->cache_nr; j++) {
>> +		const struct cache_entry *ce = istate->cache[j];
>>  		if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>>  		    (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
>>  			continue;
> 
> While this is correct, now we have 'i' & 'j' as iteration variables,
> generally this is used in O(n^2) loops to define the outer and inner
> loops. Here, however, we use it to simply define two different types. I
> find this deviation from convention a little confusing.
> 
> Perhaps, we could simply utilize the option of intializing loop
> variables in the loop itself?
> 
>   diff --git a/pathspec.c b/pathspec.c
>   index 89663645e1..ff8854afb8 100644
>   --- a/pathspec.c
>   +++ b/pathspec.c
>   @@ -35,7 +35,7 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
>    					char *seen,
>    					enum ps_skip_worktree_action sw_action)
>    {
>   -	int num_unmatched = 0, i;
>   +	int num_unmatched = 0;
> 
>    	/*
>    	 * Since we are walking the index as if we were walking the directory,
>   @@ -43,12 +43,12 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
>    	 * mistakenly think that the user gave a pathspec that did not match
>    	 * anything.
>    	 */
>   -	for (i = 0; i < pathspec->nr; i++)
>   +	for (int i = 0; i < pathspec->nr; i++)
>    		if (!seen[i])
>    			num_unmatched++;
>    	if (!num_unmatched)
>    		return;
>   -	for (i = 0; i < istate->cache_nr; i++) {
>   +	for (unsigned int i = 0; i < istate->cache_nr; i++) {
>    		const struct cache_entry *ce = istate->cache[i];
>    		if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>    		    (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
> 
> This would read much cleaner and also avoid two different loop
> variables. WDYT?

We could certainly do that. My impression was that the convention was
not to do so.

> Also a bigger question is, shouldn't the type of `pathspec.nr` and
> 'istate.cache_nr' be the actual change required? Shouldn't they be set
> to 'size_t'?

I tried that first and found that it required making a large number of
changes spread over many files. As noted in my commit message, both
signed and unsigned types are used at different places for this
purpose.

>> @@ -78,7 +78,7 @@ char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
>>  {
>>  	struct index_state *istate = the_repository->index;
>>  	char *seen = xcalloc(pathspec->nr, 1);
>> -	int i;
>> +	unsigned int i;
>>
> 
> Nit: We could also drop this and move the initialization to the line
> below.
> 
>>  	for (i = 0; i < istate->cache_nr; i++) {
>>  		struct cache_entry *ce = istate->cache[i];
>> @@ -130,7 +130,7 @@ static void prefix_magic(struct strbuf *sb, int prefixlen,
>>  	if (element[1] != '(') {
>>  		/* Process an element in shorthand form (e.g. ":!/<match>") */
>>  		strbuf_addstr(sb, ":(");
>> -		for (int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>> +		for (unsigned int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>>  			if ((magic & pathspec_magic[i].bit) &&
>>  			    pathspec_magic[i].mnemonic) {
>>  				if (sb->buf[sb->len - 1] != '(')
> 
> Shouldn't we use 'size_t' for this, since we're iterating over the
> elements of an array?

We can use size_t there.

> 
>> @@ -341,7 +341,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>>
>>  	for (pos = elem + 2; *pos && *pos != ')'; pos = nextat) {
>>  		size_t len = strcspn_escaped(pos, ",)");
>> -		int i;
>> +		unsigned int i;
>>
> 
> This too should be 'size_t'.
> 
>>  		if (pos[len] == ',')
>>  			nextat = pos + len + 1; /* handle ',' */
>> @@ -354,7 +354,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>>  		if (starts_with(pos, "prefix:")) {
>>  			char *endptr;
>>  			*prefix_len = strtol(pos + 7, &endptr, 10);
>> -			if (endptr - pos != len)
>> +			if ((size_t)(endptr - pos) != len)
>>  				die(_("invalid parameter for pathspec magic 'prefix'"));
>>  			continue;
>>  		}
> 
> This makes sense. But is it guaranteed that `endptr - pos` is greater
> than 0?

endptr - pos will be greater than or equal to zero, as endptr is set by
strtol

>> @@ -400,7 +400,7 @@ static const char *parse_short_magic(unsigned *magic, const char *elem)
>>
>>  	for (pos = elem + 1; *pos && *pos != ':'; pos++) {
>>  		char ch = *pos;
>> -		int i;
>> +		unsigned int i;
>>
> 
> This too, should be 'size_t'
> 
>>  		/* Special case alias for '!' */
>>  		if (ch == '^') {
>> @@ -564,7 +564,7 @@ static int pathspec_item_cmp(const void *a_, const void *b_)
>>
>>  void pathspec_magic_names(unsigned magic, struct strbuf *out)
>>  {
>> -	int i;
>> +	unsigned int i;
> 
> This can be inlined and made 'size_t'.
> 
>>  	for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>>  		const struct pathspec_magic *m = pathspec_magic + i;
>>  		if (!(magic & m->bit))
>> @@ -803,8 +803,8 @@ int match_pathspec_attrs(struct index_state *istate,
>>  int pathspec_needs_expanded_index(struct index_state *istate,
>>  				  const struct pathspec *pathspec)
>>  {
>> -	unsigned int i, pos;
>> -	int res = 0;
>> +	unsigned int pos;
>> +	int i, res = 0;
>>  	char *skip_worktree_seen = NULL;
>>
> 
> This can be inlined, but this change is done to match 'pathspec.nr''s
> type. This goes to my earlier question, I would say we first need to
> modify 'pathspec.nr' itself to be 'size_t'.
> 
>>  	/*
>> @@ -845,7 +845,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>>  			 * - not-in-cone/bar*: may need expanded index
>>  			 * - **.c: may need expanded index
>>  			 */
>> -			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
>> +			if (strspn(item.original + item.nowildcard_len, "*") == (unsigned int)(item.len - item.nowildcard_len) &&
>>  			    path_in_cone_mode_sparse_checkout(item.original, istate))
>>  				continue;
>>
> 
> Similar here, I see the types of 'item.len' and 'item.nowwildcard_len'
> are 'int'. Do they need to be 'size_t'?

Same as above, will require a large number of changes.
 
>> @@ -860,7 +860,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>>  				 * directory name and the sparse directory is the first
>>  				 * component of the pathspec, need to expand the index.
>>  				 */
>> -				if (item.nowildcard_len > ce_namelen(ce) &&
>> +				if ((unsigned int)item.nowildcard_len > ce_namelen(ce) &&
>>  				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
>>  					res = 1;
>>  					break;
>> --
>> 2.48.1
> 
> Same question as above!

-- 
Regards,
Arnav Bhate
(He/Him)


  reply	other threads:[~2025-03-13 14:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-12 20:20 [GSoC PATCH] pathspec: fix sign comparison warnings Arnav Bhate
2025-03-13 12:52 ` Karthik Nayak
2025-03-13 14:20   ` Arnav Bhate [this message]
2025-03-13 14:44     ` Karthik Nayak
2025-03-13 15:33   ` Junio C Hamano
2025-03-13 15:50     ` Karthik Nayak
2025-03-16 10:14     ` Arnav Bhate
2025-03-29  6:17 ` [GSoC PATCH v2] " Arnav Bhate
2025-03-29  6:18   ` Arnav Bhate
2025-03-29 18:36   ` Junio C Hamano
2025-03-30 17:45     ` [GSoC PATCH v3] " Arnav Bhate

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com \
    --to=bhatearnav@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).