All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnav Bhate <bhatearnav@gmail.com>
To: Karthik Nayak <karthik.188@gmail.com>, git@vger.kernel.org
Subject: Re: [GSoC PATCH] pathspec: fix sign comparison warnings
Date: Thu, 13 Mar 2025 19:50:52 +0530	[thread overview]
Message-ID: <0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com> (raw)
In-Reply-To: <CAOLa=ZRNJD7NqceGQ4B8ox+50m2q1nU1t29x7O0roc=-_4ww0w@mail.gmail.com>

Karthik Nayak <karthik.188@gmail.com> writes:
> Arnav Bhate <bhatearnav@gmail.com> writes:
> 
>> There are multiple places, especially in loops, where a signed and an
>> unsigned data type are compared. Git uses a mix of signed and unsigned
>> types to store lengths of arrays. This sometimes leads to using a signed
>> index for an array whose length is stored in an unsigned variable or
>> vice versa.
>>
>> Replace signed data types with unsigned data types and vice versa
>> wherever necessary. In some cases, introduce a new variable, where both
>> signed and unsigned data types have been used to store lengths of arrays
>> in the same function, where previously only one variable was used to
>> iterate over both types. In cases where this is not possible, add
>> appropriate cast. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
>>
>> Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
>> ---
>>  pathspec.c | 26 +++++++++++++-------------
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/pathspec.c b/pathspec.c
>> index 89663645e1..fd7dfdfd84 100644
>> --- a/pathspec.c
>> +++ b/pathspec.c
>> @@ -1,5 +1,4 @@
>>  #define USE_THE_REPOSITORY_VARIABLE
>> -#define DISABLE_SIGN_COMPARE_WARNINGS
>>
>>  #include "git-compat-util.h"
>>  #include "abspath.h"
>> @@ -36,6 +35,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>  					enum ps_skip_worktree_action sw_action)
>>  {
>>  	int num_unmatched = 0, i;
>> +	unsigned int j;
>>
>>  	/*
>>  	 * Since we are walking the index as if we were walking the directory,
>> @@ -48,8 +48,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>>  			num_unmatched++;
>>  	if (!num_unmatched)
>>  		return;
>> -	for (i = 0; i < istate->cache_nr; i++) {
>> -		const struct cache_entry *ce = istate->cache[i];
>> +	for (j = 0; j < istate->cache_nr; j++) {
>> +		const struct cache_entry *ce = istate->cache[j];
>>  		if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>>  		    (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
>>  			continue;
> 
> While this is correct, now we have 'i' & 'j' as iteration variables,
> generally this is used in O(n^2) loops to define the outer and inner
> loops. Here, however, we use it to simply define two different types. I
> find this deviation from convention a little confusing.
> 
> Perhaps, we could simply utilize the option of intializing loop
> variables in the loop itself?
> 
>   diff --git a/pathspec.c b/pathspec.c
>   index 89663645e1..ff8854afb8 100644
>   --- a/pathspec.c
>   +++ b/pathspec.c
>   @@ -35,7 +35,7 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
>    					char *seen,
>    					enum ps_skip_worktree_action sw_action)
>    {
>   -	int num_unmatched = 0, i;
>   +	int num_unmatched = 0;
> 
>    	/*
>    	 * Since we are walking the index as if we were walking the directory,
>   @@ -43,12 +43,12 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
>    	 * mistakenly think that the user gave a pathspec that did not match
>    	 * anything.
>    	 */
>   -	for (i = 0; i < pathspec->nr; i++)
>   +	for (int i = 0; i < pathspec->nr; i++)
>    		if (!seen[i])
>    			num_unmatched++;
>    	if (!num_unmatched)
>    		return;
>   -	for (i = 0; i < istate->cache_nr; i++) {
>   +	for (unsigned int i = 0; i < istate->cache_nr; i++) {
>    		const struct cache_entry *ce = istate->cache[i];
>    		if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>    		    (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
> 
> This would read much cleaner and also avoid two different loop
> variables. WDYT?

We could certainly do that. My impression was that the convention was
not to do so.

> Also a bigger question is, shouldn't the type of `pathspec.nr` and
> 'istate.cache_nr' be the actual change required? Shouldn't they be set
> to 'size_t'?

I tried that first and found that it required making a large number of
changes spread over many files. As noted in my commit message, both
signed and unsigned types are used at different places for this
purpose.

>> @@ -78,7 +78,7 @@ char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
>>  {
>>  	struct index_state *istate = the_repository->index;
>>  	char *seen = xcalloc(pathspec->nr, 1);
>> -	int i;
>> +	unsigned int i;
>>
> 
> Nit: We could also drop this and move the initialization to the line
> below.
> 
>>  	for (i = 0; i < istate->cache_nr; i++) {
>>  		struct cache_entry *ce = istate->cache[i];
>> @@ -130,7 +130,7 @@ static void prefix_magic(struct strbuf *sb, int prefixlen,
>>  	if (element[1] != '(') {
>>  		/* Process an element in shorthand form (e.g. ":!/<match>") */
>>  		strbuf_addstr(sb, ":(");
>> -		for (int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>> +		for (unsigned int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>>  			if ((magic & pathspec_magic[i].bit) &&
>>  			    pathspec_magic[i].mnemonic) {
>>  				if (sb->buf[sb->len - 1] != '(')
> 
> Shouldn't we use 'size_t' for this, since we're iterating over the
> elements of an array?

We can use size_t there.

> 
>> @@ -341,7 +341,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>>
>>  	for (pos = elem + 2; *pos && *pos != ')'; pos = nextat) {
>>  		size_t len = strcspn_escaped(pos, ",)");
>> -		int i;
>> +		unsigned int i;
>>
> 
> This too should be 'size_t'.
> 
>>  		if (pos[len] == ',')
>>  			nextat = pos + len + 1; /* handle ',' */
>> @@ -354,7 +354,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>>  		if (starts_with(pos, "prefix:")) {
>>  			char *endptr;
>>  			*prefix_len = strtol(pos + 7, &endptr, 10);
>> -			if (endptr - pos != len)
>> +			if ((size_t)(endptr - pos) != len)
>>  				die(_("invalid parameter for pathspec magic 'prefix'"));
>>  			continue;
>>  		}
> 
> This makes sense. But is it guaranteed that `endptr - pos` is greater
> than 0?

endptr - pos will be greater than or equal to zero, as endptr is set by
strtol

>> @@ -400,7 +400,7 @@ static const char *parse_short_magic(unsigned *magic, const char *elem)
>>
>>  	for (pos = elem + 1; *pos && *pos != ':'; pos++) {
>>  		char ch = *pos;
>> -		int i;
>> +		unsigned int i;
>>
> 
> This too, should be 'size_t'
> 
>>  		/* Special case alias for '!' */
>>  		if (ch == '^') {
>> @@ -564,7 +564,7 @@ static int pathspec_item_cmp(const void *a_, const void *b_)
>>
>>  void pathspec_magic_names(unsigned magic, struct strbuf *out)
>>  {
>> -	int i;
>> +	unsigned int i;
> 
> This can be inlined and made 'size_t'.
> 
>>  	for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>>  		const struct pathspec_magic *m = pathspec_magic + i;
>>  		if (!(magic & m->bit))
>> @@ -803,8 +803,8 @@ int match_pathspec_attrs(struct index_state *istate,
>>  int pathspec_needs_expanded_index(struct index_state *istate,
>>  				  const struct pathspec *pathspec)
>>  {
>> -	unsigned int i, pos;
>> -	int res = 0;
>> +	unsigned int pos;
>> +	int i, res = 0;
>>  	char *skip_worktree_seen = NULL;
>>
> 
> This can be inlined, but this change is done to match 'pathspec.nr''s
> type. This goes to my earlier question, I would say we first need to
> modify 'pathspec.nr' itself to be 'size_t'.
> 
>>  	/*
>> @@ -845,7 +845,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>>  			 * - not-in-cone/bar*: may need expanded index
>>  			 * - **.c: may need expanded index
>>  			 */
>> -			if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
>> +			if (strspn(item.original + item.nowildcard_len, "*") == (unsigned int)(item.len - item.nowildcard_len) &&
>>  			    path_in_cone_mode_sparse_checkout(item.original, istate))
>>  				continue;
>>
> 
> Similar here, I see the types of 'item.len' and 'item.nowwildcard_len'
> are 'int'. Do they need to be 'size_t'?

Same as above, will require a large number of changes.
 
>> @@ -860,7 +860,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>>  				 * directory name and the sparse directory is the first
>>  				 * component of the pathspec, need to expand the index.
>>  				 */
>> -				if (item.nowildcard_len > ce_namelen(ce) &&
>> +				if ((unsigned int)item.nowildcard_len > ce_namelen(ce) &&
>>  				    !strncmp(item.original, ce->name, ce_namelen(ce))) {
>>  					res = 1;
>>  					break;
>> --
>> 2.48.1
> 
> Same question as above!

-- 
Regards,
Arnav Bhate
(He/Him)


  reply	other threads:[~2025-03-13 14:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-12 20:20 [GSoC PATCH] pathspec: fix sign comparison warnings Arnav Bhate
2025-03-13 12:52 ` Karthik Nayak
2025-03-13 14:20   ` Arnav Bhate [this message]
2025-03-13 14:44     ` Karthik Nayak
2025-03-13 15:33   ` Junio C Hamano
2025-03-13 15:50     ` Karthik Nayak
2025-03-16 10:14     ` Arnav Bhate
2025-03-29  6:17 ` [GSoC PATCH v2] " Arnav Bhate
2025-03-29  6:18   ` Arnav Bhate
2025-03-29 18:36   ` Junio C Hamano
2025-03-30 17:45     ` [GSoC PATCH v3] " Arnav Bhate

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com \
    --to=bhatearnav@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=karthik.188@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.