From: Arnav Bhate <bhatearnav@gmail.com>
To: Karthik Nayak <karthik.188@gmail.com>, git@vger.kernel.org
Subject: Re: [GSoC PATCH] pathspec: fix sign comparison warnings
Date: Thu, 13 Mar 2025 19:50:52 +0530 [thread overview]
Message-ID: <0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com> (raw)
In-Reply-To: <CAOLa=ZRNJD7NqceGQ4B8ox+50m2q1nU1t29x7O0roc=-_4ww0w@mail.gmail.com>
Karthik Nayak <karthik.188@gmail.com> writes:
> Arnav Bhate <bhatearnav@gmail.com> writes:
>
>> There are multiple places, especially in loops, where a signed and an
>> unsigned data type are compared. Git uses a mix of signed and unsigned
>> types to store lengths of arrays. This sometimes leads to using a signed
>> index for an array whose length is stored in an unsigned variable or
>> vice versa.
>>
>> Replace signed data types with unsigned data types and vice versa
>> wherever necessary. In some cases, introduce a new variable, where both
>> signed and unsigned data types have been used to store lengths of arrays
>> in the same function, where previously only one variable was used to
>> iterate over both types. In cases where this is not possible, add
>> appropriate cast. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
>>
>> Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
>> ---
>> pathspec.c | 26 +++++++++++++-------------
>> 1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/pathspec.c b/pathspec.c
>> index 89663645e1..fd7dfdfd84 100644
>> --- a/pathspec.c
>> +++ b/pathspec.c
>> @@ -1,5 +1,4 @@
>> #define USE_THE_REPOSITORY_VARIABLE
>> -#define DISABLE_SIGN_COMPARE_WARNINGS
>>
>> #include "git-compat-util.h"
>> #include "abspath.h"
>> @@ -36,6 +35,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>> enum ps_skip_worktree_action sw_action)
>> {
>> int num_unmatched = 0, i;
>> + unsigned int j;
>>
>> /*
>> * Since we are walking the index as if we were walking the directory,
>> @@ -48,8 +48,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>> num_unmatched++;
>> if (!num_unmatched)
>> return;
>> - for (i = 0; i < istate->cache_nr; i++) {
>> - const struct cache_entry *ce = istate->cache[i];
>> + for (j = 0; j < istate->cache_nr; j++) {
>> + const struct cache_entry *ce = istate->cache[j];
>> if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
>> (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
>> continue;
>
> While this is correct, now we have 'i' & 'j' as iteration variables,
> generally this is used in O(n^2) loops to define the outer and inner
> loops. Here, however, we use it to simply define two different types. I
> find this deviation from convention a little confusing.
>
> Perhaps, we could simply utilize the option of intializing loop
> variables in the loop itself?
>
> diff --git a/pathspec.c b/pathspec.c
> index 89663645e1..ff8854afb8 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -35,7 +35,7 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
> char *seen,
> enum ps_skip_worktree_action sw_action)
> {
> - int num_unmatched = 0, i;
> + int num_unmatched = 0;
>
> /*
> * Since we are walking the index as if we were walking the directory,
> @@ -43,12 +43,12 @@ void add_pathspec_matches_against_index(const
> struct pathspec *pathspec,
> * mistakenly think that the user gave a pathspec that did not match
> * anything.
> */
> - for (i = 0; i < pathspec->nr; i++)
> + for (int i = 0; i < pathspec->nr; i++)
> if (!seen[i])
> num_unmatched++;
> if (!num_unmatched)
> return;
> - for (i = 0; i < istate->cache_nr; i++) {
> + for (unsigned int i = 0; i < istate->cache_nr; i++) {
> const struct cache_entry *ce = istate->cache[i];
> if (sw_action == PS_IGNORE_SKIP_WORKTREE &&
> (ce_skip_worktree(ce) || !path_in_sparse_checkout(ce->name, istate)))
>
> This would read much cleaner and also avoid two different loop
> variables. WDYT?
We could certainly do that. My impression was that the convention was
not to do so.
> Also a bigger question is, shouldn't the type of `pathspec.nr` and
> 'istate.cache_nr' be the actual change required? Shouldn't they be set
> to 'size_t'?
I tried that first and found that it required making a large number of
changes spread over many files. As noted in my commit message, both
signed and unsigned types are used at different places for this
purpose.
>> @@ -78,7 +78,7 @@ char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
>> {
>> struct index_state *istate = the_repository->index;
>> char *seen = xcalloc(pathspec->nr, 1);
>> - int i;
>> + unsigned int i;
>>
>
> Nit: We could also drop this and move the initialization to the line
> below.
>
>> for (i = 0; i < istate->cache_nr; i++) {
>> struct cache_entry *ce = istate->cache[i];
>> @@ -130,7 +130,7 @@ static void prefix_magic(struct strbuf *sb, int prefixlen,
>> if (element[1] != '(') {
>> /* Process an element in shorthand form (e.g. ":!/<match>") */
>> strbuf_addstr(sb, ":(");
>> - for (int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>> + for (unsigned int i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>> if ((magic & pathspec_magic[i].bit) &&
>> pathspec_magic[i].mnemonic) {
>> if (sb->buf[sb->len - 1] != '(')
>
> Shouldn't we use 'size_t' for this, since we're iterating over the
> elements of an array?
We can use size_t there.
>
>> @@ -341,7 +341,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>>
>> for (pos = elem + 2; *pos && *pos != ')'; pos = nextat) {
>> size_t len = strcspn_escaped(pos, ",)");
>> - int i;
>> + unsigned int i;
>>
>
> This too should be 'size_t'.
>
>> if (pos[len] == ',')
>> nextat = pos + len + 1; /* handle ',' */
>> @@ -354,7 +354,7 @@ static const char *parse_long_magic(unsigned *magic, int *prefix_len,
>> if (starts_with(pos, "prefix:")) {
>> char *endptr;
>> *prefix_len = strtol(pos + 7, &endptr, 10);
>> - if (endptr - pos != len)
>> + if ((size_t)(endptr - pos) != len)
>> die(_("invalid parameter for pathspec magic 'prefix'"));
>> continue;
>> }
>
> This makes sense. But is it guaranteed that `endptr - pos` is greater
> than 0?
endptr - pos will be greater than or equal to zero, as endptr is set by
strtol
>> @@ -400,7 +400,7 @@ static const char *parse_short_magic(unsigned *magic, const char *elem)
>>
>> for (pos = elem + 1; *pos && *pos != ':'; pos++) {
>> char ch = *pos;
>> - int i;
>> + unsigned int i;
>>
>
> This too, should be 'size_t'
>
>> /* Special case alias for '!' */
>> if (ch == '^') {
>> @@ -564,7 +564,7 @@ static int pathspec_item_cmp(const void *a_, const void *b_)
>>
>> void pathspec_magic_names(unsigned magic, struct strbuf *out)
>> {
>> - int i;
>> + unsigned int i;
>
> This can be inlined and made 'size_t'.
>
>> for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
>> const struct pathspec_magic *m = pathspec_magic + i;
>> if (!(magic & m->bit))
>> @@ -803,8 +803,8 @@ int match_pathspec_attrs(struct index_state *istate,
>> int pathspec_needs_expanded_index(struct index_state *istate,
>> const struct pathspec *pathspec)
>> {
>> - unsigned int i, pos;
>> - int res = 0;
>> + unsigned int pos;
>> + int i, res = 0;
>> char *skip_worktree_seen = NULL;
>>
>
> This can be inlined, but this change is done to match 'pathspec.nr''s
> type. This goes to my earlier question, I would say we first need to
> modify 'pathspec.nr' itself to be 'size_t'.
>
>> /*
>> @@ -845,7 +845,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>> * - not-in-cone/bar*: may need expanded index
>> * - **.c: may need expanded index
>> */
>> - if (strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len &&
>> + if (strspn(item.original + item.nowildcard_len, "*") == (unsigned int)(item.len - item.nowildcard_len) &&
>> path_in_cone_mode_sparse_checkout(item.original, istate))
>> continue;
>>
>
> Similar here, I see the types of 'item.len' and 'item.nowwildcard_len'
> are 'int'. Do they need to be 'size_t'?
Same as above, will require a large number of changes.
>> @@ -860,7 +860,7 @@ int pathspec_needs_expanded_index(struct index_state *istate,
>> * directory name and the sparse directory is the first
>> * component of the pathspec, need to expand the index.
>> */
>> - if (item.nowildcard_len > ce_namelen(ce) &&
>> + if ((unsigned int)item.nowildcard_len > ce_namelen(ce) &&
>> !strncmp(item.original, ce->name, ce_namelen(ce))) {
>> res = 1;
>> break;
>> --
>> 2.48.1
>
> Same question as above!
--
Regards,
Arnav Bhate
(He/Him)
next prev parent reply other threads:[~2025-03-13 14:20 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-12 20:20 [GSoC PATCH] pathspec: fix sign comparison warnings Arnav Bhate
2025-03-13 12:52 ` Karthik Nayak
2025-03-13 14:20 ` Arnav Bhate [this message]
2025-03-13 14:44 ` Karthik Nayak
2025-03-13 15:33 ` Junio C Hamano
2025-03-13 15:50 ` Karthik Nayak
2025-03-16 10:14 ` Arnav Bhate
2025-03-29 6:17 ` [GSoC PATCH v2] " Arnav Bhate
2025-03-29 6:18 ` Arnav Bhate
2025-03-29 18:36 ` Junio C Hamano
2025-03-30 17:45 ` [GSoC PATCH v3] " Arnav Bhate
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0946673c-72ce-4848-b7f6-edd7a1cc7ff4@gmail.com \
--to=bhatearnav@gmail.com \
--cc=git@vger.kernel.org \
--cc=karthik.188@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).