* [PATCH 1/4] string-list: allow passing NULL for `get_entry_index`
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
@ 2025-09-07 16:42 ` shejialuo
2025-09-09 6:22 ` Patrick Steinhardt
2025-09-07 16:42 ` [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
` (4 subsequent siblings)
5 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-09-07 16:42 UTC (permalink / raw)
To: git
Callers of `get_entry_index()` are required to pass a non-NULL
`exact_match` parameter to receive information about whether an exact
match is found. However, in some cases, callers only need the index
position.
Let's allow callers to pass NULL for the `exact_match` parameter
when they don't need this information, reducing unnecessary variable
declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
string-list.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/string-list.c b/string-list.c
index 343cf1ca90..bf358d1a5c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -29,12 +29,14 @@ static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = 1;
+ if (exact_match)
+ *exact_match = 1;
return middle;
}
}
- *exact_match = 0;
+ if (exact_match)
+ *exact_match = 0;
return right;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH 1/4] string-list: allow passing NULL for `get_entry_index`
2025-09-07 16:42 ` [PATCH 1/4] string-list: allow passing NULL for `get_entry_index` shejialuo
@ 2025-09-09 6:22 ` Patrick Steinhardt
0 siblings, 0 replies; 44+ messages in thread
From: Patrick Steinhardt @ 2025-09-09 6:22 UTC (permalink / raw)
To: shejialuo; +Cc: git
On Mon, Sep 08, 2025 at 12:42:20AM +0800, shejialuo wrote:
> Callers of `get_entry_index()` are required to pass a non-NULL
> `exact_match` parameter to receive information about whether an exact
> match is found. However, in some cases, callers only need the index
> position.
>
> Let's allow callers to pass NULL for the `exact_match` parameter
> when they don't need this information, reducing unnecessary variable
> declarations in calling code.
It would be nice to either explain why we want this or, alternatively,
to include that specific user in the same patch.
Patrick
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
2025-09-07 16:42 ` [PATCH 1/4] string-list: allow passing NULL for `get_entry_index` shejialuo
@ 2025-09-07 16:42 ` shejialuo
2025-09-09 6:22 ` Patrick Steinhardt
2025-09-07 16:42 ` [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
` (3 subsequent siblings)
5 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-09-07 16:42 UTC (permalink / raw)
To: git
We would return negative index to indicate exact match by converting the
original positive index to be "-1 - index" in
"string_list_find_insert_index", which requires callers to decode this
information.
This is bad due to the following reasons:
1. The callers need to convert the negative index back to the original
positive value, which requires the callers to understand the detail
of the function.
2. As we have to return negative index, we need to specify the return
type to be `int` instead of `size_t`, which would cause sign compare
warnings.
Refactor "string_list_find_insert_index" to use an output parameter
"exact_match" for indicating the exact match rather than encoding
through negative return values.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 7 ++++---
mailmap.c | 7 +++----
refs.c | 2 +-
string-list.c | 8 ++------
string-list.h | 2 +-
5 files changed, 11 insertions(+), 15 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 3e692b47ec..9a42b3b38b 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -221,7 +221,8 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
- int index = string_list_find_insert_index(&list->sorted, string, 1);
+ int exact_match;
+ int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
@@ -229,8 +230,8 @@ static ssize_t find_unique(const char *string, struct prefix_item_list *list)
" vs %"PRIuMAX")",
(uintmax_t)list->items.nr, (uintmax_t)list->sorted.nr);
- if (index < 0)
- item = list->sorted.items[-1 - index].util;
+ if (exact_match)
+ item = list->sorted.items[index].util;
else if (index > 0 &&
starts_with(list->sorted.items[index - 1].string, string))
return -1;
diff --git a/mailmap.c b/mailmap.c
index 56c72102d9..253517cdf6 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -243,10 +243,9 @@ void clear_mailmap(struct string_list *map)
static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
- int i = string_list_find_insert_index(map, string, 1);
- if (i < 0) {
- /* exact match */
- i = -1 - i;
+ int exact_match;
+ int i = string_list_find_insert_index(map, string, &exact_match);
+ if (exact_match) {
if (!string[len])
return &map->items[i];
/*
diff --git a/refs.c b/refs.c
index 4ff55cf24f..f1ff5bf846 100644
--- a/refs.c
+++ b/refs.c
@@ -1699,7 +1699,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, 0);
+ for (pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index bf358d1a5c..224bc182ff 100644
--- a/string-list.c
+++ b/string-list.c
@@ -92,13 +92,9 @@ int string_list_has_string(const struct string_list *list, const char *string)
}
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index)
+ int *exact_match)
{
- int exact_match;
- int index = get_entry_index(list, string, &exact_match);
- if (exact_match)
- index = -1 - (negative_existing_index ? index : 0);
- return index;
+ return get_entry_index(list, string, exact_match);
}
struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
diff --git a/string-list.h b/string-list.h
index 2b438c7733..03c7009472 100644
--- a/string-list.h
+++ b/string-list.h
@@ -174,7 +174,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
int string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index);
+ int *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-07 16:42 ` [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
@ 2025-09-09 6:22 ` Patrick Steinhardt
2025-09-15 12:11 ` shejialuo
0 siblings, 1 reply; 44+ messages in thread
From: Patrick Steinhardt @ 2025-09-09 6:22 UTC (permalink / raw)
To: shejialuo; +Cc: git
On Mon, Sep 08, 2025 at 12:42:29AM +0800, shejialuo wrote:
> diff --git a/mailmap.c b/mailmap.c
> index 56c72102d9..253517cdf6 100644
> --- a/mailmap.c
> +++ b/mailmap.c
> @@ -243,10 +243,9 @@ void clear_mailmap(struct string_list *map)
> static struct string_list_item *lookup_prefix(struct string_list *map,
> const char *string, size_t len)
> {
> - int i = string_list_find_insert_index(map, string, 1);
> - if (i < 0) {
> - /* exact match */
> - i = -1 - i;
> + int exact_match;
> + int i = string_list_find_insert_index(map, string, &exact_match);
> + if (exact_match) {
> if (!string[len])
> return &map->items[i];
> /*
Yeah, this looks much cleaner compared to before.
> diff --git a/string-list.c b/string-list.c
> index bf358d1a5c..224bc182ff 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -92,13 +92,9 @@ int string_list_has_string(const struct string_list *list, const char *string)
> }
>
> int string_list_find_insert_index(const struct string_list *list, const char *string,
> - int negative_existing_index)
> + int *exact_match)
> {
> - int exact_match;
> - int index = get_entry_index(list, string, &exact_match);
> - if (exact_match)
> - index = -1 - (negative_existing_index ? index : 0);
> - return index;
> + return get_entry_index(list, string, exact_match);
> }
>
> struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
Okay, this here is where the preceding patch comes from, as some callers
pass `NULL` to `string_list_find_insert_index()`.
> diff --git a/string-list.h b/string-list.h
> index 2b438c7733..03c7009472 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -174,7 +174,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
> /** Determine if the string_list has a given string or not. */
> int string_list_has_string(const struct string_list *list, const char *string);
> int string_list_find_insert_index(const struct string_list *list, const char *string,
> - int negative_existing_index);
> + int *exact_match);
>
Makes me wonder whether we want to use `bool *exact_match` now to hint
that this is really only a true/false value? If so, we'd also have to
adapt the signature in the preceding commit.
Patrick
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-09 6:22 ` Patrick Steinhardt
@ 2025-09-15 12:11 ` shejialuo
0 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-09-15 12:11 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Tue, Sep 09, 2025 at 08:22:56AM +0200, Patrick Steinhardt wrote:
> > index 2b438c7733..03c7009472 100644
> > --- a/string-list.h
> > +++ b/string-list.h
> > @@ -174,7 +174,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
> > /** Determine if the string_list has a given string or not. */
> > int string_list_has_string(const struct string_list *list, const char *string);
> > int string_list_find_insert_index(const struct string_list *list, const char *string,
> > - int negative_existing_index);
> > + int *exact_match);
> >
>
> Makes me wonder whether we want to use `bool *exact_match` now to hint
> that this is really only a true/false value? If so, we'd also have to
> adapt the signature in the preceding commit.
>
That's right, I think `bool *` would be much better. I would improve
this in the next version.
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
2025-09-07 16:42 ` [PATCH 1/4] string-list: allow passing NULL for `get_entry_index` shejialuo
2025-09-07 16:42 ` [PATCH 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
@ 2025-09-07 16:42 ` shejialuo
2025-09-09 6:23 ` Patrick Steinhardt
2025-09-07 16:42 ` [PATCH 4/4] refs: enable sign compare warnings check shejialuo
` (2 subsequent siblings)
5 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-09-07 16:42 UTC (permalink / raw)
To: git
As "string_list_find_insert_index" is a simple wrapper of
"get_entry_index", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
"size_t" underflow by changing from `0 <= --i` to `i-- > 0`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 2 +-
mailmap.c | 5 ++---
refs.c | 4 +---
string-list.c | 4 ++--
string-list.h | 4 ++--
5 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 9a42b3b38b..2005f56b69 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -222,7 +222,7 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
int exact_match;
- int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
+ size_t index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
diff --git a/mailmap.c b/mailmap.c
index 253517cdf6..0168342650 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -1,5 +1,4 @@
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "environment.h"
@@ -244,7 +243,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
int exact_match;
- int i = string_list_find_insert_index(map, string, &exact_match);
+ size_t i = string_list_find_insert_index(map, string, &exact_match);
if (exact_match) {
if (!string[len])
return &map->items[i];
@@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
* overlong key would be inserted, which must come after the
* real location of the key if one exists.
*/
- while (0 <= --i && i < map->nr) {
+ while (i-- > 0 && i < map->nr) {
int cmp = strncasecmp(map->items[i].string, string, len);
if (cmp < 0)
/*
diff --git a/refs.c b/refs.c
index f1ff5bf846..a8f06b9a0a 100644
--- a/refs.c
+++ b/refs.c
@@ -1688,8 +1688,6 @@ const char *find_descendant_ref(const char *dirname,
const struct string_list *extras,
const struct string_list *skip)
{
- int pos;
-
if (!extras)
return NULL;
@@ -1699,7 +1697,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, NULL);
+ for (size_t pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index 224bc182ff..e69923cd88 100644
--- a/string-list.c
+++ b/string-list.c
@@ -91,8 +91,8 @@ int string_list_has_string(const struct string_list *list, const char *string)
return exact_match;
}
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- int *exact_match)
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ int *exact_match)
{
return get_entry_index(list, string, exact_match);
}
diff --git a/string-list.h b/string-list.h
index 03c7009472..f6be2bd5c7 100644
--- a/string-list.h
+++ b/string-list.h
@@ -173,8 +173,8 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
int string_list_has_string(const struct string_list *list, const char *string);
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- int *exact_match);
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ int *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-07 16:42 ` [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-09-09 6:23 ` Patrick Steinhardt
2025-09-09 19:21 ` Junio C Hamano
0 siblings, 1 reply; 44+ messages in thread
From: Patrick Steinhardt @ 2025-09-09 6:23 UTC (permalink / raw)
To: shejialuo; +Cc: git
On Mon, Sep 08, 2025 at 12:42:37AM +0800, shejialuo wrote:
> As "string_list_find_insert_index" is a simple wrapper of
> "get_entry_index", we could simply change its return type to "size_t".
The missing connecting piece is that `get_entry_index()` itself already
returns a `size_t`.
> diff --git a/mailmap.c b/mailmap.c
> index 253517cdf6..0168342650 100644
> --- a/mailmap.c
> +++ b/mailmap.c
> @@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
> * overlong key would be inserted, which must come after the
> * real location of the key if one exists.
> */
> - while (0 <= --i && i < map->nr) {
> + while (i-- > 0 && i < map->nr) {
This could simply be `while (i-- && i < map->nr)`.
Patrick
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-09 6:23 ` Patrick Steinhardt
@ 2025-09-09 19:21 ` Junio C Hamano
2025-09-10 4:57 ` Patrick Steinhardt
0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2025-09-09 19:21 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: shejialuo, git
Patrick Steinhardt <ps@pks.im> writes:
>> @@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
>> * overlong key would be inserted, which must come after the
>> * real location of the key if one exists.
>> */
>> - while (0 <= --i && i < map->nr) {
>> + while (i-- > 0 && i < map->nr) {
>
> This could simply be `while (i-- && i < map->nr)`.
Yes, especially if the reason why we avoid "not negative" aka "0 <="
is because the counter is now unsigned, yours is much more intuitive
way to say "as long as i is not yet zero". Alternatively you could
say "while (i-- != 0 && ...", but not comparing with 0 is more
customary.
Better yet, shouldn't we stay away from "i", if the point of the
change is to make it unsigned, as "i" has a strong connotation with
"int, the platform natural signed integer type"?
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-09 19:21 ` Junio C Hamano
@ 2025-09-10 4:57 ` Patrick Steinhardt
0 siblings, 0 replies; 44+ messages in thread
From: Patrick Steinhardt @ 2025-09-10 4:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: shejialuo, git
On Tue, Sep 09, 2025 at 12:21:15PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> >> @@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
> >> * overlong key would be inserted, which must come after the
> >> * real location of the key if one exists.
> >> */
> >> - while (0 <= --i && i < map->nr) {
> >> + while (i-- > 0 && i < map->nr) {
> >
> > This could simply be `while (i-- && i < map->nr)`.
>
> Yes, especially if the reason why we avoid "not negative" aka "0 <="
> is because the counter is now unsigned, yours is much more intuitive
> way to say "as long as i is not yet zero". Alternatively you could
> say "while (i-- != 0 && ...", but not comparing with 0 is more
> customary.
>
> Better yet, shouldn't we stay away from "i", if the point of the
> change is to make it unsigned, as "i" has a strong connotation with
> "int, the platform natural signed integer type"?
For me "i" in a loop typically just means "index" and not "int". So
personally, I'm fine with that variable name.
Patrick
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 4/4] refs: enable sign compare warnings check
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
` (2 preceding siblings ...)
2025-09-07 16:42 ` [PATCH 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-09-07 16:42 ` shejialuo
2025-09-09 6:23 ` Patrick Steinhardt
2025-09-07 16:43 ` [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
2025-09-17 9:18 ` [PATCH v2 " shejialuo
5 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-09-07 16:42 UTC (permalink / raw)
To: git
After fixing the tricky compare warning introduced by calling
"string_list_find_insert_index", there are only two loop iterator type
mismatches. Fix them to enable compare warnings check.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
refs.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/refs.c b/refs.c
index a8f06b9a0a..79069c37b8 100644
--- a/refs.c
+++ b/refs.c
@@ -3,7 +3,6 @@
*/
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "advice.h"
@@ -2381,7 +2380,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
struct child_process proc = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
const char *hook;
- int ret = 0, i;
+ int ret = 0;
hook = find_hook(transaction->ref_store->repo, "reference-transaction");
if (!hook)
@@ -2398,7 +2397,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
sigchain_push(SIGPIPE, SIG_IGN);
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_LOG_ONLY)
@@ -2791,9 +2790,7 @@ void ref_transaction_for_each_queued_update(struct ref_transaction *transaction,
ref_transaction_for_each_queued_update_fn cb,
void *cb_data)
{
- int i;
-
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
cb(update->refname,
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH 0/4] enhance string-list API to fix sign compare warnings
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
` (3 preceding siblings ...)
2025-09-07 16:42 ` [PATCH 4/4] refs: enable sign compare warnings check shejialuo
@ 2025-09-07 16:43 ` shejialuo
2025-09-17 9:18 ` [PATCH v2 " shejialuo
5 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-09-07 16:43 UTC (permalink / raw)
To: git
On Mon, Sep 08, 2025 at 12:40:16AM +0800, shejialuo wrote:
Please ignore some standalone patches, I wrongly forgot to add
"In-Reply-To" filed.
> Hi All:
>
> This is a small PATCH to enhance string-list API
> "string_list_find_insert_index" which has introduced sign compare
> warnings.
>
> Thanks,
> Jialuo
>
> shejialuo (4):
> string-list: allow passing NULL for `get_entry_index`
> string-list: replace negative index encoding with "exact_match"
> parameter
> string-list: change "string_list_find_insert_index" return type to
> "size_t"
> refs: enable sign compare warnings check
>
> add-interactive.c | 7 ++++---
> mailmap.c | 10 ++++------
> refs.c | 13 ++++---------
> string-list.c | 16 +++++++---------
> string-list.h | 4 ++--
> 5 files changed, 21 insertions(+), 29 deletions(-)
>
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH v2 0/4] enhance string-list API to fix sign compare warnings
2025-09-07 16:40 [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
` (4 preceding siblings ...)
2025-09-07 16:43 ` [PATCH 0/4] enhance string-list API to fix sign compare warnings shejialuo
@ 2025-09-17 9:18 ` shejialuo
2025-09-17 9:19 ` [PATCH v2 1/4] string-list: use bool instead of int for "exact_match" shejialuo
` (4 more replies)
5 siblings, 5 replies; 44+ messages in thread
From: shejialuo @ 2025-09-17 9:18 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
Hi All:
This is a small PATCH to enhance string-list API
"string_list_find_insert_index" which has introduced sign compare
warnings.
---
Changes since v1:
1. Create a new commit which aims at using `bool` for "exact_match"
parameter.
2. Rebase the [PATCH 1/4] and [PATCH 2/4] into a single [PATCH v2 2/4]
commit to avoid confusing the user with the motivation of the
original commit [PATCH 1/4].
3. Enhance the comimt message of [PATCH 2/4] to improve the motivation.
4. Update "i-- > 0" to "i--" for [PATCH 3/4]
Thanks,
Jialuo
shejialuo (4):
string-list: use bool instead of int for "exact_match"
string-list: replace negative index encoding with "exact_match"
parameter
string-list: change "string_list_find_insert_index" return type to
"size_t"
refs: enable sign compare warnings check
add-interactive.c | 7 ++++---
mailmap.c | 10 ++++------
refs.c | 13 ++++---------
string-list.c | 29 ++++++++++++++---------------
string-list.h | 6 +++---
5 files changed, 29 insertions(+), 36 deletions(-)
Range-diff against v1:
1: d3333b4ff6 ! 1: c3786fa386 string-list: allow passing NULL for `get_entry_index`
@@ Metadata
Author: shejialuo <shejialuo@gmail.com>
## Commit message ##
- string-list: allow passing NULL for `get_entry_index`
+ string-list: use bool instead of int for "exact_match"
- Callers of `get_entry_index()` are required to pass a non-NULL
- `exact_match` parameter to receive information about whether an exact
- match is found. However, in some cases, callers only need the index
- position.
-
- Let's allow callers to pass NULL for the `exact_match` parameter
- when they don't need this information, reducing unnecessary variable
- declarations in calling code.
+ The "exact_match" parameter in "get_entry_index" is used to indicate
+ whether a string is found or not, which is fundamentally a true/false
+ value. As we allow the use of bool, let's use bool instead of int to
+ make the function more semantically clear.
Signed-off-by: shejialuo <shejialuo@gmail.com>
## string-list.c ##
+@@ string-list.c: void string_list_init_dup(struct string_list *list)
+ /* if there is no exact match, point to the index where the entry could be
+ * inserted */
+ static size_t get_entry_index(const struct string_list *list, const char *string,
+- int *exact_match)
++ bool *exact_match)
+ {
+ size_t left = 0, right = list->nr;
+ compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
@@ string-list.c: static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = 1;
-+ if (exact_match)
-+ *exact_match = 1;
++ *exact_match = true;
return middle;
}
}
- *exact_match = 0;
-+ if (exact_match)
-+ *exact_match = 0;
++ *exact_match = false;
return right;
}
+ static size_t add_entry(struct string_list *list, const char *string)
+ {
+- int exact_match = 0;
++ bool exact_match;
+ size_t index = get_entry_index(list, string, &exact_match);
+
+ if (exact_match)
+@@ string-list.c: struct string_list_item *string_list_insert(struct string_list *list, const char
+ void string_list_remove(struct string_list *list, const char *string,
+ int free_util)
+ {
+- int exact_match;
++ bool exact_match;
+ int i = get_entry_index(list, string, &exact_match);
+
+ if (exact_match) {
+@@ string-list.c: void string_list_remove(struct string_list *list, const char *string,
+ }
+ }
+
+-int string_list_has_string(const struct string_list *list, const char *string)
++bool string_list_has_string(const struct string_list *list, const char *string)
+ {
+- int exact_match;
++ bool exact_match;
+ get_entry_index(list, string, &exact_match);
+ return exact_match;
+ }
+@@ string-list.c: int string_list_has_string(const struct string_list *list, const char *string)
+ int string_list_find_insert_index(const struct string_list *list, const char *string,
+ int negative_existing_index)
+ {
+- int exact_match;
++ bool exact_match;
+ int index = get_entry_index(list, string, &exact_match);
+ if (exact_match)
+ index = -1 - (negative_existing_index ? index : 0);
+@@ string-list.c: int string_list_find_insert_index(const struct string_list *list, const char *st
+
+ struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
+ {
+- int exact_match, i = get_entry_index(list, string, &exact_match);
++ bool exact_match;
++ size_t i = get_entry_index(list, string, &exact_match);
+ if (!exact_match)
+ return NULL;
+ return list->items + i;
+
+ ## string-list.h ##
+@@ string-list.h: void string_list_remove_empty_items(struct string_list *list, int free_util);
+ /* Use these functions only on sorted lists: */
+
+ /** Determine if the string_list has a given string or not. */
+-int string_list_has_string(const struct string_list *list, const char *string);
++bool string_list_has_string(const struct string_list *list, const char *string);
+ int string_list_find_insert_index(const struct string_list *list, const char *string,
+ int negative_existing_index);
+
2: 104c090d8d ! 2: 7ac8fd69c0 string-list: replace negative index encoding with "exact_match" parameter
@@ Commit message
We would return negative index to indicate exact match by converting the
original positive index to be "-1 - index" in
"string_list_find_insert_index", which requires callers to decode this
- information.
+ information. This approach has several limitations:
- This is bad due to the following reasons:
+ 1. It prevents us from using the full range of size_t, which is
+ necessary for large string list.
+ 2. Using int for indices while other parts of the codebase use size_t
+ creates signed comparison warnings when these values are compared.
- 1. The callers need to convert the negative index back to the original
- positive value, which requires the callers to understand the detail
- of the function.
- 2. As we have to return negative index, we need to specify the return
- type to be `int` instead of `size_t`, which would cause sign compare
- warnings.
+ To address these limitations, change the function to return size_t for
+ the index value and use a separate bool parameter to indicate whether
+ the index refers to an existing entry or an insertion point.
- Refactor "string_list_find_insert_index" to use an output parameter
- "exact_match" for indicating the exact match rather than encoding
- through negative return values.
+ In some cases, the callers of "string_list_find_insert_index" only need
+ the index position and don't care whether an exact match is found.
+ However, "get_entry_index" currently requires a non-NULL "exact_match"
+ parameter, forcing these callers to declare unnecessary variables.
+ Let's allow callers to pass NULL for the "exact_match" parameter when
+ they don't need this information, reducing unnecessary variable
+ declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
@@ add-interactive.c: static void find_unique_prefixes(struct prefix_item_list *lis
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
- int index = string_list_find_insert_index(&list->sorted, string, 1);
-+ int exact_match;
++ bool exact_match;
+ int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
@@ mailmap.c: void clear_mailmap(struct string_list *map)
- if (i < 0) {
- /* exact match */
- i = -1 - i;
-+ int exact_match;
++ bool exact_match;
+ int i = string_list_find_insert_index(map, string, &exact_match);
+ if (exact_match) {
if (!string[len])
@@ refs.c: const char *find_descendant_ref(const char *dirname,
## string-list.c ##
-@@ string-list.c: int string_list_has_string(const struct string_list *list, const char *string)
+@@ string-list.c: static size_t get_entry_index(const struct string_list *list, const char *string
+ else if (compare > 0)
+ left = middle + 1;
+ else {
+- *exact_match = true;
++ if (exact_match)
++ *exact_match = true;
+ return middle;
+ }
+ }
+
+- *exact_match = false;
++ if (exact_match)
++ *exact_match = false;
+ return right;
+ }
+
+@@ string-list.c: bool string_list_has_string(const struct string_list *list, const char *string)
}
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index)
-+ int *exact_match)
++ bool *exact_match)
{
-- int exact_match;
+- bool exact_match;
- int index = get_entry_index(list, string, &exact_match);
- if (exact_match)
- index = -1 - (negative_existing_index ? index : 0);
@@ string-list.c: int string_list_has_string(const struct string_list *list, const
## string-list.h ##
@@ string-list.h: void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
- int string_list_has_string(const struct string_list *list, const char *string);
+ bool string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index);
-+ int *exact_match);
++ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
3: 4d4bdf7cda ! 3: 1cf914fab5 string-list: change "string_list_find_insert_index" return type to "size_t"
@@ Commit message
string-list: change "string_list_find_insert_index" return type to "size_t"
As "string_list_find_insert_index" is a simple wrapper of
- "get_entry_index", we could simply change its return type to "size_t".
+ "get_entry_index" and the return type of "get_entry_index" is already
+ "size_t", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
- "size_t" underflow by changing from `0 <= --i` to `i-- > 0`.
+ "size_t" underflow by changing from `0 <= --i` to `i--`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
@@ add-interactive.c
@@ add-interactive.c: static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
- int exact_match;
+ bool exact_match;
- int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
+ size_t index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
@@ mailmap.c
@@ mailmap.c: static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
- int exact_match;
+ bool exact_match;
- int i = string_list_find_insert_index(map, string, &exact_match);
+ size_t i = string_list_find_insert_index(map, string, &exact_match);
if (exact_match) {
@@ mailmap.c: static struct string_list_item *lookup_prefix(struct string_list *map
* real location of the key if one exists.
*/
- while (0 <= --i && i < map->nr) {
-+ while (i-- > 0 && i < map->nr) {
++ while (i-- && i < map->nr) {
int cmp = strncasecmp(map->items[i].string, string, len);
if (cmp < 0)
/*
@@ refs.c: const char *find_descendant_ref(const char *dirname,
## string-list.c ##
-@@ string-list.c: int string_list_has_string(const struct string_list *list, const char *string)
+@@ string-list.c: bool string_list_has_string(const struct string_list *list, const char *string)
return exact_match;
}
-int string_list_find_insert_index(const struct string_list *list, const char *string,
-- int *exact_match)
+- bool *exact_match)
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
-+ int *exact_match)
++ bool *exact_match)
{
return get_entry_index(list, string, exact_match);
}
@@ string-list.h
@@ string-list.h: void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
- int string_list_has_string(const struct string_list *list, const char *string);
+ bool string_list_has_string(const struct string_list *list, const char *string);
-int string_list_find_insert_index(const struct string_list *list, const char *string,
-- int *exact_match);
+- bool *exact_match);
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
-+ int *exact_match);
++ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
4: 9f2b55fb41 = 4: 8a445549dd refs: enable sign compare warnings check
--
2.51.0
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH v2 1/4] string-list: use bool instead of int for "exact_match"
2025-09-17 9:18 ` [PATCH v2 " shejialuo
@ 2025-09-17 9:19 ` shejialuo
2025-09-17 9:19 ` [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
` (3 subsequent siblings)
4 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-09-17 9:19 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
The "exact_match" parameter in "get_entry_index" is used to indicate
whether a string is found or not, which is fundamentally a true/false
value. As we allow the use of bool, let's use bool instead of int to
make the function more semantically clear.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
string-list.c | 19 ++++++++++---------
string-list.h | 2 +-
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/string-list.c b/string-list.c
index 343cf1ca90..d8da3dd414 100644
--- a/string-list.c
+++ b/string-list.c
@@ -16,7 +16,7 @@ void string_list_init_dup(struct string_list *list)
/* if there is no exact match, point to the index where the entry could be
* inserted */
static size_t get_entry_index(const struct string_list *list, const char *string,
- int *exact_match)
+ bool *exact_match)
{
size_t left = 0, right = list->nr;
compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
@@ -29,18 +29,18 @@ static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = 1;
+ *exact_match = true;
return middle;
}
}
- *exact_match = 0;
+ *exact_match = false;
return right;
}
static size_t add_entry(struct string_list *list, const char *string)
{
- int exact_match = 0;
+ bool exact_match;
size_t index = get_entry_index(list, string, &exact_match);
if (exact_match)
@@ -68,7 +68,7 @@ struct string_list_item *string_list_insert(struct string_list *list, const char
void string_list_remove(struct string_list *list, const char *string,
int free_util)
{
- int exact_match;
+ bool exact_match;
int i = get_entry_index(list, string, &exact_match);
if (exact_match) {
@@ -82,9 +82,9 @@ void string_list_remove(struct string_list *list, const char *string,
}
}
-int string_list_has_string(const struct string_list *list, const char *string)
+bool string_list_has_string(const struct string_list *list, const char *string)
{
- int exact_match;
+ bool exact_match;
get_entry_index(list, string, &exact_match);
return exact_match;
}
@@ -92,7 +92,7 @@ int string_list_has_string(const struct string_list *list, const char *string)
int string_list_find_insert_index(const struct string_list *list, const char *string,
int negative_existing_index)
{
- int exact_match;
+ bool exact_match;
int index = get_entry_index(list, string, &exact_match);
if (exact_match)
index = -1 - (negative_existing_index ? index : 0);
@@ -101,7 +101,8 @@ int string_list_find_insert_index(const struct string_list *list, const char *st
struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
{
- int exact_match, i = get_entry_index(list, string, &exact_match);
+ bool exact_match;
+ size_t i = get_entry_index(list, string, &exact_match);
if (!exact_match)
return NULL;
return list->items + i;
diff --git a/string-list.h b/string-list.h
index 2b438c7733..bc7f38022e 100644
--- a/string-list.h
+++ b/string-list.h
@@ -172,7 +172,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/* Use these functions only on sorted lists: */
/** Determine if the string_list has a given string or not. */
-int string_list_has_string(const struct string_list *list, const char *string);
+bool string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
int negative_existing_index);
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-17 9:18 ` [PATCH v2 " shejialuo
2025-09-17 9:19 ` [PATCH v2 1/4] string-list: use bool instead of int for "exact_match" shejialuo
@ 2025-09-17 9:19 ` shejialuo
2025-09-23 8:14 ` Patrick Steinhardt
2025-09-23 9:35 ` Karthik Nayak
2025-09-17 9:20 ` [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
` (2 subsequent siblings)
4 siblings, 2 replies; 44+ messages in thread
From: shejialuo @ 2025-09-17 9:19 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
We would return negative index to indicate exact match by converting the
original positive index to be "-1 - index" in
"string_list_find_insert_index", which requires callers to decode this
information. This approach has several limitations:
1. It prevents us from using the full range of size_t, which is
necessary for large string list.
2. Using int for indices while other parts of the codebase use size_t
creates signed comparison warnings when these values are compared.
To address these limitations, change the function to return size_t for
the index value and use a separate bool parameter to indicate whether
the index refers to an existing entry or an insertion point.
In some cases, the callers of "string_list_find_insert_index" only need
the index position and don't care whether an exact match is found.
However, "get_entry_index" currently requires a non-NULL "exact_match"
parameter, forcing these callers to declare unnecessary variables.
Let's allow callers to pass NULL for the "exact_match" parameter when
they don't need this information, reducing unnecessary variable
declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 7 ++++---
mailmap.c | 7 +++----
refs.c | 2 +-
string-list.c | 14 ++++++--------
string-list.h | 2 +-
5 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 3e692b47ec..7c0fd3d218 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -221,7 +221,8 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
- int index = string_list_find_insert_index(&list->sorted, string, 1);
+ bool exact_match;
+ int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
@@ -229,8 +230,8 @@ static ssize_t find_unique(const char *string, struct prefix_item_list *list)
" vs %"PRIuMAX")",
(uintmax_t)list->items.nr, (uintmax_t)list->sorted.nr);
- if (index < 0)
- item = list->sorted.items[-1 - index].util;
+ if (exact_match)
+ item = list->sorted.items[index].util;
else if (index > 0 &&
starts_with(list->sorted.items[index - 1].string, string))
return -1;
diff --git a/mailmap.c b/mailmap.c
index 56c72102d9..58a4484963 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -243,10 +243,9 @@ void clear_mailmap(struct string_list *map)
static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
- int i = string_list_find_insert_index(map, string, 1);
- if (i < 0) {
- /* exact match */
- i = -1 - i;
+ bool exact_match;
+ int i = string_list_find_insert_index(map, string, &exact_match);
+ if (exact_match) {
if (!string[len])
return &map->items[i];
/*
diff --git a/refs.c b/refs.c
index 4ff55cf24f..f1ff5bf846 100644
--- a/refs.c
+++ b/refs.c
@@ -1699,7 +1699,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, 0);
+ for (pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index d8da3dd414..c589ab5a2c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -29,12 +29,14 @@ static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = true;
+ if (exact_match)
+ *exact_match = true;
return middle;
}
}
- *exact_match = false;
+ if (exact_match)
+ *exact_match = false;
return right;
}
@@ -90,13 +92,9 @@ bool string_list_has_string(const struct string_list *list, const char *string)
}
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index)
+ bool *exact_match)
{
- bool exact_match;
- int index = get_entry_index(list, string, &exact_match);
- if (exact_match)
- index = -1 - (negative_existing_index ? index : 0);
- return index;
+ return get_entry_index(list, string, exact_match);
}
struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
diff --git a/string-list.h b/string-list.h
index bc7f38022e..8830ce671d 100644
--- a/string-list.h
+++ b/string-list.h
@@ -174,7 +174,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
bool string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index);
+ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-17 9:19 ` [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
@ 2025-09-23 8:14 ` Patrick Steinhardt
2025-10-05 13:31 ` shejialuo
2025-09-23 9:35 ` Karthik Nayak
1 sibling, 1 reply; 44+ messages in thread
From: Patrick Steinhardt @ 2025-09-23 8:14 UTC (permalink / raw)
To: shejialuo; +Cc: git, Junio C Hamano
On Wed, Sep 17, 2025 at 05:19:54PM +0800, shejialuo wrote:
> We would return negative index to indicate exact match by converting the
> original positive index to be "-1 - index" in
> "string_list_find_insert_index", which requires callers to decode this
> information. This approach has several limitations:
>
> 1. It prevents us from using the full range of size_t, which is
> necessary for large string list.
I guess this is more of a theoretical concern. We probably wouldn't
handle it well when our list had 2 billion entries anyway.
> 2. Using int for indices while other parts of the codebase use size_t
> creates signed comparison warnings when these values are compared.
Yup.
I think that the required juggling around negative indices is another
factor here. It's somewhat weird, and while existing callers all handle
this correct I think that it makes for a suboptimal interface.
> To address these limitations, change the function to return size_t for
> the index value and use a separate bool parameter to indicate whether
> the index refers to an existing entry or an insertion point.
>
> In some cases, the callers of "string_list_find_insert_index" only need
> the index position and don't care whether an exact match is found.
> However, "get_entry_index" currently requires a non-NULL "exact_match"
> parameter, forcing these callers to declare unnecessary variables.
> Let's allow callers to pass NULL for the "exact_match" parameter when
> they don't need this information, reducing unnecessary variable
> declarations in calling code.
Makes sense.
I don't really think that my above comments need to be addressed, and
the other patches in this series look good to me. Thanks!
Patrick
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-23 8:14 ` Patrick Steinhardt
@ 2025-10-05 13:31 ` shejialuo
0 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-05 13:31 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Junio C Hamano
On Tue, Sep 23, 2025 at 10:14:55AM +0200, Patrick Steinhardt wrote:
> On Wed, Sep 17, 2025 at 05:19:54PM +0800, shejialuo wrote:
> > We would return negative index to indicate exact match by converting the
> > original positive index to be "-1 - index" in
> > "string_list_find_insert_index", which requires callers to decode this
> > information. This approach has several limitations:
> >
> > 1. It prevents us from using the full range of size_t, which is
> > necessary for large string list.
>
> I guess this is more of a theoretical concern. We probably wouldn't
> handle it well when our list had 2 billion entries anyway.
>
That's right. From the discussion, I would update the commit message in
the next version.
> > 2. Using int for indices while other parts of the codebase use size_t
> > creates signed comparison warnings when these values are compared.
>
> Yup.
>
> I think that the required juggling around negative indices is another
> factor here. It's somewhat weird, and while existing callers all handle
> this correct I think that it makes for a suboptimal interface.
>
That's right, I would improve this in the next version.
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-17 9:19 ` [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
2025-09-23 8:14 ` Patrick Steinhardt
@ 2025-09-23 9:35 ` Karthik Nayak
2025-09-23 18:48 ` Junio C Hamano
1 sibling, 1 reply; 44+ messages in thread
From: Karthik Nayak @ 2025-09-23 9:35 UTC (permalink / raw)
To: shejialuo, git; +Cc: Junio C Hamano, Patrick Steinhardt
[-- Attachment #1: Type: text/plain, Size: 3025 bytes --]
shejialuo <shejialuo@gmail.com> writes:
> We would return negative index to indicate exact match by converting the
> original positive index to be "-1 - index" in
> "string_list_find_insert_index", which requires callers to decode this
> information. This approach has several limitations:
>
Nit: It would be nice to start by explaining what
"string_list_find_insert_index" does and then talking about the negative
index. Perhaps something like:
The `string_list_find_insert_index()` function is used to determine
the correct insertion index for a new string within the string list.
The function also doubles up to convey if the string is already
existing in the list, this is done by returning a negative index
"-1 -index". Users are expected to decode this information.
> 1. It prevents us from using the full range of size_t, which is
> necessary for large string list.
> 2. Using int for indices while other parts of the codebase use size_t
> creates signed comparison warnings when these values are compared.
>
> To address these limitations, change the function to return size_t for
> the index value and use a separate bool parameter to indicate whether
> the index refers to an existing entry or an insertion point.
>
> In some cases, the callers of "string_list_find_insert_index" only need
> the index position and don't care whether an exact match is found.
> However, "get_entry_index" currently requires a non-NULL "exact_match"
> parameter, forcing these callers to declare unnecessary variables.
> Let's allow callers to pass NULL for the "exact_match" parameter when
> they don't need this information, reducing unnecessary variable
> declarations in calling code.
>
Makes sense, and much cleaner..
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
> add-interactive.c | 7 ++++---
> mailmap.c | 7 +++----
> refs.c | 2 +-
> string-list.c | 14 ++++++--------
> string-list.h | 2 +-
> 5 files changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/add-interactive.c b/add-interactive.c
> index 3e692b47ec..7c0fd3d218 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -221,7 +221,8 @@ static void find_unique_prefixes(struct prefix_item_list *list)
>
> static ssize_t find_unique(const char *string, struct prefix_item_list *list)
> {
> - int index = string_list_find_insert_index(&list->sorted, string, 1);
> + bool exact_match;
> + int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
> struct string_list_item *item;
>
> if (list->items.nr != list->sorted.nr)
> @@ -229,8 +230,8 @@ static ssize_t find_unique(const char *string, struct prefix_item_list *list)
> " vs %"PRIuMAX")",
> (uintmax_t)list->items.nr, (uintmax_t)list->sorted.nr);
>
> - if (index < 0)
> - item = list->sorted.items[-1 - index].util;
Thanks for this, this is so confusing to read if one doesn't know that
the incoming information is encoded with a special format.
Rest of the patch looks great.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-23 9:35 ` Karthik Nayak
@ 2025-09-23 18:48 ` Junio C Hamano
2025-09-24 5:36 ` Jeff King
2025-10-05 14:06 ` shejialuo
0 siblings, 2 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-09-23 18:48 UTC (permalink / raw)
To: Karthik Nayak; +Cc: shejialuo, git, Patrick Steinhardt
Karthik Nayak <karthik.188@gmail.com> writes:
> shejialuo <shejialuo@gmail.com> writes:
>
>> We would return negative index to indicate exact match by converting the
>> original positive index to be "-1 - index" in
>> "string_list_find_insert_index", which requires callers to decode this
>> information. This approach has several limitations:
>>
>
> Nit: It would be nice to start by explaining what
> "string_list_find_insert_index" does and then talking about the negative
> index. Perhaps something like:
>
> The `string_list_find_insert_index()` function is used to determine
> the correct insertion index for a new string within the string list.
> The function also doubles up to convey if the string is already
> existing in the list, this is done by returning a negative index
> "-1 -index". Users are expected to decode this information.
Yeah, such an introductory statement would help those who are not
familiar with the convention. Thanks for suggesting it.
>> 1. It prevents us from using the full range of size_t, which is
>> necessary for large string list.
It is a disease to think that countable things must be counted in
size_t and it needs to be somehow cured.
It is a type to count the size of memory allocations, nothing more.
If you are holding 1000-bytes per the stuff you are counting, you
would not need the full range of size_t --- you'll ran out your
memory way before you fill size_t with the things you are counting.
When there is no external constraints (like you need to specify
exact size to describe a file format to be interoperable), the most
appropriate type to count things in is a platform natural "int".
You wouldn't be handling billions of strings in string-list anyway
(and that is smaller than half of 32-bit size_t; 64-bit size_t is
much larger).
>> 2. Using int for indices while other parts of the codebase use size_t
>> creates signed comparison warnings when these values are compared.
The other thing may be (mis)using size_t when it should not be. If
they were also using "int" that would also squelch the warnings from
"-Wsign-compare".
For an amusing read:
https://lore.kernel.org/lkml/CAHk-=wg+_6eQnLWm-kihFxJo1_EmyLSGruKVGzuRUwACE=osrA@mail.gmail.com/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-23 18:48 ` Junio C Hamano
@ 2025-09-24 5:36 ` Jeff King
2025-09-24 13:20 ` Junio C Hamano
2025-10-05 14:11 ` shejialuo
2025-10-05 14:06 ` shejialuo
1 sibling, 2 replies; 44+ messages in thread
From: Jeff King @ 2025-09-24 5:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, shejialuo, git, Patrick Steinhardt
On Tue, Sep 23, 2025 at 11:48:36AM -0700, Junio C Hamano wrote:
> >> 1. It prevents us from using the full range of size_t, which is
> >> necessary for large string list.
>
> It is a disease to think that countable things must be counted in
> size_t and it needs to be somehow cured.
>
> It is a type to count the size of memory allocations, nothing more.
> If you are holding 1000-bytes per the stuff you are counting, you
> would not need the full range of size_t --- you'll ran out your
> memory way before you fill size_t with the things you are counting.
>
> When there is no external constraints (like you need to specify
> exact size to describe a file format to be interoperable), the most
> appropriate type to count things in is a platform natural "int".
> You wouldn't be handling billions of strings in string-list anyway
> (and that is smaller than half of 32-bit size_t; 64-bit size_t is
> much larger).
I agree that size_t is much more than one needs for counting most
things. But the problem is that "int" is much too small, if you are
worried about malicious input causing integer overflows that could cause
memory access errors.
A nice property of counting everything as size_t is that if we are
storing even a single byte per item, we will fail to allocate before
hitting an integer overflow. So no, we do not expect to store billions
of strings. But it is not that hard to convince Git to allocate billions
of items in a list on a 64-bit system with 32-bit ints. And it is nice
to know that iterating over them or trying to extend the array will
never hit an integer overflow bug.
I'd say the "right" size for preventing overflows probably only needs to
be 58-60 bits or so, since usually we are storing more than one byte
(plus overhead). But 64-bit is the natural machine word size that
matches what we want. However, we should _not_ be worried about losing
one bit to making it signed, especially if that makes it less
error-prone to convert instances of "int" to use "size_t". I would be
surprised if an attacker could convince a program to truly use up half
of its address space.
> >> 2. Using int for indices while other parts of the codebase use size_t
> >> creates signed comparison warnings when these values are compared.
>
> The other thing may be (mis)using size_t when it should not be. If
> they were also using "int" that would also squelch the warnings from
> "-Wsign-compare".
So I really care only about truncation and overflow above. Sign issues
can cause bugs, of course, but the real issue is the size mismatch
between "int" and "size_t". And while -Wsign-compare is sometimes an
easy way to find those mismatches (because of the sign mismatch between
them), it may bring more hassle than it's worth.
I didn't participate in the count_t discussion a month or two ago. But I
think it could make sense to have a count_t that is really just a larger
"int", but is still signed.
-Peff
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-24 5:36 ` Jeff King
@ 2025-09-24 13:20 ` Junio C Hamano
2025-09-25 2:50 ` Jeff King
2025-10-08 1:49 ` Collin Funk
2025-10-05 14:11 ` shejialuo
1 sibling, 2 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-09-24 13:20 UTC (permalink / raw)
To: Jeff King; +Cc: Karthik Nayak, shejialuo, git, Patrick Steinhardt
Jeff King <peff@peff.net> writes:
> I agree that size_t is much more than one needs for counting most
> things. But the problem is that "int" is much too small, if you are
> worried about malicious input causing integer overflows that could cause
> memory access errors.
Well, a malicious input can cause overflow/wraparound size_t while
parsing, so I do not think that is really an argument.
The code need to be protected against such overflows either way.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-24 13:20 ` Junio C Hamano
@ 2025-09-25 2:50 ` Jeff King
2025-09-25 13:33 ` Junio C Hamano
2025-10-08 1:49 ` Collin Funk
1 sibling, 1 reply; 44+ messages in thread
From: Jeff King @ 2025-09-25 2:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, shejialuo, git, Patrick Steinhardt
On Wed, Sep 24, 2025 at 06:20:13AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > I agree that size_t is much more than one needs for counting most
> > things. But the problem is that "int" is much too small, if you are
> > worried about malicious input causing integer overflows that could cause
> > memory access errors.
>
> Well, a malicious input can cause overflow/wraparound size_t while
> parsing, so I do not think that is really an argument.
>
> The code need to be protected against such overflows either way.
Yes, but it's much harder to wrap a size_t, especially if the code is
allocating as it goes (e.g., a loop expanding an array). Because if
expanding your allocation from "n" to "n+k" items will overflow, then
that implies the current allocation is within "k" items of filling up
the entire memory space.
In many cases "k" is 1, or a small-ish number (like the size of a
struct). In cases where the size is computed purely from untrusted
input, we do need overflow checks (and have added them over the years).
We do those checks in the size_t space, since that's how we count
allocated bytes, even if the thing we are storing is not 1 byte per
item.
If a data structure uses a smaller type (like "int") to do book-keeping
for its allocation, it risks the case where the smaller type wraps, but
is still valid as a size_t. For a signed type and a small "k" this is
often OK (if you wrap 2^31-1 around to -2^31, that ends up as an
implausibly large size_t and the allocation will fail). But there are
cases where you can wrap straight back around to "0", underallocate, and
have an unexpectedly small allocation.
I don't _think_ we have any cases of those anymore, but it's hard to
audit for. And IMHO easier to reason about if we use size_t for
book-keeping.
But if we use size_t inside string_list, say, and you do this:
for (int i = 0; i < list.nr; i++)
printf("got: %s", list->items[i].string);
Now we have another problem. The string list can store more than 2^31
items (even if we do not expect it to). And at some point you start
looking at list->items[-2147483648]. It's at least an out-of-bounds
read, rather than a write, but it's still rather ugly (and a clever
attacker can often stuff buffers to convince it to read whatever values
they want).
If the iterator is a size_t, then overflow in that loop is impossible
(because it implies we've allocated the entire address space). And ditto
if it is a signed 64-bit value (which would implies we've allocated half
of the entire address space).
So yes, I'd agree we need to protect against overflows, and that's what
I'm advocating for. But I think consistently using integer types that
are sized along with our memory is an important part of our strategy
there.
-Peff
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-25 2:50 ` Jeff King
@ 2025-09-25 13:33 ` Junio C Hamano
2025-10-09 5:52 ` Jeff King
0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2025-09-25 13:33 UTC (permalink / raw)
To: Jeff King; +Cc: Karthik Nayak, shejialuo, git, Patrick Steinhardt
Jeff King <peff@peff.net> writes:
> Yes, but it's much harder to wrap a size_t, especially if the code is
> allocating as it goes (e.g., a loop expanding an array). Because if
> expanding your allocation from "n" to "n+k" items will overflow, then
> that implies the current allocation is within "k" items of filling up
> the entire memory space.
We'd be protecting ourselves by noticing that n+k wraps around with
st_add() and friends, and relying on malloc() and realloc() to
notice and signal an error. Use of size_t to count the number of
things that are getting allocated is not making these any easier to
do compared to the case you were counting in "int", no? Either way
we'd need to be careful.
> But if we use size_t inside string_list, say, and you do this:
>
> for (int i = 0; i < list.nr; i++)
> printf("got: %s", list->items[i].string);
>
> Now we have another problem.
It is obvious that in order to count up from 0 to list.nr, you'd
better use a counter variable of the typeof(list.nr) or manage the
wraparound yourself. So I do not see what is new here. My point
was that use of size_t to _count_ the strings in the string list is
where the problem stems from.
So, I still do not buy into the religion or superstition that things
must be counted in size_t yet.
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-25 13:33 ` Junio C Hamano
@ 2025-10-09 5:52 ` Jeff King
0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2025-10-09 5:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, shejialuo, git, Patrick Steinhardt
On Thu, Sep 25, 2025 at 06:33:13AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Yes, but it's much harder to wrap a size_t, especially if the code is
> > allocating as it goes (e.g., a loop expanding an array). Because if
> > expanding your allocation from "n" to "n+k" items will overflow, then
> > that implies the current allocation is within "k" items of filling up
> > the entire memory space.
>
> We'd be protecting ourselves by noticing that n+k wraps around with
> st_add() and friends, and relying on malloc() and realloc() to
> notice and signal an error. Use of size_t to count the number of
> things that are getting allocated is not making these any easier to
> do compared to the case you were counting in "int", no? Either way
> we'd need to be careful.
Yes and no. I agree it is probably good to check for overflow as a
general principle, even if using size_t. But it is also easy to miss
such spots, and I do think using size_t (or something similarly large)
can provide some backup safety.
If you are getting values "foo" and "bar" from the user and computing a
length like "size_t len = foo * bar", then the size of the type will not
help you, and you need to do checked arithmetic. But I think those cases
are relatively easy to spot. The much more insidious ones are those that
append to a data structure one item at a time. With a type that is close
to the practical size of memory, you will fail to grow your data
structure before you hit the overflow. With a much smaller type like
int (on an LP64 platform), it is much easier for malicious input to
cause funky wrapping[1].
If your position is "it would not matter if we were properly checking
for overflow", then I agree. I just think it's hard to catch all of the
spots. But maybe I am being overly pessimistic. _Most_ of that should go
through ALLOC_GROW() or similar, so the checks could be centralized-ish[2].
-Peff
[1] We have had (and I'd wager probably still have) bugs where an
attacker can wrap around to negative int. We get saved from a
vulnerability here because the negative value is eventually cast to
a gigantic size_t to allocate, which fails. In that sense I think
"unsigned int" is the _most_ dangerous type to use.
[2] I think there are some practical implementation questions here, too.
If we use st_add() etc in ALLOC_GROW(), then we may get caught by
accidental type promotions, where those functions say "sure, it is
OK to increase this len", but then return a size_t which is
truncated when we try to stuff it back in an "int". So to make
ALLOC_GROW() type independent, we probably need to do some more
macro hackery with our checked arithmetic to make sure we are
operating with the same size type that the caller is using. Not
impossible, but something we'd have to be careful to get right.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-24 13:20 ` Junio C Hamano
2025-09-25 2:50 ` Jeff King
@ 2025-10-08 1:49 ` Collin Funk
2025-10-09 5:55 ` Jeff King
1 sibling, 1 reply; 44+ messages in thread
From: Collin Funk @ 2025-10-08 1:49 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jeff King, Karthik Nayak, shejialuo, git, Patrick Steinhardt
Junio C Hamano <gitster@pobox.com> writes:
> Jeff King <peff@peff.net> writes:
>
>> I agree that size_t is much more than one needs for counting most
>> things. But the problem is that "int" is much too small, if you are
>> worried about malicious input causing integer overflows that could cause
>> memory access errors.
>
> Well, a malicious input can cause overflow/wraparound size_t while
> parsing, so I do not think that is really an argument.
>
> The code need to be protected against such overflows either way.
Apologies for jumping into this thread so long after it happened, but I
wanted to voice my agreement with Junio here and mention another
consideration.
In GNU Coreutils and Gnulib we often use 'idx_t', which is a typedef to
the standard signed type 'ptrdiff_t', when we refer to allocation of
objects or indexes.
The rational is written in the header file where it is defined [1].
However, I want to highlight one part that I find most useful:
* Security: Signed types can be checked for overflow via
'-fsanitize=undefined', but unsigned types cannot.
On common platforms, you will never need to allocate more memory than
PTRDIFF_MAX anyways:
$ numfmt --to=iec-i `echo $(((1 << 63) - 1))`
8.0Ei
I think that addresses Jeff's point that 'int' is too small, which I
agree with.
In C23 it is also easy to do wraparound arithmetic on signed integers if
you want to. Here is an example:
$ cat main.c
#include <stdio.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdckdint.h>
int
main (void)
{
ptrdiff_t value = PTRDIFF_MAX;
if (! ckd_add (&value, value, 1))
printf ("No overflow\n");
else
{
/* Or handle overflow. */
printf ("%td\n", value);
printf ("%td\n", PTRDIFF_MIN);
}
return 0;
}
$ gcc -std=gnu23 main.c
$ ./a.out
-9223372036854775808
-9223372036854775808
Paul Eggert wrote some macros to implement these on old compilers which
is very helpful [2] [3]. They only assume that signed integers are two's
complement without padding bits (I would hope that git doesn't have to
support anything else...).
Collin
[1] https://github.com/coreutils/gnulib/blob/master/lib/idx.h
[2] https://github.com/coreutils/gnulib/blob/master/lib/intprops.h
[3] https://github.com/coreutils/gnulib/blob/master/lib/stdckdint.in.h
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-10-08 1:49 ` Collin Funk
@ 2025-10-09 5:55 ` Jeff King
0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2025-10-09 5:55 UTC (permalink / raw)
To: Collin Funk
Cc: Junio C Hamano, Karthik Nayak, shejialuo, git, Patrick Steinhardt
On Tue, Oct 07, 2025 at 06:49:42PM -0700, Collin Funk wrote:
> In GNU Coreutils and Gnulib we often use 'idx_t', which is a typedef to
> the standard signed type 'ptrdiff_t', when we refer to allocation of
> objects or indexes.
>
> The rational is written in the header file where it is defined [1].
> However, I want to highlight one part that I find most useful:
>
> * Security: Signed types can be checked for overflow via
> '-fsanitize=undefined', but unsigned types cannot.
>
> On common platforms, you will never need to allocate more memory than
> PTRDIFF_MAX anyways:
>
> $ numfmt --to=iec-i `echo $(((1 << 63) - 1))`
> 8.0Ei
>
> I think that addresses Jeff's point that 'int' is too small, which I
> agree with.
Yeah, absolutely. I do not love size_t (and certainly switching signed
"int" to unsigned "size_t" is an easy way to introduce bugs when you
cross the "0" boundary). I'd be very happy with everything using
something like ptrdiff_t, and even hiding it behind idx_t or count_t or
whatever.
-Peff
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-24 5:36 ` Jeff King
2025-09-24 13:20 ` Junio C Hamano
@ 2025-10-05 14:11 ` shejialuo
1 sibling, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-05 14:11 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Karthik Nayak, git, Patrick Steinhardt
On Wed, Sep 24, 2025 at 01:36:01AM -0400, Jeff King wrote:
> On Tue, Sep 23, 2025 at 11:48:36AM -0700, Junio C Hamano wrote:
>
> > >> 1. It prevents us from using the full range of size_t, which is
> > >> necessary for large string list.
> >
> > It is a disease to think that countable things must be counted in
> > size_t and it needs to be somehow cured.
> >
> > It is a type to count the size of memory allocations, nothing more.
> > If you are holding 1000-bytes per the stuff you are counting, you
> > would not need the full range of size_t --- you'll ran out your
> > memory way before you fill size_t with the things you are counting.
> >
> > When there is no external constraints (like you need to specify
> > exact size to describe a file format to be interoperable), the most
> > appropriate type to count things in is a platform natural "int".
> > You wouldn't be handling billions of strings in string-list anyway
> > (and that is smaller than half of 32-bit size_t; 64-bit size_t is
> > much larger).
>
> I agree that size_t is much more than one needs for counting most
> things. But the problem is that "int" is much too small, if you are
> worried about malicious input causing integer overflows that could cause
> memory access errors.
>
> A nice property of counting everything as size_t is that if we are
> storing even a single byte per item, we will fail to allocate before
> hitting an integer overflow. So no, we do not expect to store billions
> of strings. But it is not that hard to convince Git to allocate billions
> of items in a list on a 64-bit system with 32-bit ints. And it is nice
> to know that iterating over them or trying to extend the array will
> never hit an integer overflow bug.
>
Make sense.
> I'd say the "right" size for preventing overflows probably only needs to
> be 58-60 bits or so, since usually we are storing more than one byte
> (plus overhead). But 64-bit is the natural machine word size that
> matches what we want. However, we should _not_ be worried about losing
> one bit to making it signed, especially if that makes it less
> error-prone to convert instances of "int" to use "size_t". I would be
> surprised if an attacker could convince a program to truly use up half
> of its address space.
>
> > >> 2. Using int for indices while other parts of the codebase use size_t
> > >> creates signed comparison warnings when these values are compared.
> >
> > The other thing may be (mis)using size_t when it should not be. If
> > they were also using "int" that would also squelch the warnings from
> > "-Wsign-compare".
>
> So I really care only about truncation and overflow above. Sign issues
> can cause bugs, of course, but the real issue is the size mismatch
> between "int" and "size_t". And while -Wsign-compare is sometimes an
> easy way to find those mismatches (because of the sign mismatch between
> them), it may bring more hassle than it's worth.
>
That's right, I would improve my commit message to show the correct
motivation.
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-09-23 18:48 ` Junio C Hamano
2025-09-24 5:36 ` Jeff King
@ 2025-10-05 14:06 ` shejialuo
1 sibling, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-05 14:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karthik Nayak, git, Patrick Steinhardt
On Tue, Sep 23, 2025 at 11:48:36AM -0700, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > shejialuo <shejialuo@gmail.com> writes:
> >
> >> We would return negative index to indicate exact match by converting the
> >> original positive index to be "-1 - index" in
> >> "string_list_find_insert_index", which requires callers to decode this
> >> information. This approach has several limitations:
> >>
> >
> > Nit: It would be nice to start by explaining what
> > "string_list_find_insert_index" does and then talking about the negative
> > index. Perhaps something like:
> >
> > The `string_list_find_insert_index()` function is used to determine
> > the correct insertion index for a new string within the string list.
> > The function also doubles up to convey if the string is already
> > existing in the list, this is done by returning a negative index
> > "-1 -index". Users are expected to decode this information.
>
> Yeah, such an introductory statement would help those who are not
> familiar with the convention. Thanks for suggesting it.
>
> >> 1. It prevents us from using the full range of size_t, which is
> >> necessary for large string list.
>
> It is a disease to think that countable things must be counted in
> size_t and it needs to be somehow cured.
>
> It is a type to count the size of memory allocations, nothing more.
> If you are holding 1000-bytes per the stuff you are counting, you
> would not need the full range of size_t --- you'll ran out your
> memory way before you fill size_t with the things you are counting.
>
Make sense. We don't need the full range of size_t. I would improve
commit message later.
> When there is no external constraints (like you need to specify
> exact size to describe a file format to be interoperable), the most
> appropriate type to count things in is a platform natural "int".
> You wouldn't be handling billions of strings in string-list anyway
> (and that is smaller than half of 32-bit size_t; 64-bit size_t is
> much larger).
>
> >> 2. Using int for indices while other parts of the codebase use size_t
> >> creates signed comparison warnings when these values are compared.
>
> The other thing may be (mis)using size_t when it should not be. If
> they were also using "int" that would also squelch the warnings from
> "-Wsign-compare".
>
At first, I feel quite hard to understand above. After reading below
mail and the blog
https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/
I get your point. For sign compare warnings, the most intuitive way is
change all `int`s to `unsigned int`s or change all `unsigned int`s to
`int`s. However, this is a bad way as it does not solve the problem (and
it might cause other problems). Instead of simply making the type match,
we should check the value range.
For some cases, it is always ok to compare `size_t with `int` as long as
`size_t` does not cause overflow. Sign warnings might be a hint. And my
commit message is bad as I just want to express because others are
`unsigned`, so we should change. But the true point is that at now, the
returned index may overflow. Because we may have index that is greater
than what `int` could count. I would improve commit message later.
> For an amusing read:
>
> https://lore.kernel.org/lkml/CAHk-=wg+_6eQnLWm-kihFxJo1_EmyLSGruKVGzuRUwACE=osrA@mail.gmail.com/
Really thanks for this insightful hint.
Jialuo
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-17 9:18 ` [PATCH v2 " shejialuo
2025-09-17 9:19 ` [PATCH v2 1/4] string-list: use bool instead of int for "exact_match" shejialuo
2025-09-17 9:19 ` [PATCH v2 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
@ 2025-09-17 9:20 ` shejialuo
2025-09-23 9:44 ` Karthik Nayak
2025-09-17 9:20 ` [PATCH v2 4/4] refs: enable sign compare warnings check shejialuo
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
4 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-09-17 9:20 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
As "string_list_find_insert_index" is a simple wrapper of
"get_entry_index" and the return type of "get_entry_index" is already
"size_t", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
"size_t" underflow by changing from `0 <= --i` to `i--`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 2 +-
mailmap.c | 5 ++---
refs.c | 4 +---
string-list.c | 4 ++--
string-list.h | 4 ++--
5 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 7c0fd3d218..19def3168a 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -222,7 +222,7 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
bool exact_match;
- int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
+ size_t index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
diff --git a/mailmap.c b/mailmap.c
index 58a4484963..37fd158a51 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -1,5 +1,4 @@
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "environment.h"
@@ -244,7 +243,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
bool exact_match;
- int i = string_list_find_insert_index(map, string, &exact_match);
+ size_t i = string_list_find_insert_index(map, string, &exact_match);
if (exact_match) {
if (!string[len])
return &map->items[i];
@@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
* overlong key would be inserted, which must come after the
* real location of the key if one exists.
*/
- while (0 <= --i && i < map->nr) {
+ while (i-- && i < map->nr) {
int cmp = strncasecmp(map->items[i].string, string, len);
if (cmp < 0)
/*
diff --git a/refs.c b/refs.c
index f1ff5bf846..a8f06b9a0a 100644
--- a/refs.c
+++ b/refs.c
@@ -1688,8 +1688,6 @@ const char *find_descendant_ref(const char *dirname,
const struct string_list *extras,
const struct string_list *skip)
{
- int pos;
-
if (!extras)
return NULL;
@@ -1699,7 +1697,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, NULL);
+ for (size_t pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index c589ab5a2c..08dc00984c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -91,8 +91,8 @@ bool string_list_has_string(const struct string_list *list, const char *string)
return exact_match;
}
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- bool *exact_match)
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ bool *exact_match)
{
return get_entry_index(list, string, exact_match);
}
diff --git a/string-list.h b/string-list.h
index 8830ce671d..6b0a2f4752 100644
--- a/string-list.h
+++ b/string-list.h
@@ -173,8 +173,8 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
bool string_list_has_string(const struct string_list *list, const char *string);
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- bool *exact_match);
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-17 9:20 ` [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-09-23 9:44 ` Karthik Nayak
2025-10-05 9:29 ` shejialuo
0 siblings, 1 reply; 44+ messages in thread
From: Karthik Nayak @ 2025-09-23 9:44 UTC (permalink / raw)
To: shejialuo, git; +Cc: Junio C Hamano, Patrick Steinhardt
[-- Attachment #1: Type: text/plain, Size: 4764 bytes --]
shejialuo <shejialuo@gmail.com> writes:
> As "string_list_find_insert_index" is a simple wrapper of
> "get_entry_index" and the return type of "get_entry_index" is already
> "size_t", we could simply change its return type to "size_t".
>
> Update all callers to use size_t variables for storing the return value.
> The tricky fix is the loop condition in "mailmap.c" to properly handle
> "size_t" underflow by changing from `0 <= --i` to `i--`.
>
> Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
> longer needed with the proper unsigned types.
>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
> add-interactive.c | 2 +-
> mailmap.c | 5 ++---
> refs.c | 4 +---
> string-list.c | 4 ++--
> string-list.h | 4 ++--
> 5 files changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/add-interactive.c b/add-interactive.c
> index 7c0fd3d218..19def3168a 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -222,7 +222,7 @@ static void find_unique_prefixes(struct prefix_item_list *list)
> static ssize_t find_unique(const char *string, struct prefix_item_list *list)
> {
> bool exact_match;
> - int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
> + size_t index = string_list_find_insert_index(&list->sorted, string, &exact_match);
> struct string_list_item *item;
>
> if (list->items.nr != list->sorted.nr)
> diff --git a/mailmap.c b/mailmap.c
> index 58a4484963..37fd158a51 100644
> --- a/mailmap.c
> +++ b/mailmap.c
> @@ -1,5 +1,4 @@
> #define USE_THE_REPOSITORY_VARIABLE
> -#define DISABLE_SIGN_COMPARE_WARNINGS
>
> #include "git-compat-util.h"
> #include "environment.h"
> @@ -244,7 +243,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
> const char *string, size_t len)
> {
> bool exact_match;
> - int i = string_list_find_insert_index(map, string, &exact_match);
> + size_t i = string_list_find_insert_index(map, string, &exact_match);
> if (exact_match) {
> if (!string[len])
> return &map->items[i];
> @@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
> * overlong key would be inserted, which must come after the
> * real location of the key if one exists.
> */
> - while (0 <= --i && i < map->nr) {
> + while (i-- && i < map->nr) {
> int cmp = strncasecmp(map->items[i].string, string, len);
> if (cmp < 0)
So earlier, if `i = 0`, we'd have a negative number and then the loop
would exit. But with `size_t`, i can never be negative. So by using
`i--`, we exit if `i = 0`. Okay
> /*
> diff --git a/refs.c b/refs.c
> index f1ff5bf846..a8f06b9a0a 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1688,8 +1688,6 @@ const char *find_descendant_ref(const char *dirname,
> const struct string_list *extras,
> const struct string_list *skip)
> {
> - int pos;
> -
> if (!extras)
> return NULL;
>
> @@ -1699,7 +1697,7 @@ const char *find_descendant_ref(const char *dirname,
> * with dirname (remember, dirname includes the trailing
> * slash) and is not in skip, then we have a conflict.
> */
> - for (pos = string_list_find_insert_index(extras, dirname, NULL);
> + for (size_t pos = string_list_find_insert_index(extras, dirname, NULL);
> pos < extras->nr; pos++) {
> const char *extra_refname = extras->items[pos].string;
>
> diff --git a/string-list.c b/string-list.c
> index c589ab5a2c..08dc00984c 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -91,8 +91,8 @@ bool string_list_has_string(const struct string_list *list, const char *string)
> return exact_match;
> }
>
> -int string_list_find_insert_index(const struct string_list *list, const char *string,
> - bool *exact_match)
> +size_t string_list_find_insert_index(const struct string_list *list, const char *string,
> + bool *exact_match)
> {
> return get_entry_index(list, string, exact_match);
> }
> diff --git a/string-list.h b/string-list.h
> index 8830ce671d..6b0a2f4752 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -173,8 +173,8 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
>
> /** Determine if the string_list has a given string or not. */
> bool string_list_has_string(const struct string_list *list, const char *string);
> -int string_list_find_insert_index(const struct string_list *list, const char *string,
> - bool *exact_match);
> +size_t string_list_find_insert_index(const struct string_list *list, const char *string,
> + bool *exact_match);
>
Super nit: can we also add documentation to this function while we're here.
> /**
> * Insert a new element to the string_list. The returned pointer can
> --
> 2.51.0
The patch looks good to me.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-09-23 9:44 ` Karthik Nayak
@ 2025-10-05 9:29 ` shejialuo
0 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-05 9:29 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, Junio C Hamano, Patrick Steinhardt
On Tue, Sep 23, 2025 at 05:44:42AM -0400, Karthik Nayak wrote:
> > /** Determine if the string_list has a given string or not. */
> > bool string_list_has_string(const struct string_list *list, const char *string);
> > -int string_list_find_insert_index(const struct string_list *list, const char *string,
> > - bool *exact_match);
> > +size_t string_list_find_insert_index(const struct string_list *list, const char *string,
> > + bool *exact_match);
> >
>
> Super nit: can we also add documentation to this function while we're here.
>
Sure, I would update this in the next version.
Thanks,
Jialuo
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 4/4] refs: enable sign compare warnings check
2025-09-17 9:18 ` [PATCH v2 " shejialuo
` (2 preceding siblings ...)
2025-09-17 9:20 ` [PATCH v2 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-09-17 9:20 ` shejialuo
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
4 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-09-17 9:20 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt
After fixing the tricky compare warning introduced by calling
"string_list_find_insert_index", there are only two loop iterator type
mismatches. Fix them to enable compare warnings check.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
refs.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/refs.c b/refs.c
index a8f06b9a0a..79069c37b8 100644
--- a/refs.c
+++ b/refs.c
@@ -3,7 +3,6 @@
*/
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "advice.h"
@@ -2381,7 +2380,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
struct child_process proc = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
const char *hook;
- int ret = 0, i;
+ int ret = 0;
hook = find_hook(transaction->ref_store->repo, "reference-transaction");
if (!hook)
@@ -2398,7 +2397,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
sigchain_push(SIGPIPE, SIG_IGN);
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_LOG_ONLY)
@@ -2791,9 +2790,7 @@ void ref_transaction_for_each_queued_update(struct ref_transaction *transaction,
ref_transaction_for_each_queued_update_fn cb,
void *cb_data)
{
- int i;
-
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
cb(update->refname,
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v3 0/4] enhance string-list API to fix sign compare warnings
2025-09-17 9:18 ` [PATCH v2 " shejialuo
` (3 preceding siblings ...)
2025-09-17 9:20 ` [PATCH v2 4/4] refs: enable sign compare warnings check shejialuo
@ 2025-10-06 6:28 ` shejialuo
2025-10-06 6:32 ` [PATCH v3 1/4] string-list: use bool instead of int for "exact_match" shejialuo
` (4 more replies)
4 siblings, 5 replies; 44+ messages in thread
From: shejialuo @ 2025-10-06 6:28 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak, Jeff King
Hi All:
This is a small PATCH to enhance string-list API
"string_list_find_insert_index" which has introduced sign compare
warnings.
---
Changes since v2:
1. Enhance [PATCH v2 2/4] commit message to express the motivation is
avoid overflow.
2. Add comments for `string_list_find_insert_index` function.
---
Changes since v1:
1. Create a new commit which aims at using `bool` for "exact_match"
parameter.
2. Rebase the [PATCH 1/4] and [PATCH 2/4] into a single [PATCH v2 2/4]
commit to avoid confusing the user with the motivation of the
original commit [PATCH 1/4].
3. Enhance the comimt message of [PATCH 2/4] to improve the motivation.
4. Update "i-- > 0" to "i--" for [PATCH 3/4]
Thanks,
Jialuo
shejialuo (4):
string-list: use bool instead of int for "exact_match"
string-list: replace negative index encoding with "exact_match"
parameter
string-list: change "string_list_find_insert_index" return type to
"size_t"
refs: enable sign compare warnings check
add-interactive.c | 7 ++++---
mailmap.c | 10 ++++------
refs.c | 13 ++++---------
string-list.c | 29 ++++++++++++++---------------
string-list.h | 12 +++++++++---
5 files changed, 35 insertions(+), 36 deletions(-)
Range-diff against v2:
1: c3786fa386 = 1: bdcac9b5fa string-list: use bool instead of int for "exact_match"
2: 7ac8fd69c0 ! 2: 0d6d09c8b0 string-list: replace negative index encoding with "exact_match" parameter
@@ Metadata
## Commit message ##
string-list: replace negative index encoding with "exact_match" parameter
- We would return negative index to indicate exact match by converting the
- original positive index to be "-1 - index" in
- "string_list_find_insert_index", which requires callers to decode this
- information. This approach has several limitations:
+ The "string_list_find_insert_index()" function is used to determine
+ the correct insertion index for a new string within the string list.
+ The function also doubles up to convey if the string is already
+ existing in the list, this is done by returning a negative index
+ "-1 -index". Users are expected to decode this information. This
+ approach has several limitations:
- 1. It prevents us from using the full range of size_t, which is
- necessary for large string list.
- 2. Using int for indices while other parts of the codebase use size_t
- creates signed comparison warnings when these values are compared.
+ 1. It requires the callers to look into the detail of the function to
+ understand how to decode the negative index encoding.
+ 2. Using int for indices can cause overflow issues when dealing with
+ large string lists.
To address these limitations, change the function to return size_t for
the index value and use a separate bool parameter to indicate whether
3: 1cf914fab5 ! 3: 9bfe17ab19 string-list: change "string_list_find_insert_index" return type to "size_t"
@@ string-list.h: void string_list_remove_empty_items(struct string_list *list, int
bool string_list_has_string(const struct string_list *list, const char *string);
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- bool *exact_match);
++
++/**
++ * Find the index at which a new element should be inserted into the
++ * string_list to maintain sorted order. If exact_match is not NULL,
++ * it will be set to true if the string already exists in the list.
++ */
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ bool *exact_match);
4: 8a445549dd = 4: 2a602954f2 refs: enable sign compare warnings check
--
2.51.0
^ permalink raw reply [flat|nested] 44+ messages in thread* [PATCH v3 1/4] string-list: use bool instead of int for "exact_match"
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
@ 2025-10-06 6:32 ` shejialuo
2025-10-06 6:32 ` [PATCH v3 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
` (3 subsequent siblings)
4 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-06 6:32 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak, Jeff King
The "exact_match" parameter in "get_entry_index" is used to indicate
whether a string is found or not, which is fundamentally a true/false
value. As we allow the use of bool, let's use bool instead of int to
make the function more semantically clear.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
string-list.c | 19 ++++++++++---------
string-list.h | 2 +-
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/string-list.c b/string-list.c
index 343cf1ca90..d8da3dd414 100644
--- a/string-list.c
+++ b/string-list.c
@@ -16,7 +16,7 @@ void string_list_init_dup(struct string_list *list)
/* if there is no exact match, point to the index where the entry could be
* inserted */
static size_t get_entry_index(const struct string_list *list, const char *string,
- int *exact_match)
+ bool *exact_match)
{
size_t left = 0, right = list->nr;
compare_strings_fn cmp = list->cmp ? list->cmp : strcmp;
@@ -29,18 +29,18 @@ static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = 1;
+ *exact_match = true;
return middle;
}
}
- *exact_match = 0;
+ *exact_match = false;
return right;
}
static size_t add_entry(struct string_list *list, const char *string)
{
- int exact_match = 0;
+ bool exact_match;
size_t index = get_entry_index(list, string, &exact_match);
if (exact_match)
@@ -68,7 +68,7 @@ struct string_list_item *string_list_insert(struct string_list *list, const char
void string_list_remove(struct string_list *list, const char *string,
int free_util)
{
- int exact_match;
+ bool exact_match;
int i = get_entry_index(list, string, &exact_match);
if (exact_match) {
@@ -82,9 +82,9 @@ void string_list_remove(struct string_list *list, const char *string,
}
}
-int string_list_has_string(const struct string_list *list, const char *string)
+bool string_list_has_string(const struct string_list *list, const char *string)
{
- int exact_match;
+ bool exact_match;
get_entry_index(list, string, &exact_match);
return exact_match;
}
@@ -92,7 +92,7 @@ int string_list_has_string(const struct string_list *list, const char *string)
int string_list_find_insert_index(const struct string_list *list, const char *string,
int negative_existing_index)
{
- int exact_match;
+ bool exact_match;
int index = get_entry_index(list, string, &exact_match);
if (exact_match)
index = -1 - (negative_existing_index ? index : 0);
@@ -101,7 +101,8 @@ int string_list_find_insert_index(const struct string_list *list, const char *st
struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
{
- int exact_match, i = get_entry_index(list, string, &exact_match);
+ bool exact_match;
+ size_t i = get_entry_index(list, string, &exact_match);
if (!exact_match)
return NULL;
return list->items + i;
diff --git a/string-list.h b/string-list.h
index 2b438c7733..bc7f38022e 100644
--- a/string-list.h
+++ b/string-list.h
@@ -172,7 +172,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/* Use these functions only on sorted lists: */
/** Determine if the string_list has a given string or not. */
-int string_list_has_string(const struct string_list *list, const char *string);
+bool string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
int negative_existing_index);
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v3 2/4] string-list: replace negative index encoding with "exact_match" parameter
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
2025-10-06 6:32 ` [PATCH v3 1/4] string-list: use bool instead of int for "exact_match" shejialuo
@ 2025-10-06 6:32 ` shejialuo
2025-10-06 6:32 ` [PATCH v3 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
` (2 subsequent siblings)
4 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-06 6:32 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak, Jeff King
The "string_list_find_insert_index()" function is used to determine
the correct insertion index for a new string within the string list.
The function also doubles up to convey if the string is already
existing in the list, this is done by returning a negative index
"-1 -index". Users are expected to decode this information. This
approach has several limitations:
1. It requires the callers to look into the detail of the function to
understand how to decode the negative index encoding.
2. Using int for indices can cause overflow issues when dealing with
large string lists.
To address these limitations, change the function to return size_t for
the index value and use a separate bool parameter to indicate whether
the index refers to an existing entry or an insertion point.
In some cases, the callers of "string_list_find_insert_index" only need
the index position and don't care whether an exact match is found.
However, "get_entry_index" currently requires a non-NULL "exact_match"
parameter, forcing these callers to declare unnecessary variables.
Let's allow callers to pass NULL for the "exact_match" parameter when
they don't need this information, reducing unnecessary variable
declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 7 ++++---
mailmap.c | 7 +++----
refs.c | 2 +-
string-list.c | 14 ++++++--------
string-list.h | 2 +-
5 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 6ffe64c38d..21bc3dca96 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -244,7 +244,8 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
- int index = string_list_find_insert_index(&list->sorted, string, 1);
+ bool exact_match;
+ int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
@@ -252,8 +253,8 @@ static ssize_t find_unique(const char *string, struct prefix_item_list *list)
" vs %"PRIuMAX")",
(uintmax_t)list->items.nr, (uintmax_t)list->sorted.nr);
- if (index < 0)
- item = list->sorted.items[-1 - index].util;
+ if (exact_match)
+ item = list->sorted.items[index].util;
else if (index > 0 &&
starts_with(list->sorted.items[index - 1].string, string))
return -1;
diff --git a/mailmap.c b/mailmap.c
index 56c72102d9..58a4484963 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -243,10 +243,9 @@ void clear_mailmap(struct string_list *map)
static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
- int i = string_list_find_insert_index(map, string, 1);
- if (i < 0) {
- /* exact match */
- i = -1 - i;
+ bool exact_match;
+ int i = string_list_find_insert_index(map, string, &exact_match);
+ if (exact_match) {
if (!string[len])
return &map->items[i];
/*
diff --git a/refs.c b/refs.c
index 750e5db077..74266e6392 100644
--- a/refs.c
+++ b/refs.c
@@ -1721,7 +1721,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, 0);
+ for (pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index d8da3dd414..c589ab5a2c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -29,12 +29,14 @@ static size_t get_entry_index(const struct string_list *list, const char *string
else if (compare > 0)
left = middle + 1;
else {
- *exact_match = true;
+ if (exact_match)
+ *exact_match = true;
return middle;
}
}
- *exact_match = false;
+ if (exact_match)
+ *exact_match = false;
return right;
}
@@ -90,13 +92,9 @@ bool string_list_has_string(const struct string_list *list, const char *string)
}
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index)
+ bool *exact_match)
{
- bool exact_match;
- int index = get_entry_index(list, string, &exact_match);
- if (exact_match)
- index = -1 - (negative_existing_index ? index : 0);
- return index;
+ return get_entry_index(list, string, exact_match);
}
struct string_list_item *string_list_lookup(struct string_list *list, const char *string)
diff --git a/string-list.h b/string-list.h
index bc7f38022e..8830ce671d 100644
--- a/string-list.h
+++ b/string-list.h
@@ -174,7 +174,7 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
bool string_list_has_string(const struct string_list *list, const char *string);
int string_list_find_insert_index(const struct string_list *list, const char *string,
- int negative_existing_index);
+ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v3 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
2025-10-06 6:32 ` [PATCH v3 1/4] string-list: use bool instead of int for "exact_match" shejialuo
2025-10-06 6:32 ` [PATCH v3 2/4] string-list: replace negative index encoding with "exact_match" parameter shejialuo
@ 2025-10-06 6:32 ` shejialuo
2025-10-09 6:03 ` Jeff King
2025-10-06 6:32 ` [PATCH v3 4/4] refs: enable sign compare warnings check shejialuo
2025-10-06 22:09 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings Junio C Hamano
4 siblings, 1 reply; 44+ messages in thread
From: shejialuo @ 2025-10-06 6:32 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak, Jeff King
As "string_list_find_insert_index" is a simple wrapper of
"get_entry_index" and the return type of "get_entry_index" is already
"size_t", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
"size_t" underflow by changing from `0 <= --i` to `i--`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
add-interactive.c | 2 +-
mailmap.c | 5 ++---
refs.c | 4 +---
string-list.c | 4 ++--
string-list.h | 10 ++++++++--
5 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/add-interactive.c b/add-interactive.c
index 21bc3dca96..68fc09547d 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -245,7 +245,7 @@ static void find_unique_prefixes(struct prefix_item_list *list)
static ssize_t find_unique(const char *string, struct prefix_item_list *list)
{
bool exact_match;
- int index = string_list_find_insert_index(&list->sorted, string, &exact_match);
+ size_t index = string_list_find_insert_index(&list->sorted, string, &exact_match);
struct string_list_item *item;
if (list->items.nr != list->sorted.nr)
diff --git a/mailmap.c b/mailmap.c
index 58a4484963..37fd158a51 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -1,5 +1,4 @@
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "environment.h"
@@ -244,7 +243,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
const char *string, size_t len)
{
bool exact_match;
- int i = string_list_find_insert_index(map, string, &exact_match);
+ size_t i = string_list_find_insert_index(map, string, &exact_match);
if (exact_match) {
if (!string[len])
return &map->items[i];
@@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
* overlong key would be inserted, which must come after the
* real location of the key if one exists.
*/
- while (0 <= --i && i < map->nr) {
+ while (i-- && i < map->nr) {
int cmp = strncasecmp(map->items[i].string, string, len);
if (cmp < 0)
/*
diff --git a/refs.c b/refs.c
index 74266e6392..b7c0aff85e 100644
--- a/refs.c
+++ b/refs.c
@@ -1710,8 +1710,6 @@ const char *find_descendant_ref(const char *dirname,
const struct string_list *extras,
const struct string_list *skip)
{
- int pos;
-
if (!extras)
return NULL;
@@ -1721,7 +1719,7 @@ const char *find_descendant_ref(const char *dirname,
* with dirname (remember, dirname includes the trailing
* slash) and is not in skip, then we have a conflict.
*/
- for (pos = string_list_find_insert_index(extras, dirname, NULL);
+ for (size_t pos = string_list_find_insert_index(extras, dirname, NULL);
pos < extras->nr; pos++) {
const char *extra_refname = extras->items[pos].string;
diff --git a/string-list.c b/string-list.c
index c589ab5a2c..08dc00984c 100644
--- a/string-list.c
+++ b/string-list.c
@@ -91,8 +91,8 @@ bool string_list_has_string(const struct string_list *list, const char *string)
return exact_match;
}
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- bool *exact_match)
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ bool *exact_match)
{
return get_entry_index(list, string, exact_match);
}
diff --git a/string-list.h b/string-list.h
index 8830ce671d..fa6ba07853 100644
--- a/string-list.h
+++ b/string-list.h
@@ -173,8 +173,14 @@ void string_list_remove_empty_items(struct string_list *list, int free_util);
/** Determine if the string_list has a given string or not. */
bool string_list_has_string(const struct string_list *list, const char *string);
-int string_list_find_insert_index(const struct string_list *list, const char *string,
- bool *exact_match);
+
+/**
+ * Find the index at which a new element should be inserted into the
+ * string_list to maintain sorted order. If exact_match is not NULL,
+ * it will be set to true if the string already exists in the list.
+ */
+size_t string_list_find_insert_index(const struct string_list *list, const char *string,
+ bool *exact_match);
/**
* Insert a new element to the string_list. The returned pointer can
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH v3 3/4] string-list: change "string_list_find_insert_index" return type to "size_t"
2025-10-06 6:32 ` [PATCH v3 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-10-09 6:03 ` Jeff King
0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2025-10-09 6:03 UTC (permalink / raw)
To: shejialuo; +Cc: git, Junio C Hamano, Patrick Steinhardt, Karthik Nayak
On Mon, Oct 06, 2025 at 02:32:40PM +0800, shejialuo wrote:
> @@ -266,7 +265,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
> * overlong key would be inserted, which must come after the
> * real location of the key if one exists.
> */
> - while (0 <= --i && i < map->nr) {
> + while (i-- && i < map->nr) {
> int cmp = strncasecmp(map->items[i].string, string, len);
> if (cmp < 0)
> /*
BTW, Coverity complains about this line, because "i--" will wrap when
"i" is 0. I think that's OK for our purposes, because we will break out
of the loop on that condition (because it's a post-increment), and we
never look at "i" outside of the loop after that.
So I don't think it's worth changing even to try to shut Coverity up,
but it's an interesting data point in how subtle the signed/unsigned
conversions can be.
-Peff
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v3 4/4] refs: enable sign compare warnings check
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
` (2 preceding siblings ...)
2025-10-06 6:32 ` [PATCH v3 3/4] string-list: change "string_list_find_insert_index" return type to "size_t" shejialuo
@ 2025-10-06 6:32 ` shejialuo
2025-10-06 22:09 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings Junio C Hamano
4 siblings, 0 replies; 44+ messages in thread
From: shejialuo @ 2025-10-06 6:32 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Karthik Nayak, Jeff King
After fixing the tricky compare warning introduced by calling
"string_list_find_insert_index", there are only two loop iterator type
mismatches. Fix them to enable compare warnings check.
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
refs.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/refs.c b/refs.c
index b7c0aff85e..668c8ae632 100644
--- a/refs.c
+++ b/refs.c
@@ -3,7 +3,6 @@
*/
#define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
#include "git-compat-util.h"
#include "advice.h"
@@ -2408,7 +2407,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
struct child_process proc = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
const char *hook;
- int ret = 0, i;
+ int ret = 0;
hook = find_hook(transaction->ref_store->repo, "reference-transaction");
if (!hook)
@@ -2425,7 +2424,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
sigchain_push(SIGPIPE, SIG_IGN);
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
if (update->flags & REF_LOG_ONLY)
@@ -2818,9 +2817,7 @@ void ref_transaction_for_each_queued_update(struct ref_transaction *transaction,
ref_transaction_for_each_queued_update_fn cb,
void *cb_data)
{
- int i;
-
- for (i = 0; i < transaction->nr; i++) {
+ for (size_t i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
cb(update->refname,
--
2.51.0
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH v3 0/4] enhance string-list API to fix sign compare warnings
2025-10-06 6:28 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings shejialuo
` (3 preceding siblings ...)
2025-10-06 6:32 ` [PATCH v3 4/4] refs: enable sign compare warnings check shejialuo
@ 2025-10-06 22:09 ` Junio C Hamano
2025-10-08 1:52 ` Collin Funk
2025-10-08 8:11 ` Karthik Nayak
4 siblings, 2 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-10-06 22:09 UTC (permalink / raw)
To: shejialuo; +Cc: git, Patrick Steinhardt, Karthik Nayak, Jeff King
shejialuo <shejialuo@gmail.com> writes:
> Hi All:
>
> This is a small PATCH to enhance string-list API
> "string_list_find_insert_index" which has introduced sign compare
> warnings.
>
> ---
>
> Changes since v2:
>
> 1. Enhance [PATCH v2 2/4] commit message to express the motivation is
> avoid overflow.
> 2. Add comments for `string_list_find_insert_index` function.
Thanks. I didn't see anything glaringly wrong in this round.
Shall we mark the topic for 'next' now?
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH v3 0/4] enhance string-list API to fix sign compare warnings
2025-10-06 22:09 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings Junio C Hamano
@ 2025-10-08 1:52 ` Collin Funk
2025-10-08 15:56 ` Junio C Hamano
2025-10-08 8:11 ` Karthik Nayak
1 sibling, 1 reply; 44+ messages in thread
From: Collin Funk @ 2025-10-08 1:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: shejialuo, git, Patrick Steinhardt, Karthik Nayak, Jeff King
Junio C Hamano <gitster@pobox.com> writes:
> Thanks. I didn't see anything glaringly wrong in this round.
>
> Shall we mark the topic for 'next' now?
These look good to me if you want to add:
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
I wrote a bit about the signed vs. unsigned integer for object/indexes
in another mail. Because of my opinion on that, I usually just ignore
these warnings. But my impression is that my opinion is in the minority
regarding that. :)
Collin
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v3 0/4] enhance string-list API to fix sign compare warnings
2025-10-08 1:52 ` Collin Funk
@ 2025-10-08 15:56 ` Junio C Hamano
0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-10-08 15:56 UTC (permalink / raw)
To: Collin Funk; +Cc: shejialuo, git, Patrick Steinhardt, Karthik Nayak, Jeff King
Collin Funk <collin.funk1@gmail.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Thanks. I didn't see anything glaringly wrong in this round.
>>
>> Shall we mark the topic for 'next' now?
>
> These look good to me if you want to add:
>
> Reviewed-by: Collin Funk <collin.funk1@gmail.com>
>
> I wrote a bit about the signed vs. unsigned integer for object/indexes
> in another mail. Because of my opinion on that, I usually just ignore
> these warnings. But my impression is that my opinion is in the minority
> regarding that. :)
Well I probably am in the minority who thinks that it is a disease
or superstition to think that things must be counted in size_t,
which is often unnecessarily big (which I do not mind too much on
modern architectures) and unsigned (which I do mind for the same
reasons as you do) and those infected by it should be somehow cured
;-).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v3 0/4] enhance string-list API to fix sign compare warnings
2025-10-06 22:09 ` [PATCH v3 0/4] enhance string-list API to fix sign compare warnings Junio C Hamano
2025-10-08 1:52 ` Collin Funk
@ 2025-10-08 8:11 ` Karthik Nayak
1 sibling, 0 replies; 44+ messages in thread
From: Karthik Nayak @ 2025-10-08 8:11 UTC (permalink / raw)
To: Junio C Hamano, shejialuo; +Cc: git, Patrick Steinhardt, Jeff King
[-- Attachment #1: Type: text/plain, Size: 592 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
> shejialuo <shejialuo@gmail.com> writes:
>
>> Hi All:
>>
>> This is a small PATCH to enhance string-list API
>> "string_list_find_insert_index" which has introduced sign compare
>> warnings.
>>
>> ---
>>
>> Changes since v2:
>>
>> 1. Enhance [PATCH v2 2/4] commit message to express the motivation is
>> avoid overflow.
>> 2. Add comments for `string_list_find_insert_index` function.
>
> Thanks. I didn't see anything glaringly wrong in this round.
>
Same, the range-diff looks good to me!
Thanks
> Shall we mark the topic for 'next' now?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread