git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] utf8: change return type of some helpers from int to size_t
@ 2024-03-17  8:27 Mohit Marathe via GitGitGadget
  2024-03-17  8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Mohit Marathe via GitGitGadget @ 2024-03-17  8:27 UTC (permalink / raw)
  To: git; +Cc: Mohit Marathe

Hello,

There are lot of places in the codebase where int is used to store size of
an object instead of size_t (which is better alternative due to reasons like
portability, readability, etc). This patch series accomplishes one such
#TODO comment, which addresses this issue.

I appreciate your review and feedback on this patch series.

Best regards, Mohit Marathe

Mohit Marathe (2):
  utf8: change type from int to size_t
  utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int

 builtin/blame.c    |  6 +++---
 builtin/branch.c   |  2 +-
 builtin/fetch.c    |  2 +-
 builtin/worktree.c |  4 ++--
 column.c           |  2 +-
 diff.c             |  8 +++++---
 gettext.c          |  2 +-
 gettext.h          |  2 +-
 pretty.c           |  4 ++--
 utf8.c             | 14 +++++---------
 utf8.h             |  4 ++--
 wt-status.c        |  4 ++--
 12 files changed, 26 insertions(+), 28 deletions(-)


base-commit: 2953d95d402b6bff1a59c4712f4d46f1b9ea137f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1690%2Fmohit-marathe%2Fint-to-size_t-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1690/mohit-marathe/int-to-size_t-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1690
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] utf8: change type from int to size_t
  2024-03-17  8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
@ 2024-03-17  8:27 ` Mohit Marathe via GitGitGadget
  2024-03-21 10:30   ` Patrick Steinhardt
  2024-03-17  8:27 ` [PATCH 2/2] utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int Mohit Marathe via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Mohit Marathe via GitGitGadget @ 2024-03-17  8:27 UTC (permalink / raw)
  To: git; +Cc: Mohit Marathe, Mohit Marathe

From: Mohit Marathe <mohitmarathe@proton.me>

This update modifies the variable types that are used in calls to the
`utf8_strnwidth` and `utf8_strwidth` functions. This modification is
a proactive measure in anticipation of an upcoming interface change
to these functions in the subsequent patch.

Signed-off-by: Mohit Marathe <mohitmarathe@proton.me>
---
 builtin/blame.c    | 6 +++---
 builtin/branch.c   | 2 +-
 builtin/fetch.c    | 2 +-
 builtin/worktree.c | 4 ++--
 column.c           | 2 +-
 diff.c             | 8 +++++---
 gettext.c          | 2 +-
 gettext.h          | 2 +-
 pretty.c           | 4 ++--
 utf8.c             | 4 ++--
 wt-status.c        | 4 ++--
 11 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index db1f56de61a..a72e2552305 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -53,7 +53,7 @@ static const char *annotate_opt_usage[] = {
 };
 
 static int longest_file;
-static int longest_author;
+static size_t longest_author;
 static int max_orig_digits;
 static int max_digits;
 static int max_score_digits;
@@ -529,7 +529,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 					name = ci.author_mail.buf;
 				else
 					name = ci.author.buf;
-				pad = longest_author - utf8_strwidth(name);
+				pad = cast_size_t_to_int(longest_author - utf8_strwidth(name));
 				printf(" (%s%*s %10s",
 				       name, pad, "",
 				       format_time(ci.author_time,
@@ -631,7 +631,7 @@ static void find_alignment(struct blame_scoreboard *sb, int *option)
 
 	for (e = sb->ent; e; e = e->next) {
 		struct blame_origin *suspect = e->suspect;
-		int num;
+		size_t num;
 
 		if (compute_auto_abbrev)
 			auto_abbrev = update_auto_abbrev(auto_abbrev, suspect);
diff --git a/builtin/branch.c b/builtin/branch.c
index 8c2305ad2c8..321c3558f2d 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -349,7 +349,7 @@ static int calc_maxwidth(struct ref_array *refs, int remote_bonus)
 	for (i = 0; i < refs->nr; i++) {
 		struct ref_array_item *it = refs->items[i];
 		const char *desc = it->refname;
-		int w;
+		size_t w;
 
 		skip_prefix(it->refname, "refs/heads/", &desc);
 		skip_prefix(it->refname, "refs/remotes/", &desc);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 46a793411a4..fee992c3c14 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -705,7 +705,7 @@ static int refcol_width(const struct ref *ref_map, int compact_format)
 		max = max * 2 / 3;
 
 	for (ref = ref_map; ref; ref = ref->next) {
-		int rlen, llen = 0, len;
+		size_t rlen, llen = 0, len;
 
 		if (ref->status == REF_STATUS_REJECT_SHALLOW ||
 		    !ref->peer_ref ||
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 9c76b62b02d..bdbf46fb658 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -961,8 +961,8 @@ static void show_worktree_porcelain(struct worktree *wt, int line_terminator)
 static void show_worktree(struct worktree *wt, int path_maxlen, int abbrev_len)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int cur_path_len = strlen(wt->path);
-	int path_adj = cur_path_len - utf8_strwidth(wt->path);
+	size_t cur_path_len = strlen(wt->path);
+	int path_adj = cast_size_t_to_int(cur_path_len - utf8_strwidth(wt->path));
 	const char *reason;
 
 	strbuf_addf(&sb, "%-*s ", 1 + path_maxlen + path_adj, wt->path);
diff --git a/column.c b/column.c
index 50bbccc92ee..ec874036de6 100644
--- a/column.c
+++ b/column.c
@@ -22,7 +22,7 @@ struct column_data {
 };
 
 /* return length of 's' in letters, ANSI escapes stripped */
-static int item_length(const char *s)
+static size_t item_length(const char *s)
 {
 	return utf8_strnwidth(s, strlen(s), 1);
 }
diff --git a/diff.c b/diff.c
index e50def45383..4faf151345a 100644
--- a/diff.c
+++ b/diff.c
@@ -2629,7 +2629,8 @@ void print_stat_summary(FILE *fp, int files,
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
 {
-	int i, len, add, del, adds = 0, dels = 0;
+	int i, add, del, adds = 0, dels = 0;
+	size_t len = 0;
 	uintmax_t max_change = 0, max_len = 0;
 	int total_files = data->nr, count;
 	int width, name_width, graph_width, number_width = 0, bin_width = 0;
@@ -2780,7 +2781,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		char *name = file->print_name;
 		uintmax_t added = file->added;
 		uintmax_t deleted = file->deleted;
-		int name_len, padding;
+		size_t name_len;
+		int padding;
 
 		if (!file->is_interesting && (added + deleted == 0))
 			continue;
@@ -2809,7 +2811,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			if (slash)
 				name = slash;
 		}
-		padding = len - utf8_strwidth(name);
+		padding = cast_size_t_to_int(len - utf8_strwidth(name));
 		if (padding < 0)
 			padding = 0;
 
diff --git a/gettext.c b/gettext.c
index 57facbc21ec..5a77b4f7202 100644
--- a/gettext.c
+++ b/gettext.c
@@ -127,7 +127,7 @@ void git_setup_gettext(void)
 }
 
 /* return the number of columns of string 's' in current locale */
-int gettext_width(const char *s)
+size_t gettext_width(const char *s)
 {
 	static int is_utf8 = -1;
 	if (is_utf8 == -1)
diff --git a/gettext.h b/gettext.h
index 484cafa5628..f161a21b45c 100644
--- a/gettext.h
+++ b/gettext.h
@@ -31,7 +31,7 @@
 #ifndef NO_GETTEXT
 extern int git_gettext_enabled;
 void git_setup_gettext(void);
-int gettext_width(const char *s);
+size_t gettext_width(const char *s);
 #else
 #define git_gettext_enabled (0)
 static inline void git_setup_gettext(void)
diff --git a/pretty.c b/pretty.c
index bdbed4295aa..f03493c74b1 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1781,7 +1781,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
 
 	if (padding < 0) {
 		const char *start = strrchr(sb->buf, '\n');
-		int occupied;
+		size_t occupied;
 		if (!start)
 			start = sb->buf;
 		occupied = utf8_strnwidth(start, strlen(start), 1);
@@ -1802,7 +1802,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
 		placeholder++;
 		total_consumed++;
 	}
-	len = utf8_strnwidth(local_sb.buf, local_sb.len, 1);
+	len = cast_size_t_to_int(utf8_strnwidth(local_sb.buf, local_sb.len, 1));
 
 	if (c->flush_type == flush_left_and_steal) {
 		const char *ch = sb->buf + sb->len - 1;
diff --git a/utf8.c b/utf8.c
index 6bfaefa28eb..8ccdf684e07 100644
--- a/utf8.c
+++ b/utf8.c
@@ -466,7 +466,7 @@ int utf8_fprintf(FILE *stream, const char *format, ...)
 
 	columns = fputs(buf.buf, stream);
 	if (0 <= columns) /* keep the error from the I/O */
-		columns = utf8_strwidth(buf.buf);
+		columns = cast_size_t_to_int(utf8_strwidth(buf.buf));
 	strbuf_release(&buf);
 	return columns;
 }
@@ -806,7 +806,7 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
 		       const char *s)
 {
 	size_t slen = strlen(s);
-	int display_len = utf8_strnwidth(s, slen, 0);
+	size_t display_len = utf8_strnwidth(s, slen, 0);
 	int utf8_compensation = slen - display_len;
 
 	if (display_len >= width) {
diff --git a/wt-status.c b/wt-status.c
index 7108a92b52c..c847b4bb5ed 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -325,7 +325,7 @@ static int maxwidth(const char *(*label)(int), int minval, int maxval)
 
 	for (i = minval; i <= maxval; i++) {
 		const char *s = label(i);
-		int len = s ? utf8_strwidth(s) : 0;
+		size_t len = s ? utf8_strwidth(s) : 0;
 		if (len > result)
 			result = len;
 	}
@@ -341,7 +341,7 @@ static void wt_longstatus_print_unmerged_data(struct wt_status *s,
 	static char *padding;
 	static int label_width;
 	const char *one, *how;
-	int len;
+	size_t len;
 
 	if (!padding) {
 		label_width = maxwidth(wt_status_unmerged_status_string, 1, 7);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int
  2024-03-17  8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
  2024-03-17  8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
@ 2024-03-17  8:27 ` Mohit Marathe via GitGitGadget
  2024-03-17 16:51 ` [PATCH 0/2] utf8: change return type of some helpers from int to size_t Junio C Hamano
  2024-03-21 10:30 ` Patrick Steinhardt
  3 siblings, 0 replies; 6+ messages in thread
From: Mohit Marathe via GitGitGadget @ 2024-03-17  8:27 UTC (permalink / raw)
  To: git; +Cc: Mohit Marathe, Mohit Marathe

From: Mohit Marathe <mohitmarathe@proton.me>

This patch addresses the TODO comment of changing the return types
of these functions from int to size_t.

Signed-off-by: Mohit Marathe <mohitmarathe@proton.me>
---
 utf8.c | 10 +++-------
 utf8.h |  4 ++--
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/utf8.c b/utf8.c
index 8ccdf684e07..050fc8b3cdf 100644
--- a/utf8.c
+++ b/utf8.c
@@ -206,7 +206,7 @@ int utf8_width(const char **start, size_t *remainder_p)
  * string, assuming that the string is utf8.  Returns strlen() instead
  * if the string does not look like a valid utf8 string.
  */
-int utf8_strnwidth(const char *string, size_t len, int skip_ansi)
+size_t utf8_strnwidth(const char *string, size_t len, int skip_ansi)
 {
 	const char *orig = string;
 	size_t width = 0;
@@ -224,14 +224,10 @@ int utf8_strnwidth(const char *string, size_t len, int skip_ansi)
 			width += glyph_width;
 	}
 
-	/*
-	 * TODO: fix the interface of this function and `utf8_strwidth()` to
-	 * return `size_t` instead of `int`.
-	 */
-	return cast_size_t_to_int(string ? width : len);
+	return string ? width : len;
 }
 
-int utf8_strwidth(const char *string)
+size_t utf8_strwidth(const char *string)
 {
 	return utf8_strnwidth(string, strlen(string), 0);
 }
diff --git a/utf8.h b/utf8.h
index 35df76086a6..cae10d5ac1f 100644
--- a/utf8.h
+++ b/utf8.h
@@ -7,8 +7,8 @@ typedef unsigned int ucs_char_t;  /* assuming 32bit int */
 
 size_t display_mode_esc_sequence_len(const char *s);
 int utf8_width(const char **start, size_t *remainder_p);
-int utf8_strnwidth(const char *string, size_t len, int skip_ansi);
-int utf8_strwidth(const char *string);
+size_t utf8_strnwidth(const char *string, size_t len, int skip_ansi);
+size_t utf8_strwidth(const char *string);
 int is_utf8(const char *text);
 int is_encoding_utf8(const char *name);
 int same_encoding(const char *, const char *);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] utf8: change return type of some helpers from int to size_t
  2024-03-17  8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
  2024-03-17  8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
  2024-03-17  8:27 ` [PATCH 2/2] utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int Mohit Marathe via GitGitGadget
@ 2024-03-17 16:51 ` Junio C Hamano
  2024-03-21 10:30 ` Patrick Steinhardt
  3 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2024-03-17 16:51 UTC (permalink / raw)
  To: Mohit Marathe via GitGitGadget; +Cc: git, Mohit Marathe

"Mohit Marathe via GitGitGadget" <gitgitgadget@gmail.com> writes:

> There are lot of places in the codebase where int is used to store size of
> an object instead of size_t (which is better alternative due to reasons like
> portability, readability, etc). This patch series accomplishes one such
> #TODO comment, which addresses this issue.

Nice. In principle this is the right thing to do.

To reviewers, if I were reviewing this series (which I will not in
the next 24 hours), I'd pay particular attention to the callers that
currently accept "int" and use negative return value as an sign of a
failure.  Moving to "size_t" from "int" will break such callers.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] utf8: change type from int to size_t
  2024-03-17  8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
@ 2024-03-21 10:30   ` Patrick Steinhardt
  0 siblings, 0 replies; 6+ messages in thread
From: Patrick Steinhardt @ 2024-03-21 10:30 UTC (permalink / raw)
  To: Mohit Marathe via GitGitGadget; +Cc: git, Mohit Marathe

[-- Attachment #1: Type: text/plain, Size: 11686 bytes --]

On Sun, Mar 17, 2024 at 08:27:50AM +0000, Mohit Marathe via GitGitGadget wrote:
> From: Mohit Marathe <mohitmarathe@proton.me>
> 
> This update modifies the variable types that are used in calls to the
> `utf8_strnwidth` and `utf8_strwidth` functions. This modification is
> a proactive measure in anticipation of an upcoming interface change
> to these functions in the subsequent patch.

It would help to mention `utf8_str{,n}width()` in the commit subject so
that it's immediately obvious which functions get touched by this
commit.

> Signed-off-by: Mohit Marathe <mohitmarathe@proton.me>
> ---
>  builtin/blame.c    | 6 +++---
>  builtin/branch.c   | 2 +-
>  builtin/fetch.c    | 2 +-
>  builtin/worktree.c | 4 ++--
>  column.c           | 2 +-
>  diff.c             | 8 +++++---
>  gettext.c          | 2 +-
>  gettext.h          | 2 +-
>  pretty.c           | 4 ++--
>  utf8.c             | 4 ++--
>  wt-status.c        | 4 ++--
>  11 files changed, 21 insertions(+), 19 deletions(-)
> 
> diff --git a/builtin/blame.c b/builtin/blame.c
> index db1f56de61a..a72e2552305 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -53,7 +53,7 @@ static const char *annotate_opt_usage[] = {
>  };
>  
>  static int longest_file;

We also end up assigning `longest_file = num`, where `num` became a
`size_t` now. We should likely change this variable's type to `size_t`
as well.

> -static int longest_author;
> +static size_t longest_author;
>  static int max_orig_digits;
>  static int max_digits;
>  static int max_score_digits;
> @@ -529,7 +529,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  					name = ci.author_mail.buf;
>  				else
>  					name = ci.author.buf;
> -				pad = longest_author - utf8_strwidth(name);
> +				pad = cast_size_t_to_int(longest_author - utf8_strwidth(name));

I feel like this cast here is somewhat dubious. Both `longest_author`
and the return value of `utf8_strwidth()` are `size_t` now. So an
alternative way to handle this would be to assign the return value to a
temporary variable and then `if (longest_author > name_len) BUG(...)`.

>  				printf(" (%s%*s %10s",
>  				       name, pad, "",
>  				       format_time(ci.author_time,
> @@ -631,7 +631,7 @@ static void find_alignment(struct blame_scoreboard *sb, int *option)

A few lines before this hunk we declare `longest_src_lines` and
`longest_dst_lines` as `int`, but we assign `num` to them. We should fix
those to become `size_t`, as well. This will bubble up so that we also
have to fix `max_orig_digits` and `max_digits`.

>  	for (e = sb->ent; e; e = e->next) {
>  		struct blame_origin *suspect = e->suspect;
> -		int num;
> +		size_t num;
>  
>  		if (compute_auto_abbrev)
>  			auto_abbrev = update_auto_abbrev(auto_abbrev, suspect);
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 8c2305ad2c8..321c3558f2d 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -349,7 +349,7 @@ static int calc_maxwidth(struct ref_array *refs, int remote_bonus)
>  	for (i = 0; i < refs->nr; i++) {
>  		struct ref_array_item *it = refs->items[i];
>  		const char *desc = it->refname;
> -		int w;
> +		size_t w;

This isn't sufficient as we end up assignign `max = w`, where `max` is
declared as `int`. We would thus also have to change that variable and
then change the return value of `calc_maxwidth()` itself to be `size_t`.

Furthermore, we do `w += remote_bonus` in this function, where
`remote_bonus` is of type `int`. Is it guaranteed that this variable is
always positive? If so, it should likely become a `size_t`, too.
Otherwise we'll need to have some safety checks that the result does not
wrap.

>  		skip_prefix(it->refname, "refs/heads/", &desc);
>  		skip_prefix(it->refname, "refs/remotes/", &desc);
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 46a793411a4..fee992c3c14 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -705,7 +705,7 @@ static int refcol_width(const struct ref *ref_map, int compact_format)
>  		max = max * 2 / 3;
>  
>  	for (ref = ref_map; ref; ref = ref->next) {
> -		int rlen, llen = 0, len;
> +		size_t rlen, llen = 0, len;

We assign `width = rlen` further down, but `width` is an `int`. It
should likely be changed to become a `size_t` as well, which will
require us to change the return value of this function.

>  		if (ref->status == REF_STATUS_REJECT_SHALLOW ||
>  		    !ref->peer_ref ||
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 9c76b62b02d..bdbf46fb658 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -961,8 +961,8 @@ static void show_worktree_porcelain(struct worktree *wt, int line_terminator)
>  static void show_worktree(struct worktree *wt, int path_maxlen, int abbrev_len)
>  {
>  	struct strbuf sb = STRBUF_INIT;
> -	int cur_path_len = strlen(wt->path);
> -	int path_adj = cur_path_len - utf8_strwidth(wt->path);
> +	size_t cur_path_len = strlen(wt->path);
> +	int path_adj = cast_size_t_to_int(cur_path_len - utf8_strwidth(wt->path));
>  	const char *reason;

I was wondering whether there should be a sanity check here for whether
`cur_path_len >= path_adj`. But given that both measure the length of
`wt->path`, and given that `utf8_strwidth()` should only ever return a
value smaller than the original length, I think this should be fine.

>  	strbuf_addf(&sb, "%-*s ", 1 + path_maxlen + path_adj, wt->path);
> diff --git a/column.c b/column.c
> index 50bbccc92ee..ec874036de6 100644
> --- a/column.c
> +++ b/column.c
> @@ -22,7 +22,7 @@ struct column_data {
>  };
>  
>  /* return length of 's' in letters, ANSI escapes stripped */
> -static int item_length(const char *s)
> +static size_t item_length(const char *s)
>  {
>  	return utf8_strnwidth(s, strlen(s), 1);
>  }

The value of this function is assigned to `struct column_data::len`,
which is an array of `int`. That type needs to change.

> diff --git a/diff.c b/diff.c
> index e50def45383..4faf151345a 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2629,7 +2629,8 @@ void print_stat_summary(FILE *fp, int files,
>  
>  static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  {
> -	int i, len, add, del, adds = 0, dels = 0;
> +	int i, add, del, adds = 0, dels = 0;
> +	size_t len = 0;
>  	uintmax_t max_change = 0, max_len = 0;

It's somewhat weird that `max_len` is declared as `uintmax_t`. Should it
become a `size_t`, as well?

>  	int total_files = data->nr, count;
>  	int width, name_width, graph_width, number_width = 0, bin_width = 0;
> @@ -2780,7 +2781,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  		char *name = file->print_name;
>  		uintmax_t added = file->added;
>  		uintmax_t deleted = file->deleted;
> -		int name_len, padding;
> +		size_t name_len;

We compare `name_len` with `name_width`. The latter should likely become
a `size_t`, as well.

> +		int padding;
>  
>  		if (!file->is_interesting && (added + deleted == 0))
>  			continue;
> @@ -2809,7 +2811,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  			if (slash)
>  				name = slash;
>  		}
> -		padding = len - utf8_strwidth(name);
> +		padding = cast_size_t_to_int(len - utf8_strwidth(name));

I think we shouldn't abuse `cast_size_t_to_int()` like this but instead
explicitly check whether `len >= len(name)`.

>  		if (padding < 0)
>  			padding = 0;
>  
> diff --git a/gettext.c b/gettext.c
> index 57facbc21ec..5a77b4f7202 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -127,7 +127,7 @@ void git_setup_gettext(void)
>  }
>  
>  /* return the number of columns of string 's' in current locale */
> -int gettext_width(const char *s)
> +size_t gettext_width(const char *s)

We also need to adjust the callsite of this function. There's only a
single one in "builtin/fetch.c::display_ref_update()".

>  {
>  	static int is_utf8 = -1;
>  	if (is_utf8 == -1)
> diff --git a/gettext.h b/gettext.h
> index 484cafa5628..f161a21b45c 100644
> --- a/gettext.h
> +++ b/gettext.h
> @@ -31,7 +31,7 @@
>  #ifndef NO_GETTEXT
>  extern int git_gettext_enabled;
>  void git_setup_gettext(void);
> -int gettext_width(const char *s);
> +size_t gettext_width(const char *s);
>  #else
>  #define git_gettext_enabled (0)
>  static inline void git_setup_gettext(void)
> diff --git a/pretty.c b/pretty.c
> index bdbed4295aa..f03493c74b1 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1781,7 +1781,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
>  
>  	if (padding < 0) {
>  		const char *start = strrchr(sb->buf, '\n');
> -		int occupied;
> +		size_t occupied;
>  		if (!start)
>  			start = sb->buf;
>  		occupied = utf8_strnwidth(start, strlen(start), 1);

We're assigning `padding = (-padding) - occupied` next, where `padding`
is of type `int`. We should add some safety checks here.

> @@ -1802,7 +1802,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
>  		placeholder++;
>  		total_consumed++;
>  	}
> -	len = utf8_strnwidth(local_sb.buf, local_sb.len, 1);
> +	len = cast_size_t_to_int(utf8_strnwidth(local_sb.buf, local_sb.len, 1));

Why did you decide to use `cast_size_t_to_int()` instead of converting
`len` to be of type `size_t`?

>  	if (c->flush_type == flush_left_and_steal) {
>  		const char *ch = sb->buf + sb->len - 1;
> diff --git a/utf8.c b/utf8.c
> index 6bfaefa28eb..8ccdf684e07 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -466,7 +466,7 @@ int utf8_fprintf(FILE *stream, const char *format, ...)
>  
>  	columns = fputs(buf.buf, stream);
>  	if (0 <= columns) /* keep the error from the I/O */
> -		columns = utf8_strwidth(buf.buf);
> +		columns = cast_size_t_to_int(utf8_strwidth(buf.buf));
>  	strbuf_release(&buf);
>  	return columns;
>  }
> @@ -806,7 +806,7 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
>  		       const char *s)
>  {
>  	size_t slen = strlen(s);
> -	int display_len = utf8_strnwidth(s, slen, 0);
> +	size_t display_len = utf8_strnwidth(s, slen, 0);
>  	int utf8_compensation = slen - display_len;
>  
>  	if (display_len >= width) {

A few lines further down we have:

    int left = (width - dispay_len) / 2;

This should be changed to become a `size_t`.

> diff --git a/wt-status.c b/wt-status.c
> index 7108a92b52c..c847b4bb5ed 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -325,7 +325,7 @@ static int maxwidth(const char *(*label)(int), int minval, int maxval)
>  
>  	for (i = minval; i <= maxval; i++) {
>  		const char *s = label(i);
> -		int len = s ? utf8_strwidth(s) : 0;
> +		size_t len = s ? utf8_strwidth(s) : 0;
>  		if (len > result)
>  			result = len;

`result` is of type `int`, so we'd have to change both its type as well
as the return type of `maxwidth()`.

>  	}
> @@ -341,7 +341,7 @@ static void wt_longstatus_print_unmerged_data(struct wt_status *s,
>  	static char *padding;
>  	static int label_width;
>  	const char *one, *how;
> -	int len;
> +	size_t len;
>  
>  	if (!padding) {
>  		label_width = maxwidth(wt_status_unmerged_status_string, 1, 7);

`len` gets passed to the formatter "%.*s". Quoting fprintf(3p):

    A field width, or precision, or both, may be indicated by an
    <asterisk> ('*'). In this case an argument of type int supplies the
    field width or precision.

So this must be an `int` or otherwise bad things will happen depending
on the size of those types on your platform.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] utf8: change return type of some helpers from int to size_t
  2024-03-17  8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
                   ` (2 preceding siblings ...)
  2024-03-17 16:51 ` [PATCH 0/2] utf8: change return type of some helpers from int to size_t Junio C Hamano
@ 2024-03-21 10:30 ` Patrick Steinhardt
  3 siblings, 0 replies; 6+ messages in thread
From: Patrick Steinhardt @ 2024-03-21 10:30 UTC (permalink / raw)
  To: Mohit Marathe via GitGitGadget; +Cc: git, Mohit Marathe

[-- Attachment #1: Type: text/plain, Size: 1144 bytes --]

On Sun, Mar 17, 2024 at 08:27:49AM +0000, Mohit Marathe via GitGitGadget wrote:
> Hello,
> 
> There are lot of places in the codebase where int is used to store size of
> an object instead of size_t (which is better alternative due to reasons like
> portability, readability, etc). This patch series accomplishes one such
> #TODO comment, which addresses this issue.
> 
> I appreciate your review and feedback on this patch series.

I really appreciate you working on this -- our mismatch of types really
irks me a ton, so I'm very happy to see some fixes. I've provided a
bunch of feedback for your patch 1/2, where we need to be quite a bit
more careful about the blast radius of each of the changes.

I'd recommend to split that patch up into multiple parts. While each of
the changes is basically only changing some types around, it's rather
involved to review as you have to also investigate the vicinity of each
change quite vigorously. So at the very least I would recommend to split
out patches for every change that requires further changes, e.g. to the
return types of additional functions.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-03-21 10:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-17  8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
2024-03-17  8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
2024-03-21 10:30   ` Patrick Steinhardt
2024-03-17  8:27 ` [PATCH 2/2] utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int Mohit Marathe via GitGitGadget
2024-03-17 16:51 ` [PATCH 0/2] utf8: change return type of some helpers from int to size_t Junio C Hamano
2024-03-21 10:30 ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).