From: Patrick Steinhardt <ps@pks.im>
To: Mohit Marathe via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Mohit Marathe <mohitmarathe@proton.me>
Subject: Re: [PATCH 1/2] utf8: change type from int to size_t
Date: Thu, 21 Mar 2024 11:30:19 +0100 [thread overview]
Message-ID: <ZfwMO-6IEXDm3z1b@tanuki> (raw)
In-Reply-To: <2e89da2e72b460228b3f77b1a5168f0a1fe0adcf.1710664071.git.gitgitgadget@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 11686 bytes --]
On Sun, Mar 17, 2024 at 08:27:50AM +0000, Mohit Marathe via GitGitGadget wrote:
> From: Mohit Marathe <mohitmarathe@proton.me>
>
> This update modifies the variable types that are used in calls to the
> `utf8_strnwidth` and `utf8_strwidth` functions. This modification is
> a proactive measure in anticipation of an upcoming interface change
> to these functions in the subsequent patch.
It would help to mention `utf8_str{,n}width()` in the commit subject so
that it's immediately obvious which functions get touched by this
commit.
> Signed-off-by: Mohit Marathe <mohitmarathe@proton.me>
> ---
> builtin/blame.c | 6 +++---
> builtin/branch.c | 2 +-
> builtin/fetch.c | 2 +-
> builtin/worktree.c | 4 ++--
> column.c | 2 +-
> diff.c | 8 +++++---
> gettext.c | 2 +-
> gettext.h | 2 +-
> pretty.c | 4 ++--
> utf8.c | 4 ++--
> wt-status.c | 4 ++--
> 11 files changed, 21 insertions(+), 19 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index db1f56de61a..a72e2552305 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -53,7 +53,7 @@ static const char *annotate_opt_usage[] = {
> };
>
> static int longest_file;
We also end up assigning `longest_file = num`, where `num` became a
`size_t` now. We should likely change this variable's type to `size_t`
as well.
> -static int longest_author;
> +static size_t longest_author;
> static int max_orig_digits;
> static int max_digits;
> static int max_score_digits;
> @@ -529,7 +529,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> name = ci.author_mail.buf;
> else
> name = ci.author.buf;
> - pad = longest_author - utf8_strwidth(name);
> + pad = cast_size_t_to_int(longest_author - utf8_strwidth(name));
I feel like this cast here is somewhat dubious. Both `longest_author`
and the return value of `utf8_strwidth()` are `size_t` now. So an
alternative way to handle this would be to assign the return value to a
temporary variable and then `if (longest_author > name_len) BUG(...)`.
> printf(" (%s%*s %10s",
> name, pad, "",
> format_time(ci.author_time,
> @@ -631,7 +631,7 @@ static void find_alignment(struct blame_scoreboard *sb, int *option)
A few lines before this hunk we declare `longest_src_lines` and
`longest_dst_lines` as `int`, but we assign `num` to them. We should fix
those to become `size_t`, as well. This will bubble up so that we also
have to fix `max_orig_digits` and `max_digits`.
> for (e = sb->ent; e; e = e->next) {
> struct blame_origin *suspect = e->suspect;
> - int num;
> + size_t num;
>
> if (compute_auto_abbrev)
> auto_abbrev = update_auto_abbrev(auto_abbrev, suspect);
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 8c2305ad2c8..321c3558f2d 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -349,7 +349,7 @@ static int calc_maxwidth(struct ref_array *refs, int remote_bonus)
> for (i = 0; i < refs->nr; i++) {
> struct ref_array_item *it = refs->items[i];
> const char *desc = it->refname;
> - int w;
> + size_t w;
This isn't sufficient as we end up assignign `max = w`, where `max` is
declared as `int`. We would thus also have to change that variable and
then change the return value of `calc_maxwidth()` itself to be `size_t`.
Furthermore, we do `w += remote_bonus` in this function, where
`remote_bonus` is of type `int`. Is it guaranteed that this variable is
always positive? If so, it should likely become a `size_t`, too.
Otherwise we'll need to have some safety checks that the result does not
wrap.
> skip_prefix(it->refname, "refs/heads/", &desc);
> skip_prefix(it->refname, "refs/remotes/", &desc);
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 46a793411a4..fee992c3c14 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -705,7 +705,7 @@ static int refcol_width(const struct ref *ref_map, int compact_format)
> max = max * 2 / 3;
>
> for (ref = ref_map; ref; ref = ref->next) {
> - int rlen, llen = 0, len;
> + size_t rlen, llen = 0, len;
We assign `width = rlen` further down, but `width` is an `int`. It
should likely be changed to become a `size_t` as well, which will
require us to change the return value of this function.
> if (ref->status == REF_STATUS_REJECT_SHALLOW ||
> !ref->peer_ref ||
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 9c76b62b02d..bdbf46fb658 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -961,8 +961,8 @@ static void show_worktree_porcelain(struct worktree *wt, int line_terminator)
> static void show_worktree(struct worktree *wt, int path_maxlen, int abbrev_len)
> {
> struct strbuf sb = STRBUF_INIT;
> - int cur_path_len = strlen(wt->path);
> - int path_adj = cur_path_len - utf8_strwidth(wt->path);
> + size_t cur_path_len = strlen(wt->path);
> + int path_adj = cast_size_t_to_int(cur_path_len - utf8_strwidth(wt->path));
> const char *reason;
I was wondering whether there should be a sanity check here for whether
`cur_path_len >= path_adj`. But given that both measure the length of
`wt->path`, and given that `utf8_strwidth()` should only ever return a
value smaller than the original length, I think this should be fine.
> strbuf_addf(&sb, "%-*s ", 1 + path_maxlen + path_adj, wt->path);
> diff --git a/column.c b/column.c
> index 50bbccc92ee..ec874036de6 100644
> --- a/column.c
> +++ b/column.c
> @@ -22,7 +22,7 @@ struct column_data {
> };
>
> /* return length of 's' in letters, ANSI escapes stripped */
> -static int item_length(const char *s)
> +static size_t item_length(const char *s)
> {
> return utf8_strnwidth(s, strlen(s), 1);
> }
The value of this function is assigned to `struct column_data::len`,
which is an array of `int`. That type needs to change.
> diff --git a/diff.c b/diff.c
> index e50def45383..4faf151345a 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2629,7 +2629,8 @@ void print_stat_summary(FILE *fp, int files,
>
> static void show_stats(struct diffstat_t *data, struct diff_options *options)
> {
> - int i, len, add, del, adds = 0, dels = 0;
> + int i, add, del, adds = 0, dels = 0;
> + size_t len = 0;
> uintmax_t max_change = 0, max_len = 0;
It's somewhat weird that `max_len` is declared as `uintmax_t`. Should it
become a `size_t`, as well?
> int total_files = data->nr, count;
> int width, name_width, graph_width, number_width = 0, bin_width = 0;
> @@ -2780,7 +2781,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> char *name = file->print_name;
> uintmax_t added = file->added;
> uintmax_t deleted = file->deleted;
> - int name_len, padding;
> + size_t name_len;
We compare `name_len` with `name_width`. The latter should likely become
a `size_t`, as well.
> + int padding;
>
> if (!file->is_interesting && (added + deleted == 0))
> continue;
> @@ -2809,7 +2811,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> if (slash)
> name = slash;
> }
> - padding = len - utf8_strwidth(name);
> + padding = cast_size_t_to_int(len - utf8_strwidth(name));
I think we shouldn't abuse `cast_size_t_to_int()` like this but instead
explicitly check whether `len >= len(name)`.
> if (padding < 0)
> padding = 0;
>
> diff --git a/gettext.c b/gettext.c
> index 57facbc21ec..5a77b4f7202 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -127,7 +127,7 @@ void git_setup_gettext(void)
> }
>
> /* return the number of columns of string 's' in current locale */
> -int gettext_width(const char *s)
> +size_t gettext_width(const char *s)
We also need to adjust the callsite of this function. There's only a
single one in "builtin/fetch.c::display_ref_update()".
> {
> static int is_utf8 = -1;
> if (is_utf8 == -1)
> diff --git a/gettext.h b/gettext.h
> index 484cafa5628..f161a21b45c 100644
> --- a/gettext.h
> +++ b/gettext.h
> @@ -31,7 +31,7 @@
> #ifndef NO_GETTEXT
> extern int git_gettext_enabled;
> void git_setup_gettext(void);
> -int gettext_width(const char *s);
> +size_t gettext_width(const char *s);
> #else
> #define git_gettext_enabled (0)
> static inline void git_setup_gettext(void)
> diff --git a/pretty.c b/pretty.c
> index bdbed4295aa..f03493c74b1 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1781,7 +1781,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
>
> if (padding < 0) {
> const char *start = strrchr(sb->buf, '\n');
> - int occupied;
> + size_t occupied;
> if (!start)
> start = sb->buf;
> occupied = utf8_strnwidth(start, strlen(start), 1);
We're assigning `padding = (-padding) - occupied` next, where `padding`
is of type `int`. We should add some safety checks here.
> @@ -1802,7 +1802,7 @@ static size_t format_and_pad_commit(struct strbuf *sb, /* in UTF-8 */
> placeholder++;
> total_consumed++;
> }
> - len = utf8_strnwidth(local_sb.buf, local_sb.len, 1);
> + len = cast_size_t_to_int(utf8_strnwidth(local_sb.buf, local_sb.len, 1));
Why did you decide to use `cast_size_t_to_int()` instead of converting
`len` to be of type `size_t`?
> if (c->flush_type == flush_left_and_steal) {
> const char *ch = sb->buf + sb->len - 1;
> diff --git a/utf8.c b/utf8.c
> index 6bfaefa28eb..8ccdf684e07 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -466,7 +466,7 @@ int utf8_fprintf(FILE *stream, const char *format, ...)
>
> columns = fputs(buf.buf, stream);
> if (0 <= columns) /* keep the error from the I/O */
> - columns = utf8_strwidth(buf.buf);
> + columns = cast_size_t_to_int(utf8_strwidth(buf.buf));
> strbuf_release(&buf);
> return columns;
> }
> @@ -806,7 +806,7 @@ void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int wid
> const char *s)
> {
> size_t slen = strlen(s);
> - int display_len = utf8_strnwidth(s, slen, 0);
> + size_t display_len = utf8_strnwidth(s, slen, 0);
> int utf8_compensation = slen - display_len;
>
> if (display_len >= width) {
A few lines further down we have:
int left = (width - dispay_len) / 2;
This should be changed to become a `size_t`.
> diff --git a/wt-status.c b/wt-status.c
> index 7108a92b52c..c847b4bb5ed 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -325,7 +325,7 @@ static int maxwidth(const char *(*label)(int), int minval, int maxval)
>
> for (i = minval; i <= maxval; i++) {
> const char *s = label(i);
> - int len = s ? utf8_strwidth(s) : 0;
> + size_t len = s ? utf8_strwidth(s) : 0;
> if (len > result)
> result = len;
`result` is of type `int`, so we'd have to change both its type as well
as the return type of `maxwidth()`.
> }
> @@ -341,7 +341,7 @@ static void wt_longstatus_print_unmerged_data(struct wt_status *s,
> static char *padding;
> static int label_width;
> const char *one, *how;
> - int len;
> + size_t len;
>
> if (!padding) {
> label_width = maxwidth(wt_status_unmerged_status_string, 1, 7);
`len` gets passed to the formatter "%.*s". Quoting fprintf(3p):
A field width, or precision, or both, may be indicated by an
<asterisk> ('*'). In this case an argument of type int supplies the
field width or precision.
So this must be an `int` or otherwise bad things will happen depending
on the size of those types on your platform.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-03-21 10:30 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-17 8:27 [PATCH 0/2] utf8: change return type of some helpers from int to size_t Mohit Marathe via GitGitGadget
2024-03-17 8:27 ` [PATCH 1/2] utf8: change type " Mohit Marathe via GitGitGadget
2024-03-21 10:30 ` Patrick Steinhardt [this message]
2024-03-17 8:27 ` [PATCH 2/2] utf8: make utf8_strnwidth & utf8_strwidth return size_t instead of int Mohit Marathe via GitGitGadget
2024-03-17 16:51 ` [PATCH 0/2] utf8: change return type of some helpers from int to size_t Junio C Hamano
2024-03-21 10:30 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfwMO-6IEXDm3z1b@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=mohitmarathe@proton.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox