git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc
@ 2025-01-26 13:00 Sören Krecker
  2025-01-27  7:26 ` Patrick Steinhardt
  2025-01-29 16:53 ` Phillip Wood
  0 siblings, 2 replies; 3+ messages in thread
From: Sören Krecker @ 2025-01-26 13:00 UTC (permalink / raw)
  To: git; +Cc: gitster, phillip.wood123, ps, sunshine, Sören Krecker

Fix compiler warnings from msvc in commit.c for value truncation from 64
bit to 32 bit integers.

Also switch from int to size_t for all variables with result of strlen()
which cannot become negative.

Signed-off-by: Sören Krecker <soekkle@freenet.de>
---
 commit.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/commit.c b/commit.c
index 540660359d..c9cc56bd9f 100644
--- a/commit.c
+++ b/commit.c
@@ -466,8 +466,8 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	struct object_id parent;
 	struct commit_list **pptr;
 	struct commit_graft *graft;
-	const int tree_entry_len = the_hash_algo->hexsz + 5;
-	const int parent_entry_len = the_hash_algo->hexsz + 7;
+	const size_t tree_entry_len = the_hash_algo->hexsz + 5;
+	const size_t parent_entry_len = the_hash_algo->hexsz + 7;
 	struct tree *tree;
 
 	if (item->object.parsed)
@@ -1114,10 +1114,10 @@ static const char *gpg_sig_headers[] = {
 
 int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
-	int inspos, copypos;
+	ssize_t inspos, copypos;
 	const char *eoh;
 	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
-	int gpg_sig_header_len = strlen(gpg_sig_header);
+	size_t gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
 	eoh = strstr(buf->buf, "\n\n");
@@ -1530,7 +1530,7 @@ int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
 	return result;
 }
 
-static int find_invalid_utf8(const char *buf, int len)
+static int find_invalid_utf8(const char *buf, size_t len)
 {
 	int offset = 0;
 	static const unsigned int max_codepoint[] = {
@@ -1539,7 +1539,7 @@ static int find_invalid_utf8(const char *buf, int len)
 
 	while (len) {
 		unsigned char c = *buf++;
-		int bytes, bad_offset;
+		size_t bytes, bad_offset;
 		unsigned int codepoint;
 		unsigned int min_val, max_val;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc
  2025-01-26 13:00 [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc Sören Krecker
@ 2025-01-27  7:26 ` Patrick Steinhardt
  2025-01-29 16:53 ` Phillip Wood
  1 sibling, 0 replies; 3+ messages in thread
From: Patrick Steinhardt @ 2025-01-27  7:26 UTC (permalink / raw)
  To: 20250126125638.3089-1-soekkle
  Cc: git, gitster, phillip.wood123, sunshine, Sören Krecker

On Sun, Jan 26, 2025 at 02:00:38PM +0100, Sören Krecker wrote:
> diff --git a/commit.c b/commit.c
> index 540660359d..c9cc56bd9f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1530,7 +1530,7 @@ int commit_tree(const char *msg, size_t msg_len, const struct object_id *tree,
>  	return result;
>  }
>  
> -static int find_invalid_utf8(const char *buf, int len)
> +static int find_invalid_utf8(const char *buf, size_t len)
>  {
>  	int offset = 0;

Shouldn't the type of `offset` be adjusted, as well?

Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc
  2025-01-26 13:00 [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc Sören Krecker
  2025-01-27  7:26 ` Patrick Steinhardt
@ 2025-01-29 16:53 ` Phillip Wood
  1 sibling, 0 replies; 3+ messages in thread
From: Phillip Wood @ 2025-01-29 16:53 UTC (permalink / raw)
  To: 20250126125638.3089-1-soekkle, git
  Cc: gitster, ps, sunshine, Sören Krecker

Hi Sören

On 26/01/2025 13:00, Sören Krecker wrote:

> -	const int tree_entry_len = the_hash_algo->hexsz + 5;
> -	const int parent_entry_len = the_hash_algo->hexsz + 7;
> +	const size_t tree_entry_len = the_hash_algo->hexsz + 5;
> +	const size_t parent_entry_len = the_hash_algo->hexsz + 7;

As Junio has previously pointed out it might make more sense to change 
the type of the_hash_algo->hexsz. What is the advantage of using size_t 
here?

>   int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
>   {
> -	int inspos, copypos;
> +	ssize_t inspos, copypos;

Note that POSIX allows "ssize_t" to be narrower than "int". We have at 
least one platform where that is the case [see c14e5a1a501 
(transport-helper: use xread instead of read, 2019-01-03)] so I'm not 
sure this change is a good idea.

>   	const char *eoh;
>   	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
> -	int gpg_sig_header_len = strlen(gpg_sig_header);
> +	size_t gpg_sig_header_len = strlen(gpg_sig_header);

It is really unfortunate that the compiler cannot deduce that there is 
not any truncation here as the longest string is "gpgsig-sha256". I 
wonder if we should add a helper for cases like this something like

	int strlen_int(const char *s) {
		return cast_size_t_to_int(strlen(s));
	}

rather than having to deal with the fallout from changing "int" to "size_t"

> -static int find_invalid_utf8(const char *buf, int len)
> +static int find_invalid_utf8(const char *buf, size_t len)

I think changing the type of "len" is an improvement (even if it hard to 
see anyone creating such a long commit message) as it matches the 
buf->len in verify_utf8() which is the only caller of this function. 
However the conversion is incomplete. If we are to accommodate buffers 
longer than INT_MAX we need to change the return type as well. As it 
stands bad_offset is changed to size_t but truncated to int when the 
function returns. Patrick has already pointed out that the type of 
"offset" needs to match "len" as the loop does

	len--;
	offset++;

and

	bad_offset = offset - 1;

verify_utf8() uses a variable of type long to track the position so that 
should be changed as well if we're really going to support size_t length 
buffers.

I think that a good approach to fixing these warnings would be to ask 
"does it make sense to use size_t here?". If the answer is "yes" then we 
should focus on converting the function to work correctly with size_t 
rather than on fixing the compiler warnings. That way we are more likely 
to avoid subtle bugs like this as our focus is on the conversion rather 
than the compiler warnings which will be fixed as a by-product of the 
conversion. If the answer to the question is "no" then we should look to 
change the types of the other variable in the assignment or use 
something like cast_size_t_to_int() or case_size_t_to_ulong() as 
appropriate. There is a danger that we adopt an approach of "change the 
type to size_t to silence the warnings" which leads to code that looks 
like it handles size_t correctly but in fact contains subtle bugs.

Best Wishes

Phillip


>   {
>   	int offset = 0;
>   	static const unsigned int max_codepoint[] = {
> @@ -1539,7 +1539,7 @@ static int find_invalid_utf8(const char *buf, int len)
>   
>   	while (len) {
>   		unsigned char c = *buf++;
> -		int bytes, bad_offset;
> +		size_t bytes, bad_offset;
>   		unsigned int codepoint;
>   		unsigned int min_val, max_val;
>   


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-01-29 16:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-26 13:00 [PATCH v3 4/4] commit.c: Fix type conversation warnings from msvc Sören Krecker
2025-01-27  7:26 ` Patrick Steinhardt
2025-01-29 16:53 ` Phillip Wood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).