[PATCH 0/4] Initialize a few uninitialized variables

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] Initialize a few uninitialized variables
@ 2025-03-27 12:43 Johannes Schindelin via GitGitGadget
  2025-03-27 12:43 ` [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized Johannes Schindelin via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-03-27 12:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

When I ran CodeQL on Git's source code, it said that that variables might be
uninitialized in a few places.

Johannes Schindelin (4):
  cat_one_file(): make it easy to see that the `size` variable is
    initialized
  fsck: avoid using an uninitialized variable
  load_revindex_from_disk(): avoid accessing uninitialized data
  load_pack_mtimes_file(): avoid accessing uninitialized data

 builtin/cat-file.c | 2 +-
 fsck.c             | 2 +-
 pack-mtimes.c      | 2 +-
 pack-revindex.c    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1888%2Fdscho%2Funinitialized-variables-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1888/dscho/uninitialized-variables-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1888
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized
  2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
@ 2025-03-27 12:43 ` Johannes Schindelin via GitGitGadget
  2025-03-28  3:46   ` Jeff King
  2025-03-27 12:43 ` [PATCH 2/4] fsck: avoid using an uninitialized variable Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-03-27 12:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The large `switch` statement makes it a bit impractical to reason about
the code.

One of the code paths can technically lead to using `size` without being
initialized: if the `t` case is taken and the type name is set to the
empty string, we would actually leave `size` unintialized right until we
use it.

Practically, this cannot happen because the
`do_oid_object_info_extended()` function is expected to always populate
the `type_name` if asked for. However, it is quite unnecessary to leave
the code as unwieldy to reason about: Just initialize the variable to 0
and be done with it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/cat-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b13561cf73b..128c901fa8e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -104,7 +104,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct object_id oid;
 	enum object_type type;
 	char *buf;
-	unsigned long size;
+	unsigned long size = 0;
 	struct object_context obj_context = {0};
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct strbuf sb = STRBUF_INIT;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] fsck: avoid using an uninitialized variable
  2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
  2025-03-27 12:43 ` [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized Johannes Schindelin via GitGitGadget
@ 2025-03-27 12:43 ` Johannes Schindelin via GitGitGadget
  2025-03-28  4:07   ` Jeff King
  2025-03-27 12:43 ` [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-03-27 12:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In `fsck_commit()`, after counting the authors of a commit, we set the
`err` variable either when there was no author, or when there were more
than two authors recorded. Then we access the `err` variable to figure
out whether we should return early. But if there was exactly one author,
that variable is still uninitialized.

Let's just initialize the variable.

This issue was pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 9fc4c25ffd5..ad04b24ff13 100644
--- a/fsck.c
+++ b/fsck.c
@@ -925,7 +925,7 @@ static int fsck_commit(const struct object_id *oid,
 {
 	struct object_id tree_oid, parent_oid;
 	unsigned author_count;
-	int err;
+	int err = 0;
 	const char *buffer_begin = buffer;
 	const char *buffer_end = buffer + size;
 	const char *p;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data
  2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
  2025-03-27 12:43 ` [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized Johannes Schindelin via GitGitGadget
  2025-03-27 12:43 ` [PATCH 2/4] fsck: avoid using an uninitialized variable Johannes Schindelin via GitGitGadget
@ 2025-03-27 12:43 ` Johannes Schindelin via GitGitGadget
  2025-03-27 14:23   ` Taylor Blau
  2025-03-27 12:43 ` [PATCH 4/4] load_pack_mtimes_file(): " Johannes Schindelin via GitGitGadget
  2025-07-17 16:45 ` [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin
  4 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-03-27 12:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `revindex_size` value is uninitialized in case the function is
erroring out, but we want to assign its value. Let's just initialize it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 pack-revindex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index d3832478d99..3b007d771b3 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -208,7 +208,7 @@ static int load_revindex_from_disk(char *revindex_name,
 	int fd, ret = 0;
 	struct stat st;
 	void *data = NULL;
-	size_t revindex_size;
+	size_t revindex_size = 0;
 	struct revindex_header *hdr;
 
 	if (git_env_bool(GIT_TEST_REV_INDEX_DIE_ON_DISK, 0))
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] load_pack_mtimes_file(): avoid accessing uninitialized data
  2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2025-03-27 12:43 ` [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data Johannes Schindelin via GitGitGadget
@ 2025-03-27 12:43 ` Johannes Schindelin via GitGitGadget
  2025-03-27 14:24   ` Taylor Blau
  2025-07-17 16:45 ` [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin
  4 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-03-27 12:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `mtimes_size` variable is uninitialzed when the function errors out,
yet its value is assigned to another variable. Let's just initialize it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 pack-mtimes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-mtimes.c b/pack-mtimes.c
index cdf30b8d2b0..c1f531d45a0 100644
--- a/pack-mtimes.c
+++ b/pack-mtimes.c
@@ -29,7 +29,7 @@ static int load_pack_mtimes_file(char *mtimes_file,
 	int fd, ret = 0;
 	struct stat st;
 	uint32_t *data = NULL;
-	size_t mtimes_size, expected_size;
+	size_t mtimes_size = 0, expected_size;
 	struct mtimes_header header;
 
 	fd = git_open(mtimes_file);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data
  2025-03-27 12:43 ` [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data Johannes Schindelin via GitGitGadget
@ 2025-03-27 14:23   ` Taylor Blau
  0 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2025-03-27 14:23 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, Mar 27, 2025 at 12:43:48PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `revindex_size` value is uninitialized in case the function is
> erroring out, but we want to assign its value. Let's just initialize it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  pack-revindex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pack-revindex.c b/pack-revindex.c
> index d3832478d99..3b007d771b3 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -208,7 +208,7 @@ static int load_revindex_from_disk(char *revindex_name,
>  	int fd, ret = 0;
>  	struct stat st;
>  	void *data = NULL;
> -	size_t revindex_size;
> +	size_t revindex_size = 0;

I'm certainly not opposed to initializing variables proactively, but in
this particular case I don't think it's necessary.

We assign 'revindex_size' out to 'len_p' when we enter the cleanup
routine label if 'ret' is zero. We'll use 'revindex_size' in the same
label to munmap() when 'ret' is non-zero, but only if 'data' is also
initialized.

So there are two conditions where we'll enter the cleanup label before
assigning 'revindex_size', when git_open() returns a negative value, or
fstat()ing the descriptor that git_open() gave us returns a non-zero
value. In both of those cases, ret is non-zero (it is assigned to 1 and
the return value of error_errno() in those cases, respectively). Since
'data' is also NULL here, this function will terminate without using
the uninitialized 'revindex_size'.

If both of those work (i.e., we opened the file and fstat()ed it
successfully), then we'll have revindex_size initialized to st.st_size
(really the result of calling xsize_t() on it). There are two sanity
checks on the size, both of which happen before we have mmap()ed the
file, and both sanity checks set 'ret' to a non-zero value upon failure.

So by the time we '*len_p = revindex_size' it is guaranteed to be
initialized and just junk bytes on the stack.

Did this trigger a warning from a static analyzer or something? If so,
I'm happy to take this patch to appease it. Perhaps that it what's going
on since I recall you mentioning that you were working on enabling
CodeQL in Microsoft's fork of Git.

But if not I might suggest dropping this patch for the reasons above.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] load_pack_mtimes_file(): avoid accessing uninitialized data
  2025-03-27 12:43 ` [PATCH 4/4] load_pack_mtimes_file(): " Johannes Schindelin via GitGitGadget
@ 2025-03-27 14:24   ` Taylor Blau
  0 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2025-03-27 14:24 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, Mar 27, 2025 at 12:43:49PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `mtimes_size` variable is uninitialzed when the function errors out,
> yet its value is assigned to another variable. Let's just initialize it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  pack-mtimes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pack-mtimes.c b/pack-mtimes.c
> index cdf30b8d2b0..c1f531d45a0 100644
> --- a/pack-mtimes.c
> +++ b/pack-mtimes.c
> @@ -29,7 +29,7 @@ static int load_pack_mtimes_file(char *mtimes_file,
>  	int fd, ret = 0;
>  	struct stat st;
>  	uint32_t *data = NULL;
> -	size_t mtimes_size, expected_size;
> +	size_t mtimes_size = 0, expected_size;

Hmm. This one follows an identical line of reasoning as in my previous
response in the thread. So I think this one is likewise unnecessary
(though not harmful, and certainly useful if it appeases static analysis
tools, etc).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized
  2025-03-27 12:43 ` [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized Johannes Schindelin via GitGitGadget
@ 2025-03-28  3:46   ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2025-03-28  3:46 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, Mar 27, 2025 at 12:43:46PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> The large `switch` statement makes it a bit impractical to reason about
> the code.
> 
> One of the code paths can technically lead to using `size` without being
> initialized: if the `t` case is taken and the type name is set to the
> empty string, we would actually leave `size` unintialized right until we
> use it.

I don't think that's quite true. If we have an empty type name we leave
the switch and hit these lines:

	if (!buf)
		die("git cat-file %s: bad file", obj_name);

	write_or_die(1, buf, size);

Since we set buf to NULL before the switch and never touch it in the 't'
case, we'll always hit that die() call.

So this really is a false positive, regardless of what happens to the
type name buffer. I'm a little surprised that CodeQL would get this
wrong, just because it is very easy to see that buf is not touched in
the 't' case at all (and thus must be NULL). But maybe I'm missing
something.

I do agree that the flow through the switch statement (where "break" is
good for some cases and a failure mode for others) makes this code
rather hard to reason about. I'm sure it could be rewritten, but I'm not
sure if it's worth spending time on.

> Practically, this cannot happen because the
> `do_oid_object_info_extended()` function is expected to always populate
> the `type_name` if asked for. However, it is quite unnecessary to leave
> the code as unwieldy to reason about: Just initialize the variable to 0
> and be done with it.

You can trigger the path in question like this:

  oid=$(echo foo | git hash-object --literally --stdin -w -t '')
  git cat-file --allow-unknown -t $oid

which hits the "bad file" message.

(Obviously the above is horrible and arguably something we should
consider forbidding; I have some patches moving towards ripping out
support for non-standard types entirely).

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b13561cf73b..128c901fa8e 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -104,7 +104,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  	struct object_id oid;
>  	enum object_type type;
>  	char *buf;
> -	unsigned long size;
> +	unsigned long size = 0;

So even though I think your analysis above had a few wrong details, I do
agree this is a false positive in CodeQL and is probably OK to fix as
you do here. Though it might make more sense to do it alongside the
assignment to "buf" (or to move the initialization of "buf" up here).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] fsck: avoid using an uninitialized variable
  2025-03-27 12:43 ` [PATCH 2/4] fsck: avoid using an uninitialized variable Johannes Schindelin via GitGitGadget
@ 2025-03-28  4:07   ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2025-03-28  4:07 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, Mar 27, 2025 at 12:43:47PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> In `fsck_commit()`, after counting the authors of a commit, we set the
> `err` variable either when there was no author, or when there were more
> than two authors recorded. Then we access the `err` variable to figure
> out whether we should return early. But if there was exactly one author,
> that variable is still uninitialized.
> 
> Let's just initialize the variable.
> 
> This issue was pointed out by CodeQL.

Hmm, I'd think we would hit this case all the time, since commits
generally have one author. But I think it's another false positive.

The code in question is this:

          author_count = 0;
          while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) {
                  author_count++;
                  err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
                  if (err)
                          return err;
          }
          if (author_count < 1)
                  err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
          else if (author_count > 1)
                  err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
          if (err)
                  return err;

So we set "err" as soon as we find _any_ author (when we check whether
it is properly formatted via fsck_ident). And author_count will not be
incremented if we did not find one. So either we must have assigned
the result of fsck_ident(), or we will hit the "author_count < 1" case
and assign there.

It's certainly confusing, though, since "err" gets used in so many
spots. I think the whole thing would be easier to understand if we had
tighter-scoped single use variables like this:

diff --git a/fsck.c b/fsck.c
index 9fc4c25ffd..ea72b3247d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -925,7 +925,6 @@ static int fsck_commit(const struct object_id *oid,
 {
 	struct object_id tree_oid, parent_oid;
 	unsigned author_count;
-	int err;
 	const char *buffer_begin = buffer;
 	const char *buffer_end = buffer + size;
 	const char *p;
@@ -941,39 +940,44 @@ static int fsck_commit(const struct object_id *oid,
 	if (buffer >= buffer_end || !skip_prefix(buffer, "tree ", &buffer))
 		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
 	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n') {
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
+		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
 		if (err)
 			return err;
 	}
 	buffer = p + 1;
 	while (buffer < buffer_end && skip_prefix(buffer, "parent ", &buffer)) {
 		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
-			err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
+			int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
 			if (err)
 				return err;
 		}
 		buffer = p + 1;
 	}
 	author_count = 0;
 	while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) {
+		int err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
+		if (err)
+			return err;
 		author_count++;
-		err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
+	}
+	if (author_count < 1) {
+		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
+		if (err)
+			return err;
+	} else if (author_count > 1) {
+		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
 		if (err)
 			return err;
 	}
-	if (author_count < 1)
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
-	else if (author_count > 1)
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
-	if (err)
-		return err;
 	if (buffer >= buffer_end || !skip_prefix(buffer, "committer ", &buffer))
 		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
-	err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
-	if (err)
-		return err;
+	else {
+		int err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
+		if (err)
+			return err;
+	}
 	if (memchr(buffer_begin, '\0', size)) {
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT,
+		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT,
 			     "NUL byte in the commit object body");
 		if (err)
 			return err;

And then it is obvious that the general pattern is to propagate "err"
from individual calls (and the ones that do not stick out like sore
thumbs; are those bugs where we should keep going if the user set those
message types to warn/ignore?).

You could even wrap the pattern in a macro, though perhaps that is
getting too magical. The resulting logic is easier to follow, though, if
you can look past the macro:

diff --git a/fsck.c b/fsck.c
index ea72b3247d..8c7ac3c448 100644
--- a/fsck.c
+++ b/fsck.c
@@ -919,6 +919,12 @@ static int fsck_ident(const char **ident,
 	return 0;
 }
 
+#define MAYBE_RETURN(x) do { \
+	int err = (x); \
+	if (err) \
+		return err; \
+} while (0)
+
 static int fsck_commit(const struct object_id *oid,
 		       const char *buffer, unsigned long size,
 		       struct fsck_options *options)
@@ -939,49 +945,30 @@ static int fsck_commit(const struct object_id *oid,
 
 	if (buffer >= buffer_end || !skip_prefix(buffer, "tree ", &buffer))
 		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
-	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n') {
-		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
-		if (err)
-			return err;
-	}
+	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n')
+		MAYBE_RETURN(report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1"));
 	buffer = p + 1;
 	while (buffer < buffer_end && skip_prefix(buffer, "parent ", &buffer)) {
-		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
-			int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
-			if (err)
-				return err;
-		}
+		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n')
+			MAYBE_RETURN(report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1"));
 		buffer = p + 1;
 	}
 	author_count = 0;
 	while (buffer < buffer_end && skip_prefix(buffer, "author ", &buffer)) {
-		int err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
-		if (err)
-			return err;
+		MAYBE_RETURN(fsck_ident(&buffer, oid, OBJ_COMMIT, options));
 		author_count++;
 	}
-	if (author_count < 1) {
-		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
-		if (err)
-			return err;
-	} else if (author_count > 1) {
-		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
-		if (err)
-			return err;
-	}
+	if (author_count < 1)
+		MAYBE_RETURN(report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line"));
+	else if (author_count > 1)
+		MAYBE_RETURN(report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines"));
 	if (buffer >= buffer_end || !skip_prefix(buffer, "committer ", &buffer))
 		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
-	else {
-		int err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
-		if (err)
-			return err;
-	}
-	if (memchr(buffer_begin, '\0', size)) {
-		int err = report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT,
-			     "NUL byte in the commit object body");
-		if (err)
-			return err;
-	}
+	else
+		MAYBE_RETURN(fsck_ident(&buffer, oid, OBJ_COMMIT, options));
+
+	if (memchr(buffer_begin, '\0', size))
+		MAYBE_RETURN(report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT, "NUL byte in the commit object body"));
 	return 0;
 }
 

I'd suspect that just the first patch above would fix the CodeQL issue.
It's certainly a larger diff, but IMHO the result is less confusing for
humans, too.

-Peff

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/4] Initialize a few uninitialized variables
  2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2025-03-27 12:43 ` [PATCH 4/4] load_pack_mtimes_file(): " Johannes Schindelin via GitGitGadget
@ 2025-07-17 16:45 ` Johannes Schindelin
  4 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2025-07-17 16:45 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git

Hi,

On Thu, 27 Mar 2025, Johannes Schindelin via GitGitGadget wrote:

> When I ran CodeQL on Git's source code, it said that that variables might be
> uninitialized in a few places.

For the record, I am abandoning my efforts to upstream this. I was never
interested in the actual changes, I was interested in getting CodeQL to
run on Git's code base without reporting false positives, so that I could
compare the quality of the reports against Coverity's. Therefore, I was
not really prepared to polish these patches as if they were something
important: They are not, and I cannot justify spending any more time on
these patches. I will carry them as-are in microsoft/git, under the label
'uninitialized-variables', as long as they apply without major merge
conflict headaches.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-07-17 16:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-27 12:43 [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin via GitGitGadget
2025-03-27 12:43 ` [PATCH 1/4] cat_one_file(): make it easy to see that the `size` variable is initialized Johannes Schindelin via GitGitGadget
2025-03-28  3:46   ` Jeff King
2025-03-27 12:43 ` [PATCH 2/4] fsck: avoid using an uninitialized variable Johannes Schindelin via GitGitGadget
2025-03-28  4:07   ` Jeff King
2025-03-27 12:43 ` [PATCH 3/4] load_revindex_from_disk(): avoid accessing uninitialized data Johannes Schindelin via GitGitGadget
2025-03-27 14:23   ` Taylor Blau
2025-03-27 12:43 ` [PATCH 4/4] load_pack_mtimes_file(): " Johannes Schindelin via GitGitGadget
2025-03-27 14:24   ` Taylor Blau
2025-07-17 16:45 ` [PATCH 0/4] Initialize a few uninitialized variables Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).