git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] CodeQL-inspired fixes
@ 2025-05-15 13:11 Johannes Schindelin via GitGitGadget
  2025-05-15 13:11 ` [PATCH 01/11] commit: simplify code Johannes Schindelin via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
are addressed in this patch series.

Johannes Schindelin (11):
  commit: simplify code
  fetch: carefully clear local variable's address after use
  commit-graph: avoid malloc'ing a local variable
  upload-pack: rename `enum` to reflect the operation
  has_dir_name(): make code more obvious
  fetch: avoid unnecessary work when there is no current branch
  Avoid redundant conditions
  trace2: avoid "futile conditional"
  commit-graph: avoid using stale stack addresses
  bundle-uri: avoid using undefined output of `sscanf()`
  sequencer: stop pretending that an assignment is a condition

 builtin/commit.c   |   2 +-
 builtin/fetch.c    |   3 +-
 bundle-uri.c       |  12 ++--
 commit-graph.c     | 148 +++++++++++++++++++++++----------------------
 help.c             |   2 +-
 read-cache.c       |  55 ++++-------------
 sequencer.c        |   9 ++-
 trace2/tr2_tmr.c   |  24 ++------
 transport-helper.c |   2 +-
 upload-pack.c      |  34 +++++------
 10 files changed, 130 insertions(+), 161 deletions(-)


base-commit: 38af977b81bbf8ce8c0004d3f4046a823ecb30a1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1891%2Fdscho%2Fcodeql-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1891/dscho/codeql-fixes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1891
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/11] commit: simplify code
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 19:48   ` Jeff King
  2025-05-15 13:11 ` [PATCH 02/11] fetch: carefully clear local variable's address after use Johannes Schindelin via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).

Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.

Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 66bd91fd523d..fba0dded64a7 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			for (i = 0; i < the_repository->index->cache_nr; i++)
 				if (ce_intent_to_add(the_repository->index->cache[i]))
 					ita_nr++;
-			committable = the_repository->index->cache_nr - ita_nr > 0;
+			committable = the_repository->index->cache_nr > ita_nr;
 		} else {
 			/*
 			 * Unless the user did explicitly request a submodule
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 02/11] fetch: carefully clear local variable's address after use
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
  2025-05-15 13:11 ` [PATCH 01/11] commit: simplify code Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 19:40   ` Jeff King
  2025-05-15 13:11 ` [PATCH 03/11] commit-graph: avoid malloc'ing a local variable Johannes Schindelin via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.

Now, it is Git's convention that `cmd_*()` functions are essentially
only returning just before exiting the process, therefore there is
little danger that this attribute is used after the code flow returns
from that function.

However, code in `cmd_*()` function is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.

Let's make sure to clear the `acked_commits` attribute out after it was
used, and before the function returns (at which point the address would
go stale).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fetch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index cda6eaf1fd6e..c1a1434c7096 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
 		if (server_options.nr)
 			gtransport->server_options = &server_options;
 		result = transport_fetch_refs(gtransport, NULL);
+		gtransport->smart_options->acked_commits = NULL;
 
 		oidset_iter_init(&acked_commits, &iter);
 		while ((oid = oidset_iter_next(&iter)))
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 03/11] commit-graph: avoid malloc'ing a local variable
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
  2025-05-15 13:11 ` [PATCH 01/11] commit: simplify code Johannes Schindelin via GitGitGadget
  2025-05-15 13:11 ` [PATCH 02/11] fetch: carefully clear local variable's address after use Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 19:54   ` Jeff King
  2025-05-15 13:11 ` [PATCH 04/11] upload-pack: rename `enum` to reflect the operation Johannes Schindelin via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

We do need a context to write the commit graph, but that context is only
needed during the life time of `commit_graph_write()`, therefore it can
easily be a stack variable.

This also helps CodeQL recognize that it is safe to assign the address
of other local variables to the context's fields.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 commit-graph.c | 141 ++++++++++++++++++++++++-------------------------
 1 file changed, 69 insertions(+), 72 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6394752b0b08..9f0115dac9b5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
 		       const struct commit_graph_opts *opts)
 {
 	struct repository *r = the_repository;
-	struct write_commit_graph_context *ctx;
+	struct write_commit_graph_context ctx = {
+		.r = r,
+		.odb = odb,
+		.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
+		.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
+		.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
+		.opts = opts,
+		.total_bloom_filter_data_size = 0,
+		.write_generation_data = (get_configured_generation_version(r) == 2),
+		.num_generation_data_overflows = 0,
+	};
 	uint32_t i;
 	int res = 0;
 	int replace = 0;
@@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
 		return 0;
 	}
 
-	CALLOC_ARRAY(ctx, 1);
-	ctx->r = r;
-	ctx->odb = odb;
-	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
-	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
-	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
-	ctx->opts = opts;
-	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
-	ctx->num_generation_data_overflows = 0;
-
 	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
 						      bloom_settings.bits_per_entry);
@@ -2549,14 +2548,14 @@ int write_commit_graph(struct object_directory *odb,
 						  bloom_settings.num_hashes);
 	bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
 							 bloom_settings.max_changed_paths);
-	ctx->bloom_settings = &bloom_settings;
+	ctx.bloom_settings = &bloom_settings;
 
 	init_topo_level_slab(&topo_levels);
-	ctx->topo_levels = &topo_levels;
+	ctx.topo_levels = &topo_levels;
 
-	prepare_commit_graph(ctx->r);
-	if (ctx->r->objects->commit_graph) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	prepare_commit_graph(ctx.r);
+	if (ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;
 
 		while (g) {
 			g->topo_levels = &topo_levels;
@@ -2565,15 +2564,15 @@ int write_commit_graph(struct object_directory *odb,
 	}
 
 	if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS)
-		ctx->changed_paths = 1;
+		ctx.changed_paths = 1;
 	if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
 		struct commit_graph *g;
 
-		g = ctx->r->objects->commit_graph;
+		g = ctx.r->objects->commit_graph;
 
 		/* We have changed-paths already. Keep them in the next graph */
 		if (g && g->bloom_filter_settings) {
-			ctx->changed_paths = 1;
+			ctx.changed_paths = 1;
 
 			/* don't propagate the hash_version unless unspecified */
 			if (bloom_settings.hash_version == -1)
@@ -2586,116 +2585,114 @@ int write_commit_graph(struct object_directory *odb,
 
 	bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;
 
-	if (ctx->split) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	if (ctx.split) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;
 
 		while (g) {
-			ctx->num_commit_graphs_before++;
+			ctx.num_commit_graphs_before++;
 			g = g->base_graph;
 		}
 
-		if (ctx->num_commit_graphs_before) {
-			ALLOC_ARRAY(ctx->commit_graph_filenames_before, ctx->num_commit_graphs_before);
-			i = ctx->num_commit_graphs_before;
-			g = ctx->r->objects->commit_graph;
+		if (ctx.num_commit_graphs_before) {
+			ALLOC_ARRAY(ctx.commit_graph_filenames_before, ctx.num_commit_graphs_before);
+			i = ctx.num_commit_graphs_before;
+			g = ctx.r->objects->commit_graph;
 
 			while (g) {
-				ctx->commit_graph_filenames_before[--i] = xstrdup(g->filename);
+				ctx.commit_graph_filenames_before[--i] = xstrdup(g->filename);
 				g = g->base_graph;
 			}
 		}
 
-		if (ctx->opts)
-			replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE;
+		if (ctx.opts)
+			replace = ctx.opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE;
 	}
 
-	ctx->approx_nr_objects = repo_approximate_object_count(the_repository);
+	ctx.approx_nr_objects = repo_approximate_object_count(the_repository);
 
-	if (ctx->append && ctx->r->objects->commit_graph) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	if (ctx.append && ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;
 		for (i = 0; i < g->num_commits; i++) {
 			struct object_id oid;
 			oidread(&oid, g->chunk_oid_lookup + st_mult(g->hash_len, i),
 				the_repository->hash_algo);
-			oid_array_append(&ctx->oids, &oid);
+			oid_array_append(&ctx.oids, &oid);
 		}
 	}
 
 	if (pack_indexes) {
-		ctx->order_by_pack = 1;
-		if ((res = fill_oids_from_packs(ctx, pack_indexes)))
+		ctx.order_by_pack = 1;
+		if ((res = fill_oids_from_packs(&ctx, pack_indexes)))
 			goto cleanup;
 	}
 
 	if (commits) {
-		if ((res = fill_oids_from_commits(ctx, commits)))
+		if ((res = fill_oids_from_commits(&ctx, commits)))
 			goto cleanup;
 	}
 
 	if (!pack_indexes && !commits) {
-		ctx->order_by_pack = 1;
-		fill_oids_from_all_packs(ctx);
+		ctx.order_by_pack = 1;
+		fill_oids_from_all_packs(&ctx);
 	}
 
-	close_reachable(ctx);
+	close_reachable(&ctx);
 
-	copy_oids_to_commits(ctx);
+	copy_oids_to_commits(&ctx);
 
-	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
+	if (ctx.commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
 		res = -1;
 		goto cleanup;
 	}
 
-	if (!ctx->commits.nr && !replace)
+	if (!ctx.commits.nr && !replace)
 		goto cleanup;
 
-	if (ctx->split) {
-		split_graph_merge_strategy(ctx);
+	if (ctx.split) {
+		split_graph_merge_strategy(&ctx);
 
 		if (!replace)
-			merge_commit_graphs(ctx);
+			merge_commit_graphs(&ctx);
 	} else
-		ctx->num_commit_graphs_after = 1;
+		ctx.num_commit_graphs_after = 1;
 
-	ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
+	ctx.trust_generation_numbers = validate_mixed_generation_chain(ctx.r->objects->commit_graph);
 
-	compute_topological_levels(ctx);
-	if (ctx->write_generation_data)
-		compute_generation_numbers(ctx);
+	compute_topological_levels(&ctx);
+	if (ctx.write_generation_data)
+		compute_generation_numbers(&ctx);
 
-	if (ctx->changed_paths)
-		compute_bloom_filters(ctx);
+	if (ctx.changed_paths)
+		compute_bloom_filters(&ctx);
 
-	res = write_commit_graph_file(ctx);
+	res = write_commit_graph_file(&ctx);
 
-	if (ctx->changed_paths)
+	if (ctx.changed_paths)
 		deinit_bloom_filters();
 
-	if (ctx->split)
-		mark_commit_graphs(ctx);
+	if (ctx.split)
+		mark_commit_graphs(&ctx);
 
-	expire_commit_graphs(ctx);
+	expire_commit_graphs(&ctx);
 
 cleanup:
-	free(ctx->graph_name);
-	free(ctx->base_graph_name);
-	free(ctx->commits.list);
-	oid_array_clear(&ctx->oids);
+	free(ctx.graph_name);
+	free(ctx.base_graph_name);
+	free(ctx.commits.list);
+	oid_array_clear(&ctx.oids);
 	clear_topo_level_slab(&topo_levels);
 
-	for (i = 0; i < ctx->num_commit_graphs_before; i++)
-		free(ctx->commit_graph_filenames_before[i]);
-	free(ctx->commit_graph_filenames_before);
+	for (i = 0; i < ctx.num_commit_graphs_before; i++)
+		free(ctx.commit_graph_filenames_before[i]);
+	free(ctx.commit_graph_filenames_before);
 
-	for (i = 0; i < ctx->num_commit_graphs_after; i++) {
-		free(ctx->commit_graph_filenames_after[i]);
-		free(ctx->commit_graph_hash_after[i]);
+	for (i = 0; i < ctx.num_commit_graphs_after; i++) {
+		free(ctx.commit_graph_filenames_after[i]);
+		free(ctx.commit_graph_hash_after[i]);
 	}
-	free(ctx->commit_graph_filenames_after);
-	free(ctx->commit_graph_hash_after);
-
-	free(ctx);
+	free(ctx.commit_graph_filenames_after);
+	free(ctx.commit_graph_hash_after);
 
 	return res;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/11] upload-pack: rename `enum` to reflect the operation
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 03/11] commit-graph: avoid malloc'ing a local variable Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 19:55   ` Jeff King
  2025-05-15 13:11 ` [PATCH 05/11] has_dir_name(): make code more obvious Johannes Schindelin via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

While 3145ea957d (upload-pack: introduce fetch server command,
2018-03-15) added support for the `fetch` command, from the server's
point of view it is an upload, and hence the `enum` should really be
called `upload_state` instead of `fetch_state`. Likewise, rename its
values.

This also helps unconfuse CodeQL which would otherwise be at sixes or
sevens about having _two_ non-local definitions of the same `enum` with
the same values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 upload-pack.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 956da5b061a0..26f29b85b551 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1780,16 +1780,16 @@ static void send_shallow_info(struct upload_pack_data *data)
 	packet_delim(1);
 }
 
-enum fetch_state {
-	FETCH_PROCESS_ARGS = 0,
-	FETCH_SEND_ACKS,
-	FETCH_SEND_PACK,
-	FETCH_DONE,
+enum upload_state {
+	UPLOAD_PROCESS_ARGS = 0,
+	UPLOAD_SEND_ACKS,
+	UPLOAD_SEND_PACK,
+	UPLOAD_DONE,
 };
 
 int upload_pack_v2(struct repository *r, struct packet_reader *request)
 {
-	enum fetch_state state = FETCH_PROCESS_ARGS;
+	enum upload_state state = UPLOAD_PROCESS_ARGS;
 	struct upload_pack_data data;
 
 	clear_object_flags(the_repository, ALL_FLAGS);
@@ -1798,9 +1798,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request)
 	data.use_sideband = LARGE_PACKET_MAX;
 	get_upload_pack_config(r, &data);
 
-	while (state != FETCH_DONE) {
+	while (state != UPLOAD_DONE) {
 		switch (state) {
-		case FETCH_PROCESS_ARGS:
+		case UPLOAD_PROCESS_ARGS:
 			process_args(request, &data);
 
 			if (!data.want_obj.nr && !data.wait_for_done) {
@@ -1811,27 +1811,27 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request)
 				 * to just send 'have's without 'want's); guess
 				 * they didn't want anything.
 				 */
-				state = FETCH_DONE;
+				state = UPLOAD_DONE;
 			} else if (data.seen_haves) {
 				/*
 				 * Request had 'have' lines, so lets ACK them.
 				 */
-				state = FETCH_SEND_ACKS;
+				state = UPLOAD_SEND_ACKS;
 			} else {
 				/*
 				 * Request had 'want's but no 'have's so we can
 				 * immediately go to construct and send a pack.
 				 */
-				state = FETCH_SEND_PACK;
+				state = UPLOAD_SEND_PACK;
 			}
 			break;
-		case FETCH_SEND_ACKS:
+		case UPLOAD_SEND_ACKS:
 			if (process_haves_and_send_acks(&data))
-				state = FETCH_SEND_PACK;
+				state = UPLOAD_SEND_PACK;
 			else
-				state = FETCH_DONE;
+				state = UPLOAD_DONE;
 			break;
-		case FETCH_SEND_PACK:
+		case UPLOAD_SEND_PACK:
 			send_wanted_ref_info(&data);
 			send_shallow_info(&data);
 
@@ -1841,9 +1841,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request)
 				packet_writer_write(&data.writer, "packfile\n");
 				create_pack_file(&data, NULL);
 			}
-			state = FETCH_DONE;
+			state = UPLOAD_DONE;
 			break;
-		case FETCH_DONE:
+		case UPLOAD_DONE:
 			continue;
 		}
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 05/11] has_dir_name(): make code more obvious
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 04/11] upload-pack: rename `enum` to reflect the operation Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 20:04   ` Jeff King
  2025-05-15 13:11 ` [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch Johannes Schindelin via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

One thing that might be non-obvious to readers (or to analyzers like
CodeQL) is that the function essentially does nothing when the Git index
is empty, and in particular that it does not look at the value of
`len_eq_last` (which would be uninitialized at that point).

Let's make this much easier to understand, by returning early if the Git
index is empty, and by avoiding empty `else` blocks.

This commit changes indentation and is hence best viewed using
`--ignore-space-change`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c | 55 +++++++++++++---------------------------------------
 1 file changed, 13 insertions(+), 42 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 73f83a7e7a11..c0bb760ad473 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1117,48 +1117,19 @@ static int has_dir_name(struct index_state *istate,
 	 *
 	 * Compare the entry's full path with the last path in the index.
 	 */
-	if (istate->cache_nr > 0) {
-		cmp_last = strcmp_offset(name,
-			istate->cache[istate->cache_nr - 1]->name,
-			&len_eq_last);
-		if (cmp_last > 0) {
-			if (name[len_eq_last] != '/') {
-				/*
-				 * The entry sorts AFTER the last one in the
-				 * index.
-				 *
-				 * If there were a conflict with "file", then our
-				 * name would start with "file/" and the last index
-				 * entry would start with "file" but not "file/".
-				 *
-				 * The next character after common prefix is
-				 * not '/', so there can be no conflict.
-				 */
-				return retval;
-			} else {
-				/*
-				 * The entry sorts AFTER the last one in the
-				 * index, and the next character after common
-				 * prefix is '/'.
-				 *
-				 * Either the last index entry is a file in
-				 * conflict with this entry, or it has a name
-				 * which sorts between this entry and the
-				 * potential conflicting file.
-				 *
-				 * In both cases, we fall through to the loop
-				 * below and let the regular search code handle it.
-				 */
-			}
-		} else if (cmp_last == 0) {
-			/*
-			 * The entry exactly matches the last one in the
-			 * index, but because of multiple stage and CE_REMOVE
-			 * items, we fall through and let the regular search
-			 * code handle it.
-			 */
-		}
-	}
+	if (!istate->cache_nr)
+		return 0;
+
+	cmp_last = strcmp_offset(name,
+				 istate->cache[istate->cache_nr - 1]->name,
+				 &len_eq_last);
+	if (cmp_last > 0 && name[len_eq_last] != '/')
+		/*
+		 * The entry sorts AFTER the last one in the
+		 * index and their paths have no common prefix,
+		 * so there cannot be a F/D conflict.
+		 */
+		return 0;
 
 	for (;;) {
 		size_t len;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (4 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 05/11] has_dir_name(): make code more obvious Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 20:11   ` Jeff King
  2025-05-15 13:11 ` [PATCH 07/11] Avoid redundant conditions Johannes Schindelin via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.

Technically, we should enclose these two statements in an `if (branch)
{...}` block, but the indentation is already quite deep, therefore I
refrained from doing that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fetch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index c1a1434c7096..40a0e8d24434 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1728,7 +1728,7 @@ static int do_fetch(struct transport *transport,
 			if (transport->remote->follow_remote_head != FOLLOW_REMOTE_NEVER)
 				do_set_head = 1;
 		}
-		if (branch_has_merge_config(branch) &&
+		if (branch && branch_has_merge_config(branch) &&
 		    !strcmp(branch->remote_name, transport->remote->name)) {
 			int i;
 			for (i = 0; i < branch->merge_nr; i++) {
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 07/11] Avoid redundant conditions
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (5 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 20:13   ` Jeff King
  2025-05-15 13:11 ` [PATCH 08/11] trace2: avoid "futile conditional" Johannes Schindelin via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
`if (i <= 0) ... else ...`, the latter is vastly easier to read because
it avoids writing out a condition that is unnecessary. Let's drop such
unnecessary conditions.

Pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 help.c             | 2 +-
 transport-helper.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/help.c b/help.c
index 6ef90838f128..21b778707a6a 100644
--- a/help.c
+++ b/help.c
@@ -214,7 +214,7 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes)
 		else if (cmp == 0) {
 			ei++;
 			free(cmds->names[ci++]);
-		} else if (cmp > 0)
+		} else
 			ei++;
 	}
 
diff --git a/transport-helper.c b/transport-helper.c
index 69391ee7d28e..0789e5bca532 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1437,7 +1437,7 @@ static int udt_do_read(struct unidirectional_transfer *t)
 		transfer_debug("%s EOF (with %i bytes in buffer)",
 			t->src_name, (int)t->bufuse);
 		t->state = SSTATE_FLUSHING;
-	} else if (bytes > 0) {
+	} else {
 		t->bufuse += bytes;
 		transfer_debug("Read %i bytes from %s (buffer now at %i)",
 			(int)bytes, t->src_name, (int)t->bufuse);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 08/11] trace2: avoid "futile conditional"
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (6 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 07/11] Avoid redundant conditions Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 20:16   ` Jeff King
  2025-05-15 13:11 ` [PATCH 09/11] commit-graph: avoid using stale stack addresses Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

CodeQL reports empty `if` blocks that only contain a comment as "futile
conditional". The comment talks about potential plans to turn this into
a warning, but that seems not to have been necessary. Replace the entire
construct with a concise comment.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 trace2/tr2_tmr.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/trace2/tr2_tmr.c b/trace2/tr2_tmr.c
index 51f564b07a40..038181ad9be0 100644
--- a/trace2/tr2_tmr.c
+++ b/trace2/tr2_tmr.c
@@ -102,25 +102,11 @@ void tr2_update_final_timers(void)
 		struct tr2_timer *t_final = &final_timer_block.timer[tid];
 		struct tr2_timer *t = &ctx->timer_block.timer[tid];
 
-		if (t->recursion_count) {
-			/*
-			 * The current thread is exiting with
-			 * timer[tid] still running.
-			 *
-			 * Technically, this is a bug, but I'm going
-			 * to ignore it.
-			 *
-			 * I don't think it is worth calling die()
-			 * for.  I don't think it is worth killing the
-			 * process for this bookkeeping error.  We
-			 * might want to call warning(), but I'm going
-			 * to wait on that.
-			 *
-			 * The downside here is that total_ns won't
-			 * include the current open interval (now -
-			 * start_ns).  I can live with that.
-			 */
-		}
+		/*
+		 * `t->recursion_count` could technically be non-zero, which
+		 * would constitute a bug. Reporting the bug would potentially
+		 * cause an infinite recursion, though, so let's ignore it.
+		 */
 
 		if (!t->interval_count)
 			continue; /* this timer was not used by this thread */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/11] commit-graph: avoid using stale stack addresses
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (7 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 08/11] trace2: avoid "futile conditional" Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 20:19   ` Jeff King
  2025-05-15 13:11 ` [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()` Johannes Schindelin via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The code is a bit too hard to reason about to fully assess whether the
`fill_commit_graph_info()` function is called at all after
`write_commit_graph()` returns (and hence the stack variable
`topo_levels` goes out of context).

Let's simply make sure that the stack address is no longer used at that
stage, thereby making the code quite a bit easier to reason about.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 commit-graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 9f0115dac9b5..d052c1bf15c5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2683,6 +2683,15 @@ cleanup:
 	oid_array_clear(&ctx.oids);
 	clear_topo_level_slab(&topo_levels);
 
+	if (ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;
+
+		while (g) {
+			g->topo_levels = NULL;
+			g = g->base_graph;
+		}
+	}
+
 	for (i = 0; i < ctx.num_commit_graphs_before; i++)
 		free(ctx.commit_graph_filenames_before[i]);
 	free(ctx.commit_graph_filenames_before);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (8 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 09/11] commit-graph: avoid using stale stack addresses Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 19:21   ` Junio C Hamano
  2025-05-15 20:25   ` Jeff King
  2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
  2025-05-15 20:26 ` [PATCH 00/11] CodeQL-inspired fixes Jeff King
  11 siblings, 2 replies; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
code was introduced that assumes that an `sscanf()` call leaves its
output variables unchanged unless the return value indicates success.

However, the POSIX documentation makes no such guarantee:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html

So let's make sure that the output variable `maxCreationToken` is
always well-defined.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 bundle-uri.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 96d2ba726d99..13a42f92387e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
 	 */
 	if (!repo_config_get_value(r,
 				   "fetch.bundlecreationtoken",
-				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
-	    bundles.items[0]->creationToken <= maxCreationToken) {
-		free(bundles.items);
-		return 0;
+				   &creationTokenStr)) {
+		if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+			maxCreationToken = 0;
+		if (bundles.items[0]->creationToken <= maxCreationToken) {
+			free(bundles.items);
+			return 0;
+		}
 	}
 
 	/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/11] sequencer: stop pretending that an assignment is a condition
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (9 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()` Johannes Schindelin via GitGitGadget
@ 2025-05-15 13:11 ` Johannes Schindelin via GitGitGadget
  2025-05-15 18:51   ` Junio C Hamano
                     ` (2 more replies)
  2025-05-15 20:26 ` [PATCH 00/11] CodeQL-inspired fixes Jeff King
  11 siblings, 3 replies; 36+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-05-15 13:11 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 3e81bccdf3 (sequencer: factor out todo command name parsing,
2019-06-27), a `return` statement was introduced that basically was a
long sequence of conditions, combined with `&&`, except for the last
condition which is not really a condition but an assignment.

The point of this construct was to return 1 (i.e. `true`) from the
function if all of those conditions held true, and also assign the `bol`
pointer to the end of the parsed command.

Some static analyzers are really unhappy about such constructs. And
human readers are at least puzzled, if not confused, by seeing a single
`=` inside a chain of conditions where they would have expected to see
`==` instead and, based on experience, immediately suspect a typo.

Let's help all of this by turning this into the more verbose, more
readable form of an `if` construct that both assigns the pointer as well
as returns 1 if all of the conditions hold true.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sequencer.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index b5c4043757e9..e5e3bc6fa5ea 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
 	const char nick = todo_command_info[command].c;
 	const char *p = *bol;
 
-	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
-		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
-		(*bol = p);
+	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
+	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
+		*bol = p;
+		return 1;
+	}
+	return 0;
 }
 
 static int check_label_or_ref_arg(enum todo_command command, const char *arg)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/11] sequencer: stop pretending that an assignment is a condition
  2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
@ 2025-05-15 18:51   ` Junio C Hamano
  2025-05-15 20:26   ` Jeff King
  2025-05-16 10:13   ` Phillip Wood
  2 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2025-05-15 18:51 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
>
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.

True, as the value of 'p' cannot be NULL at that point where it is
stored to the pointer variable bol points at.  The second paragraph
above does convey what the long expression really wants to achieve.

> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.

Yes.  Good thing to get rid of.

>
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sequencer.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
>  	const char nick = todo_command_info[command].c;
>  	const char *p = *bol;
>  
> -	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> -		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> -		(*bol = p);
> +	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> +	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> +		*bol = p;
> +		return 1;
> +	}
> +	return 0;
>  }

Perfect.  That's quite a natural way to express the intention.



>  
>  static int check_label_or_ref_arg(enum todo_command command, const char *arg)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-15 13:11 ` [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()` Johannes Schindelin via GitGitGadget
@ 2025-05-15 19:21   ` Junio C Hamano
  2025-05-15 20:25   ` Jeff King
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2025-05-15 19:21 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
>
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 96d2ba726d99..13a42f92387e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
>  	 */
>  	if (!repo_config_get_value(r,
>  				   "fetch.bundlecreationtoken",
> -				   &creationTokenStr) &&
> -	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
> -	    bundles.items[0]->creationToken <= maxCreationToken) {
> -		free(bundles.items);
> -		return 0;

The original said "if we successfully parsed and the value of the
token is larger than the token, we are done", which is probably OK,
but the problem is if we were fed garbage and failed to parse it, we
would have smudged maxCreationToken to some unknown value, and the
code path that follows here, which assumes that maxCreationToken is
left as initialized to 0 will be broken.

So the problem is real, but I find the rewritten form a bit hard to
follow.  Namely, when sscanf() failed to grab maxCreationToken, we
never compared it with bundles.items[0]->creationToken, which makes
perfect sense to me, but now...

> +				   &creationTokenStr)) {
> +		if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
> +			maxCreationToken = 0;
> +		if (bundles.items[0]->creationToken <= maxCreationToken) {
> +			free(bundles.items);
> +			return 0;
> +		}

... the updated code does use the just-assigned-because-we-failed-to-parse
value 0 in comparison.

I have to wonder if the attached patch is simpler to reason about,
more in line with what the original wanted to do, and more correct?

When we fail to grab the configuration, or when the value we grabbed
from the configuration does not parse, then we reset
maxCreationToken to 0, but otherwise we have a valid
maxCreationToken so use it to see if we should return early.

The cases we do not return early here are either (1) we did not have
usable configured value, in which case maxCreationToken is set to 0
before reaching the loop after this code, or (2) the value of
maxCreationToken we grabbed from the configuration is smaller than
the creationToken in the bundle list, in which case that value is
used when entering the loop.

Thanks.

 bundle-uri.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git c/bundle-uri.c w/bundle-uri.c
index f3579e228e..13a43f8e32 100644
--- c/bundle-uri.c
+++ w/bundle-uri.c
@@ -531,11 +531,11 @@ static int fetch_bundles_by_token(struct repository *r,
 	 * is not strictly smaller than the maximum creation token in the
 	 * bundle list, then do not download any bundles.
 	 */
-	if (!repo_config_get_value(r,
-				   "fetch.bundlecreationtoken",
-				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
-	    bundles.items[0]->creationToken <= maxCreationToken) {
+	if (repo_config_get_value(r, "fetch.bundlecreationtoken",
+				  &creationTokenStr) ||
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+		maxCreationToken = 0;
+	else if (bundles.items[0]->creationToken <= maxCreationToken) {
 		free(bundles.items);
 		return 0;
 	}

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/11] fetch: carefully clear local variable's address after use
  2025-05-15 13:11 ` [PATCH 02/11] fetch: carefully clear local variable's address after use Johannes Schindelin via GitGitGadget
@ 2025-05-15 19:40   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 19:40 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:40PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> As pointed out by CodeQL, it is a potentially dangerous practice to
> store local variables' addresses in non-local structs. Yet this is
> exactly what happens with the `acked_commits` attribute that is used in
> `cmd_fetch()`: The pointer to a local variable is assigned to it.
> 
> Now, it is Git's convention that `cmd_*()` functions are essentially
> only returning just before exiting the process, therefore there is
> little danger that this attribute is used after the code flow returns
> from that function.

I was going to say: the real sin here is using a global variable in the
first place, without which gtransport would not survive outside of
cmd_fetch(). But the issue is even worse than that. The acked_commits
variable is inside a conditional block, so the address is stale for the
rest of cmd_fetch(), too!

It doesn't look like we ever examine it after that, but it's hard to
trace, since it's a global. ;)

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index cda6eaf1fd6e..c1a1434c7096 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
>  		if (server_options.nr)
>  			gtransport->server_options = &server_options;
>  		result = transport_fetch_refs(gtransport, NULL);
> +		gtransport->smart_options->acked_commits = NULL;
>  
>  		oidset_iter_init(&acked_commits, &iter);
>  		while ((oid = oidset_iter_next(&iter)))

Here you unset it within that conditional block, which is the right
spot. Looks good.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/11] commit: simplify code
  2025-05-15 13:11 ` [PATCH 01/11] commit: simplify code Johannes Schindelin via GitGitGadget
@ 2025-05-15 19:48   ` Jeff King
  2025-05-15 20:37     ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2025-05-15 19:48 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:39PM +0000, Johannes Schindelin via GitGitGadget wrote:

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 66bd91fd523d..fba0dded64a7 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
>  			for (i = 0; i < the_repository->index->cache_nr; i++)
>  				if (ce_intent_to_add(the_repository->index->cache[i]))
>  					ita_nr++;
> -			committable = the_repository->index->cache_nr - ita_nr > 0;
> +			committable = the_repository->index->cache_nr > ita_nr;

I guess it is not possible for ita_nr to be greater than cache_nr, since
we are counting up entries in the loop above. If ita_nr were greater,
the original would wrap around and set committable to true, but yours
would not.

So really, I think the original was equivalent to:

  committable = cache_nr != ita_nr;

but I think ">" probably expresses the intent better (we want to know if
there are any non-ita entries). Though in that case I'd think:

  committable = 0;
  for (i = 0; i < cache_nr; i++) {
	if (!ce_intent_to_add(...) {
		committable = 1;
		break;
	}
  }

would be the most clear, since we do not otherwise care about the actual
number of ita entries. And lets us break out of the loop early.

I dunno if it is worth refactoring further, though. Your patch does the
correct thing and fixes the codeql complaint (which I do think is a
false positive, because ita_nr must be less than cache_nr).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] commit-graph: avoid malloc'ing a local variable
  2025-05-15 13:11 ` [PATCH 03/11] commit-graph: avoid malloc'ing a local variable Johannes Schindelin via GitGitGadget
@ 2025-05-15 19:54   ` Jeff King
  2025-05-15 21:40     ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2025-05-15 19:54 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:41PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> We do need a context to write the commit graph, but that context is only
> needed during the life time of `commit_graph_write()`, therefore it can
> easily be a stack variable.

Yay. I am in favor of using stack variables when possible as a general
rule.

> diff --git a/commit-graph.c b/commit-graph.c
> index 6394752b0b08..9f0115dac9b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
>  		       const struct commit_graph_opts *opts)
>  {
>  	struct repository *r = the_repository;
> -	struct write_commit_graph_context *ctx;
> +	struct write_commit_graph_context ctx = {
> +		.r = r,
> +		.odb = odb,
> +		.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
> +		.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
> +		.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
> +		.opts = opts,
> +		.total_bloom_filter_data_size = 0,
> +		.write_generation_data = (get_configured_generation_version(r) == 2),
> +		.num_generation_data_overflows = 0,
> +	};
>  	uint32_t i;
>  	int res = 0;
>  	int replace = 0;
> @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
>  		return 0;
>  	}
>  
> -	CALLOC_ARRAY(ctx, 1);
> -	ctx->r = r;
> -	ctx->odb = odb;
> -	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
> -	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
> -	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
> -	ctx->opts = opts;
> -	ctx->total_bloom_filter_data_size = 0;
> -	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> -	ctx->num_generation_data_overflows = 0;

OK, this moves the initialization to the top of the function. So to
review this for correctness, we must make sure that we do not change the
values of any of those variables between the two spots (i.e., in the
diff context that is omitted).

Most of it looks fine. Our call to get_configured_generation_version()
now happens earlier, before the call to prepare_repo_settings(). I think
that is OK, because the former calls repo_config_get_int() directly. It
does seem like a potential maintenance problem if that call is ever
rolled into prepare_repo_settings().

So maybe OK, but the smaller change would be to just replace the calloc
with a memset(), and s/->/./ on the subsequent lines.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/11] upload-pack: rename `enum` to reflect the operation
  2025-05-15 13:11 ` [PATCH 04/11] upload-pack: rename `enum` to reflect the operation Johannes Schindelin via GitGitGadget
@ 2025-05-15 19:55   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 19:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:42PM +0000, Johannes Schindelin via GitGitGadget wrote:

> While 3145ea957d (upload-pack: introduce fetch server command,
> 2018-03-15) added support for the `fetch` command, from the server's
> point of view it is an upload, and hence the `enum` should really be
> called `upload_state` instead of `fetch_state`. Likewise, rename its
> values.
> 
> This also helps unconfuse CodeQL which would otherwise be at sixes or
> sevens about having _two_ non-local definitions of the same `enum` with
> the same values.

It unconfuses me, too. Nice change.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/11] has_dir_name(): make code more obvious
  2025-05-15 13:11 ` [PATCH 05/11] has_dir_name(): make code more obvious Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:04   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:04 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:43PM +0000, Johannes Schindelin via GitGitGadget wrote:

> One thing that might be non-obvious to readers (or to analyzers like
> CodeQL) is that the function essentially does nothing when the Git index
> is empty, and in particular that it does not look at the value of
> `len_eq_last` (which would be uninitialized at that point).
> 
> Let's make this much easier to understand, by returning early if the Git
> index is empty, and by avoiding empty `else` blocks.

OK, so we return early, skipping not only what's in the conditional
that you're touching here, but also the "for(;;)" loop below.

And in that one, we'll look for the next slash (and break if none).
We'll check the name up to that stage via index_name_stage_pos(). And
obviously that will not find a match if there are no index entries. So
we'd do nothing and loop again, looking for the next slash, until we
eventually hit the end.

So yeah, I agree if there are no index entries we can bail immediately.

> This commit changes indentation and is hence best viewed using
> `--ignore-space-change`.

Yeah. I was puzzled at first by the amount of dropped code, but they are
all comments that say "fall through to the code below".

So I think the change here is correct. We are losing some comments that
could be helpful, but I'm not familiar enough with those code to say
whether they would be. Just reading what you've left makes sense to me
on its own.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch
  2025-05-15 13:11 ` [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:11   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:11 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:44PM +0000, Johannes Schindelin via GitGitGadget wrote:

> As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
> case `branch_has_merge_config()` would return early, but we can even
> avoid enumerating the refs prefixes in that case, saving even more CPU
> cycles.

I am not sure how this patch changes anything with respect to CPU. If
branch is NULL, then branch_has_merge_config(branch) will always return
false.

I think this is just an issue that CodeQL is not looking inside
branch_has_merge_config(), and thus does not realize we will never hit
the rest of the short-circuit conditional (let alone the body) in that
case?

Still may be worth dealing with, but it makes me a little sad to have to
add an extra redundant check (one place is not a big deal, but as a
general pattern).

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 07/11] Avoid redundant conditions
  2025-05-15 13:11 ` [PATCH 07/11] Avoid redundant conditions Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:13   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:13 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:45PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
> `if (i <= 0) ... else ...`, the latter is vastly easier to read because
> it avoids writing out a condition that is unnecessary. Let's drop such
> unnecessary conditions.
> 
> Pointed out by CodeQL.

Yeah, I'd agree that it is easier (otherwise, you are left wondering if
there is an "else" case you are missing).

>  help.c             | 2 +-
>  transport-helper.c | 2 +-

Both spots look good to me.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/11] trace2: avoid "futile conditional"
  2025-05-15 13:11 ` [PATCH 08/11] trace2: avoid "futile conditional" Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:16   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:16 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:46PM +0000, Johannes Schindelin via GitGitGadget wrote:

> CodeQL reports empty `if` blocks that only contain a comment as "futile
> conditional". The comment talks about potential plans to turn this into
> a warning, but that seems not to have been necessary. Replace the entire
> construct with a concise comment.

OK...

> -		if (t->recursion_count) {
> -			/*
> -			 * The current thread is exiting with
> -			 * timer[tid] still running.
> -			 *
> -			 * Technically, this is a bug, but I'm going
> -			 * to ignore it.
> -			 *
> -			 * I don't think it is worth calling die()
> -			 * for.  I don't think it is worth killing the
> -			 * process for this bookkeeping error.  We
> -			 * might want to call warning(), but I'm going
> -			 * to wait on that.
> -			 *
> -			 * The downside here is that total_ns won't
> -			 * include the current open interval (now -
> -			 * start_ns).  I can live with that.
> -			 */
> -		}
> +		/*
> +		 * `t->recursion_count` could technically be non-zero, which
> +		 * would constitute a bug. Reporting the bug would potentially
> +		 * cause an infinite recursion, though, so let's ignore it.
> +		 */

The original doesn't talk about infinite recursion at all, though I can
well believe that would be the case, having run into trace->die->trace
types of bugs before. Did you trace out the actual path of recursion? If
so, it might be worth summarizing it.

Obviously the code change itself cannot hurt anything, as it was a noop.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/11] commit-graph: avoid using stale stack addresses
  2025-05-15 13:11 ` [PATCH 09/11] commit-graph: avoid using stale stack addresses Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:19   ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:19 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:47PM +0000, Johannes Schindelin via GitGitGadget wrote:

> The code is a bit too hard to reason about to fully assess whether the
> `fill_commit_graph_info()` function is called at all after
> `write_commit_graph()` returns (and hence the stack variable
> `topo_levels` goes out of context).
> 
> Let's simply make sure that the stack address is no longer used at that
> stage, thereby making the code quite a bit easier to reason about.

Yep, I think this is a good practice in general. If the topo_levels
member is never used outside of writing, I wonder if it could live in a
separate data structure. But that is a much bigger refactor that I don't
think we need to tackle here.

> diff --git a/commit-graph.c b/commit-graph.c
> index 9f0115dac9b5..d052c1bf15c5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2683,6 +2683,15 @@ cleanup:
>  	oid_array_clear(&ctx.oids);
>  	clear_topo_level_slab(&topo_levels);
>  
> +	if (ctx.r->objects->commit_graph) {
> +		struct commit_graph *g = ctx.r->objects->commit_graph;
> +
> +		while (g) {
> +			g->topo_levels = NULL;
> +			g = g->base_graph;
> +		}
> +	}

This just clears the pointers to the local variable. Looks good.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-15 13:11 ` [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()` Johannes Schindelin via GitGitGadget
  2025-05-15 19:21   ` Junio C Hamano
@ 2025-05-15 20:25   ` Jeff King
  2025-05-16 10:11     ` Phillip Wood
  1 sibling, 1 reply; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:25 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:

> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
> 
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
> 
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.

Definitely an issue, but...why are we using sscanf() at all?

Wouldn't strtoul() be the usual thing in our code base? Or even just
repo_config_get_ulong()? The behavior of the latter would differ in that
we'd complain about a garbage value in fetch.bundlecreationtoken, but
wouldn't that be a good thing?

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/11] sequencer: stop pretending that an assignment is a condition
  2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
  2025-05-15 18:51   ` Junio C Hamano
@ 2025-05-15 20:26   ` Jeff King
  2025-05-16 10:13   ` Phillip Wood
  2 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:26 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:49PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

I see Junio already reviewed this one, but just so I can say I read all
of the patches: yes, I also agree the result is easier to understand. :)

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/11] CodeQL-inspired fixes
  2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
                   ` (10 preceding siblings ...)
  2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
@ 2025-05-15 20:26 ` Jeff King
  2025-05-15 20:58   ` Junio C Hamano
  11 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:26 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:

> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
> are addressed in this patch series.
> 
> Johannes Schindelin (11):
>   commit: simplify code
>   fetch: carefully clear local variable's address after use
>   commit-graph: avoid malloc'ing a local variable
>   upload-pack: rename `enum` to reflect the operation
>   has_dir_name(): make code more obvious
>   fetch: avoid unnecessary work when there is no current branch
>   Avoid redundant conditions
>   trace2: avoid "futile conditional"
>   commit-graph: avoid using stale stack addresses
>   bundle-uri: avoid using undefined output of `sscanf()`
>   sequencer: stop pretending that an assignment is a condition

I read through all of these and didn't find anything incorrect. I did
leave a few comments that might or might not be worth following up on.
Thanks for fixing these.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/11] commit: simplify code
  2025-05-15 19:48   ` Jeff King
@ 2025-05-15 20:37     ` Junio C Hamano
  2025-05-15 20:49       ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2025-05-15 20:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> ... there are any non-ita entries). Though in that case I'd think:
>
>   committable = 0;
>   for (i = 0; i < cache_nr; i++) {
> 	if (!ce_intent_to_add(...) {
> 		committable = 1;
> 		break;
> 	}
>   }
>
> would be the most clear, since we do not otherwise care about the actual
> number of ita entries. And lets us break out of the loop early.

Exactly.  If you focus on the warning too narrowly, the minimal
change in the original patch does look OK, but in the original (even
before Dscho's patch, that is) the intent is unclear, as opposed to
what you showed above.  And the update to squelch false positive
does not improve the clarity of the logic as the above rewrite does.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/11] commit: simplify code
  2025-05-15 20:37     ` Junio C Hamano
@ 2025-05-15 20:49       ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-15 20:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Thu, May 15, 2025 at 01:37:00PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... there are any non-ita entries). Though in that case I'd think:
> >
> >   committable = 0;
> >   for (i = 0; i < cache_nr; i++) {
> > 	if (!ce_intent_to_add(...) {
> > 		committable = 1;
> > 		break;
> > 	}
> >   }
> >
> > would be the most clear, since we do not otherwise care about the actual
> > number of ita entries. And lets us break out of the loop early.
> 
> Exactly.  If you focus on the warning too narrowly, the minimal
> change in the original patch does look OK, but in the original (even
> before Dscho's patch, that is) the intent is unclear, as opposed to
> what you showed above.  And the update to squelch false positive
> does not improve the clarity of the logic as the above rewrite does.

OK. If we do want to refactor, I think pulling it into a separate
function is the most descriptive, like:

diff --git a/builtin/commit.c b/builtin/commit.c
index 66bd91fd52..a8d43d223d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -740,6 +740,15 @@ static void change_data_free(void *util, const char *str UNUSED)
 	free(d);
 }
 
+static int has_non_ita_entries(struct index_state *index)
+{
+	int i;
+	for (i = 0; i < index->cache_nr; i++)
+		if (!ce_intent_to_add(index->cache[i]))
+			return 1;
+	return 0;
+}
+
 static int prepare_to_commit(const char *index_file, const char *prefix,
 			     struct commit *current_head,
 			     struct wt_status *s,
@@ -1015,14 +1024,10 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			parent = "HEAD^1";
 
 		if (repo_get_oid(the_repository, parent, &oid)) {
-			int i, ita_nr = 0;
-
 			/* TODO: audit for interaction with sparse-index. */
 			ensure_full_index(the_repository->index);
-			for (i = 0; i < the_repository->index->cache_nr; i++)
-				if (ce_intent_to_add(the_repository->index->cache[i]))
-					ita_nr++;
-			committable = the_repository->index->cache_nr - ita_nr > 0;
+			committable =
+				has_non_ita_entries(the_repository->index);
 		} else {
 			/*
 			 * Unless the user did explicitly request a submodule

-Peff

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/11] CodeQL-inspired fixes
  2025-05-15 20:26 ` [PATCH 00/11] CodeQL-inspired fixes Jeff King
@ 2025-05-15 20:58   ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2025-05-15 20:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
>> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
>> are addressed in this patch series.
>> 
>> Johannes Schindelin (11):
>>   commit: simplify code
>>   fetch: carefully clear local variable's address after use
>>   commit-graph: avoid malloc'ing a local variable
>>   upload-pack: rename `enum` to reflect the operation
>>   has_dir_name(): make code more obvious
>>   fetch: avoid unnecessary work when there is no current branch
>>   Avoid redundant conditions
>>   trace2: avoid "futile conditional"
>>   commit-graph: avoid using stale stack addresses
>>   bundle-uri: avoid using undefined output of `sscanf()`
>>   sequencer: stop pretending that an assignment is a condition
>
> I read through all of these and didn't find anything incorrect. I did
> leave a few comments that might or might not be worth following up on.
> Thanks for fixing these.

Yup, I also looked at them and didn't see any incorrect updates.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] commit-graph: avoid malloc'ing a local variable
  2025-05-15 19:54   ` Jeff King
@ 2025-05-15 21:40     ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2025-05-15 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> So maybe OK, but the smaller change would be to just replace the calloc
> with a memset(), and s/->/./ on the subsequent lines.

True, and it would be a bit easier to merge with other topics in
flight.  The .oid member and parameter are both renamed IIUC.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-15 20:25   ` Jeff King
@ 2025-05-16 10:11     ` Phillip Wood
  2025-05-16 13:40       ` Phillip Wood
  0 siblings, 1 reply; 36+ messages in thread
From: Phillip Wood @ 2025-05-16 10:11 UTC (permalink / raw)
  To: Jeff King, Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On 15/05/2025 21:25, Jeff King wrote:
> On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:
> 
>> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
>> code was introduced that assumes that an `sscanf()` call leaves its
>> output variables unchanged unless the return value indicates success.
>>
>> However, the POSIX documentation makes no such guarantee:
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>>
>> So let's make sure that the output variable `maxCreationToken` is
>> always well-defined.
> 
> Definitely an issue, but...why are we using sscanf() at all?
> 
> Wouldn't strtoul() be the usual thing in our code base? Or even just
> repo_config_get_ulong()? The behavior of the latter would differ in that
> we'd complain about a garbage value in fetch.bundlecreationtoken, but
> wouldn't that be a good thing?

I had a similar thought, though to make sure that we parsed 64 bit 
values correctly on windows so we'd need something based on strtoumax() 
I think. There is another call to sscanf() in this file which the 
analyzer does not complain about because it stores the result in a local 
variable that is not used if the call to sscanf() fails. We should stop 
using sscanf() there as well. I wonder if we should add something about 
not using sscanf() to our coding guidelines. Apart from this file the 
only other use of sscanf() is in a test helper which doesn't seem so bad 
though if we removed that we could add sscanf() to banned.h.

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/11] sequencer: stop pretending that an assignment is a condition
  2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
  2025-05-15 18:51   ` Junio C Hamano
  2025-05-15 20:26   ` Jeff King
@ 2025-05-16 10:13   ` Phillip Wood
  2 siblings, 0 replies; 36+ messages in thread
From: Phillip Wood @ 2025-05-16 10:13 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Johannes Schindelin

Hi Johannes

Thanks for cleaning this up - I'm not sure why I didn't just write 
something like this in the first place.

Best Wishes

Phillip

On 15/05/2025 14:11, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
> 
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.
> 
> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.
> 
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   sequencer.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
>   	const char nick = todo_command_info[command].c;
>   	const char *p = *bol;
>   
> -	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> -		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> -		(*bol = p);
> +	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> +	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> +		*bol = p;
> +		return 1;
> +	}
> +	return 0;
>   }
>   
>   static int check_label_or_ref_arg(enum todo_command command, const char *arg)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-16 10:11     ` Phillip Wood
@ 2025-05-16 13:40       ` Phillip Wood
  2025-05-16 15:42         ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Phillip Wood @ 2025-05-16 13:40 UTC (permalink / raw)
  To: phillip.wood123; +Cc: git, gitgitgadget, johannes.schindelin, peff

On 16/05/2025 11:11, Phillip Wood wrote:

> I had a similar thought, though to make sure that we parsed 64 bit 
> values correctly on windows so we'd need something based on strtoumax() 
> I think.

Perhaps something like the diff below which adds strtoul_u64() in a
similar vein to strtoul_ui(). I think it's debatable whether we really
want to skip leading whitespace so we could perhaps tighten things up
by replacing "if (strchr(s, '-'))" with "if (!isdigit(*s))" though
that would mean this function would behave slightly differently to
strtoul_ui().

Best Wishes

Phillip

---- >8 ----
diff --git a/bundle-uri.c b/bundle-uri.c
index 96d2ba726d9..9dff7a1c09d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -214,7 +214,7 @@ static int bundle_list_update(const char *key, const char *value,
 	}
 
 	if (!strcmp(subkey, "creationtoken")) {
-		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+		if (strtoul_u64(value, 10, &bundle->creationToken))
 			warning(_("could not parse bundle list key %s with value '%s'"),
 				"creationToken", value);
 		return 0;
@@ -533,7 +533,7 @@ static int fetch_bundles_by_token(struct repository *r,
 	if (!repo_config_get_value(r,
 				   "fetch.bundlecreationtoken",
 				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    strtoul_u64(creationTokenStr,10, &maxCreationToken) &&
 	    bundles.items[0]->creationToken <= maxCreationToken) {
 		free(bundles.items);
 		return 0;
diff --git a/git-compat-util.h b/git-compat-util.h
index 36b9577c8d4..d34d07fce1e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -939,6 +939,22 @@ static inline int strtol_i(char const *s, int base, int *result)
 	return 0;
 }
 
+static inline int strtoul_u64(char const *s, int base, uint64_t *result)
+{
+	uintmax_t ul;
+	char *p;
+
+	errno = 0;
+	/* negative values would be accepted by strtoumax */
+	if (strchr(s, '-'))
+		return -1;
+	ul = strtoumax(s, &p, base);
+	if (errno || *p || p == s || (uint64_t) ul != ul)
+		return -1;
+	*result = ul;
+	return 0;
+}
+
 #ifndef REG_STARTEND
 #error "Git requires REG_STARTEND support. Compile with NO_REGEX=NeedsStartEnd"
 #endif
-- 
2.49.0.897.gfad3eb7d210


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-16 13:40       ` Phillip Wood
@ 2025-05-16 15:42         ` Jeff King
  2025-05-19  9:03           ` Phillip Wood
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff King @ 2025-05-16 15:42 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git, gitgitgadget, johannes.schindelin

On Fri, May 16, 2025 at 02:40:54PM +0100, Phillip Wood wrote:

> On 16/05/2025 11:11, Phillip Wood wrote:
> 
> > I had a similar thought, though to make sure that we parsed 64 bit 
> > values correctly on windows so we'd need something based on strtoumax() 
> > I think.
> 
> Perhaps something like the diff below which adds strtoul_u64() in a
> similar vein to strtoul_ui(). I think it's debatable whether we really
> want to skip leading whitespace so we could perhaps tighten things up
> by replacing "if (strchr(s, '-'))" with "if (!isdigit(*s))" though
> that would mean this function would behave slightly differently to
> strtoul_ui().

It feels like we would had to have dealt with this before for other
large values. But poking around at a few obvious suspects (e.g.,
packSizeLimit), it looks like they are all constrained to "unsigned
long".

So yeah, we probably do need something new. IMHO we should probably have
repo_config_get_u64() or similar (with the appropriate underlying
helpers as well) as use it here. But I am happy with any solution.

And I do agree that we should consider banning *scanf(). With numeric
placeholders I don't think they're a security problem (though they are
easy to get wrnog, as this discussion shows). But using them with "%s"
should generally be disallowed.

There is an fscanf() in builtin/gc.c that uses "%s", but it is careful
to construct a custom format string that limits the string size. Yuck.
The usual thing in our code base would be to read into a buffer and
parse from there.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-16 15:42         ` Jeff King
@ 2025-05-19  9:03           ` Phillip Wood
  2025-05-22  6:03             ` Jeff King
  0 siblings, 1 reply; 36+ messages in thread
From: Phillip Wood @ 2025-05-19  9:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitgitgadget, johannes.schindelin

On 16/05/2025 16:42, Jeff King wrote:
> On Fri, May 16, 2025 at 02:40:54PM +0100, Phillip Wood wrote:
> 
>> On 16/05/2025 11:11, Phillip Wood wrote:
>>
>>> I had a similar thought, though to make sure that we parsed 64 bit
>>> values correctly on windows so we'd need something based on strtoumax()
>>> I think.
>>
>> Perhaps something like the diff below which adds strtoul_u64() in a
>> similar vein to strtoul_ui(). I think it's debatable whether we really
>> want to skip leading whitespace so we could perhaps tighten things up
>> by replacing "if (strchr(s, '-'))" with "if (!isdigit(*s))" though
>> that would mean this function would behave slightly differently to
>> strtoul_ui().
> 
> It feels like we would had to have dealt with this before for other
> large values. But poking around at a few obvious suspects (e.g.,
> packSizeLimit), it looks like they are all constrained to "unsigned
> long".

I was surprised by that as well

> So yeah, we probably do need something new. IMHO we should probably have
> repo_config_get_u64() or similar (with the appropriate underlying
> helpers as well) as use it here. But I am happy with any solution.
I think repo_config_get_ulong() and friends all accept a multiplier 
suffix. That makes sense for things like packSizeLimit but here we're 
expecting a bare integer. It probably doesn't really matter but as one 
of the code paths parses a file that comes from the bundle server we 
might want to be as strict as we can be.

> And I do agree that we should consider banning *scanf(). With numeric
> placeholders I don't think they're a security problem (though they are
> easy to get wrnog, as this discussion shows). But using them with "%s"
> should generally be disallowed.
> 
> There is an fscanf() in builtin/gc.c that uses "%s", but it is careful
> to construct a custom format string that limits the string size. Yuck.

Yes that looks pretty horrid

Phillip

> The usual thing in our code base would be to read into a buffer and
> parse from there.
> 
> -Peff


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()`
  2025-05-19  9:03           ` Phillip Wood
@ 2025-05-22  6:03             ` Jeff King
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff King @ 2025-05-22  6:03 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, gitgitgadget, johannes.schindelin

On Mon, May 19, 2025 at 10:03:47AM +0100, Phillip Wood wrote:

> > So yeah, we probably do need something new. IMHO we should probably have
> > repo_config_get_u64() or similar (with the appropriate underlying
> > helpers as well) as use it here. But I am happy with any solution.
> I think repo_config_get_ulong() and friends all accept a multiplier suffix.
> That makes sense for things like packSizeLimit but here we're expecting a
> bare integer. It probably doesn't really matter but as one of the code paths
> parses a file that comes from the bundle server we might want to be as
> strict as we can be.

Yeah, I agree it's a little funny to allow suffixes where we wouldn't
expect them. But I do think there may be value in just consistently
accepting them for any numeric config value.

I dunno. I am OK with the more limited solution you proposed.

-Peff

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2025-05-22  6:03 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-15 13:11 [PATCH 00/11] CodeQL-inspired fixes Johannes Schindelin via GitGitGadget
2025-05-15 13:11 ` [PATCH 01/11] commit: simplify code Johannes Schindelin via GitGitGadget
2025-05-15 19:48   ` Jeff King
2025-05-15 20:37     ` Junio C Hamano
2025-05-15 20:49       ` Jeff King
2025-05-15 13:11 ` [PATCH 02/11] fetch: carefully clear local variable's address after use Johannes Schindelin via GitGitGadget
2025-05-15 19:40   ` Jeff King
2025-05-15 13:11 ` [PATCH 03/11] commit-graph: avoid malloc'ing a local variable Johannes Schindelin via GitGitGadget
2025-05-15 19:54   ` Jeff King
2025-05-15 21:40     ` Junio C Hamano
2025-05-15 13:11 ` [PATCH 04/11] upload-pack: rename `enum` to reflect the operation Johannes Schindelin via GitGitGadget
2025-05-15 19:55   ` Jeff King
2025-05-15 13:11 ` [PATCH 05/11] has_dir_name(): make code more obvious Johannes Schindelin via GitGitGadget
2025-05-15 20:04   ` Jeff King
2025-05-15 13:11 ` [PATCH 06/11] fetch: avoid unnecessary work when there is no current branch Johannes Schindelin via GitGitGadget
2025-05-15 20:11   ` Jeff King
2025-05-15 13:11 ` [PATCH 07/11] Avoid redundant conditions Johannes Schindelin via GitGitGadget
2025-05-15 20:13   ` Jeff King
2025-05-15 13:11 ` [PATCH 08/11] trace2: avoid "futile conditional" Johannes Schindelin via GitGitGadget
2025-05-15 20:16   ` Jeff King
2025-05-15 13:11 ` [PATCH 09/11] commit-graph: avoid using stale stack addresses Johannes Schindelin via GitGitGadget
2025-05-15 20:19   ` Jeff King
2025-05-15 13:11 ` [PATCH 10/11] bundle-uri: avoid using undefined output of `sscanf()` Johannes Schindelin via GitGitGadget
2025-05-15 19:21   ` Junio C Hamano
2025-05-15 20:25   ` Jeff King
2025-05-16 10:11     ` Phillip Wood
2025-05-16 13:40       ` Phillip Wood
2025-05-16 15:42         ` Jeff King
2025-05-19  9:03           ` Phillip Wood
2025-05-22  6:03             ` Jeff King
2025-05-15 13:11 ` [PATCH 11/11] sequencer: stop pretending that an assignment is a condition Johannes Schindelin via GitGitGadget
2025-05-15 18:51   ` Junio C Hamano
2025-05-15 20:26   ` Jeff King
2025-05-16 10:13   ` Phillip Wood
2025-05-15 20:26 ` [PATCH 00/11] CodeQL-inspired fixes Jeff King
2025-05-15 20:58   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).