git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] builtin/difftool: intialize some hashmap variables
@ 2024-11-11 16:21 Simon Marchi
  2024-11-11 20:54 ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Marchi @ 2024-11-11 16:21 UTC (permalink / raw)
  To: git
  Cc: Simon Marchi, Junio C Hamano, René Scharfe, Taylor Blau,
	Patrick Steinhardt

When running a dir-diff command that produces no diff, variables
`wt_modified` and `tmp_modified` are used while uninitialized, causing:

    $ /home/smarchi/src/git/git-difftool --dir-diff master
    free(): invalid pointer
    [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
    $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
    ...
    Invalid free() / delete / delete[] / realloc()
       at 0x48478EF: free (vg_replace_malloc.c:989)
       by 0x422CAC: hashmap_clear_ (hashmap.c:208)
       by 0x283830: run_dir_diff (difftool.c:667)
       by 0x284103: cmd_difftool (difftool.c:801)
       by 0x238E0F: run_builtin (git.c:484)
       by 0x2392B9: handle_builtin (git.c:750)
       by 0x2399BC: cmd_main (git.c:921)
       by 0x356FEF: main (common-main.c:64)
     Address 0x1ffefff180 is on thread 1's stack
     in frame #2, created by run_dir_diff (difftool.c:358)
    ...

If taking any `goto finish` path before these variables are initialized,
`hashmap_clear_and_free()` operates on uninitialized data, sometimes
causing a crash.

Fix it by zero-initializing these variables, making
`hashmap_clear_and_free()` a no-op in that case.

Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: René Scharfe <l.s.r@web.de>
Cc: Taylor Blau <me@ttaylorr.com>
Cc: Patrick Steinhardt <ps@pks.im>
---
 builtin/difftool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index ca1b0890659b..b902f5d2ae17 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	struct checkout lstate, rstate;
 	int err = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
-	struct hashmap wt_modified, tmp_modified;
+	struct hashmap wt_modified = {0};
+	struct hashmap tmp_modified = {0};
 	int indices_loaded = 0;
 
 	workdir = repo_get_work_tree(the_repository);

base-commit: b31fb630c0fc6869a33ed717163e8a1210460d94
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] builtin/difftool: intialize some hashmap variables
  2024-11-11 16:21 [PATCH] builtin/difftool: intialize some hashmap variables Simon Marchi
@ 2024-11-11 20:54 ` Jeff King
  2024-11-11 21:22   ` Simon Marchi
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2024-11-11 20:54 UTC (permalink / raw)
  To: Simon Marchi
  Cc: git, Junio C Hamano, René Scharfe, Taylor Blau,
	Patrick Steinhardt

On Mon, Nov 11, 2024 at 11:21:44AM -0500, Simon Marchi wrote:

> When running a dir-diff command that produces no diff, variables
> `wt_modified` and `tmp_modified` are used while uninitialized, causing:
> 
>     $ /home/smarchi/src/git/git-difftool --dir-diff master
>     free(): invalid pointer
>     [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
>     $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
>     ...
>     Invalid free() / delete / delete[] / realloc()
>        at 0x48478EF: free (vg_replace_malloc.c:989)
>        by 0x422CAC: hashmap_clear_ (hashmap.c:208)
>        by 0x283830: run_dir_diff (difftool.c:667)
>        by 0x284103: cmd_difftool (difftool.c:801)
>        by 0x238E0F: run_builtin (git.c:484)
>        by 0x2392B9: handle_builtin (git.c:750)
>        by 0x2399BC: cmd_main (git.c:921)
>        by 0x356FEF: main (common-main.c:64)
>      Address 0x1ffefff180 is on thread 1's stack
>      in frame #2, created by run_dir_diff (difftool.c:358)
>     ...
> 
> If taking any `goto finish` path before these variables are initialized,
> `hashmap_clear_and_free()` operates on uninitialized data, sometimes
> causing a crash.
> 
> Fix it by zero-initializing these variables, making
> `hashmap_clear_and_free()` a no-op in that case.

The fix makes sense. I wondered if this had been broken for a long time,
and if so, how we managed not to notice it. But it looks like it is a
recent problem, via 7f795a1715 (builtin/difftool: plug several trivial
memory leaks, 2024-09-26).

> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index ca1b0890659b..b902f5d2ae17 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
>  	struct checkout lstate, rstate;
>  	int err = 0;
>  	struct child_process cmd = CHILD_PROCESS_INIT;
> -	struct hashmap wt_modified, tmp_modified;
> +	struct hashmap wt_modified = {0};
> +	struct hashmap tmp_modified = {0};
>  	int indices_loaded = 0;

That commit likewise frees some other local variables, but they are all
properly initialized. So touching these two are sufficient.

I'm not sure if zero-initialization is being a little too familiar with
the hashmap internals, though. The other variables use HASHMAP_INIT().
Should we do the same here, like this:

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 1a68ab6699..86995390c7 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -374,7 +374,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	struct checkout lstate, rstate;
 	int err = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
-	struct hashmap wt_modified, tmp_modified;
+	struct hashmap wt_modified = HASHMAP_INIT(path_entry_cmp, NULL);
+	struct hashmap tmp_modified = HASHMAP_INIT(path_entry_cmp, NULL);
 	int indices_loaded = 0;
 
 	workdir = get_git_work_tree();
@@ -594,14 +595,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	 * should be copied back to the working tree.
 	 * Do not copy back files when symlinks are used and the
 	 * external tool did not replace the original link with a file.
-	 *
-	 * These hashes are loaded lazily since they aren't needed
-	 * in the common case of --symlinks and the difftool updating
-	 * files through the symlink.
 	 */
-	hashmap_init(&wt_modified, path_entry_cmp, NULL, wtindex.cache_nr);
-	hashmap_init(&tmp_modified, path_entry_cmp, NULL, wtindex.cache_nr);
-
 	for (i = 0; i < wtindex.cache_nr; i++) {
 		struct hashmap_entry dummy;
 		const char *name = wtindex.cache[i]->name;

That loses the initial table growth that the original had, but I think
letting it grow in the usual way is fine here.

-Peff

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] builtin/difftool: intialize some hashmap variables
  2024-11-11 20:54 ` Jeff King
@ 2024-11-11 21:22   ` Simon Marchi
  2024-11-11 22:09     ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Marchi @ 2024-11-11 21:22 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, René Scharfe, Taylor Blau,
	Patrick Steinhardt

On 11/11/24 3:54 PM, Jeff King wrote:
> On Mon, Nov 11, 2024 at 11:21:44AM -0500, Simon Marchi wrote:
> 
>> When running a dir-diff command that produces no diff, variables
>> `wt_modified` and `tmp_modified` are used while uninitialized, causing:
>>
>>     $ /home/smarchi/src/git/git-difftool --dir-diff master
>>     free(): invalid pointer
>>     [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
>>     $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
>>     ...
>>     Invalid free() / delete / delete[] / realloc()
>>        at 0x48478EF: free (vg_replace_malloc.c:989)
>>        by 0x422CAC: hashmap_clear_ (hashmap.c:208)
>>        by 0x283830: run_dir_diff (difftool.c:667)
>>        by 0x284103: cmd_difftool (difftool.c:801)
>>        by 0x238E0F: run_builtin (git.c:484)
>>        by 0x2392B9: handle_builtin (git.c:750)
>>        by 0x2399BC: cmd_main (git.c:921)
>>        by 0x356FEF: main (common-main.c:64)
>>      Address 0x1ffefff180 is on thread 1's stack
>>      in frame #2, created by run_dir_diff (difftool.c:358)
>>     ...
>>
>> If taking any `goto finish` path before these variables are initialized,
>> `hashmap_clear_and_free()` operates on uninitialized data, sometimes
>> causing a crash.
>>
>> Fix it by zero-initializing these variables, making
>> `hashmap_clear_and_free()` a no-op in that case.
> 
> The fix makes sense. I wondered if this had been broken for a long time,
> and if so, how we managed not to notice it. But it looks like it is a
> recent problem, via 7f795a1715 (builtin/difftool: plug several trivial
> memory leaks, 2024-09-26).

Are there tests for this specific scenario (no diff between the two
versions)?

> 
>> diff --git a/builtin/difftool.c b/builtin/difftool.c
>> index ca1b0890659b..b902f5d2ae17 100644
>> --- a/builtin/difftool.c
>> +++ b/builtin/difftool.c
>> @@ -376,7 +376,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
>>  	struct checkout lstate, rstate;
>>  	int err = 0;
>>  	struct child_process cmd = CHILD_PROCESS_INIT;
>> -	struct hashmap wt_modified, tmp_modified;
>> +	struct hashmap wt_modified = {0};
>> +	struct hashmap tmp_modified = {0};
>>  	int indices_loaded = 0;
> 
> That commit likewise frees some other local variables, but they are all
> properly initialized. So touching these two are sufficient.

Indeed, I checked the other variables, they look fine.

> I'm not sure if zero-initialization is being a little too familiar with
> the hashmap internals, though.

Up to you.  In other C projects I worked on, it was typical that
zero-ing an object would get it in a valid initial empty state, properly
handled by the destruction functions.  This way, a big struct containing
other objects could be initialized simply by zero-ing it, without having
to initialize each component explicitly.

> The other variables use HASHMAP_INIT().
> Should we do the same here, like this:
> 
> diff --git a/builtin/difftool.c b/builtin/difftool.c
> index 1a68ab6699..86995390c7 100644
> --- a/builtin/difftool.c
> +++ b/builtin/difftool.c
> @@ -374,7 +374,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
>  	struct checkout lstate, rstate;
>  	int err = 0;
>  	struct child_process cmd = CHILD_PROCESS_INIT;
> -	struct hashmap wt_modified, tmp_modified;
> +	struct hashmap wt_modified = HASHMAP_INIT(path_entry_cmp, NULL);
> +	struct hashmap tmp_modified = HASHMAP_INIT(path_entry_cmp, NULL);
>  	int indices_loaded = 0;
>  
>  	workdir = get_git_work_tree();
> @@ -594,14 +595,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
>  	 * should be copied back to the working tree.
>  	 * Do not copy back files when symlinks are used and the
>  	 * external tool did not replace the original link with a file.
> -	 *
> -	 * These hashes are loaded lazily since they aren't needed
> -	 * in the common case of --symlinks and the difftool updating
> -	 * files through the symlink.
>  	 */
> -	hashmap_init(&wt_modified, path_entry_cmp, NULL, wtindex.cache_nr);
> -	hashmap_init(&tmp_modified, path_entry_cmp, NULL, wtindex.cache_nr);
> -
>  	for (i = 0; i < wtindex.cache_nr; i++) {
>  		struct hashmap_entry dummy;
>  		const char *name = wtindex.cache[i]->name;
> 
> That loses the initial table growth that the original had, but I think
> letting it grow in the usual way is fine here.

I thought about it, but was indeed afraid to be told that this removes
an optimization.  If you think it's fine, I'm happy with it too.

Please let me know if you want a v2 or if you are just going to merge an
updated version of this patch.

Thanks,

Simon

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] builtin/difftool: intialize some hashmap variables
  2024-11-11 21:22   ` Simon Marchi
@ 2024-11-11 22:09     ` Jeff King
  2024-11-11 23:51       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2024-11-11 22:09 UTC (permalink / raw)
  To: Simon Marchi
  Cc: git, Junio C Hamano, René Scharfe, Taylor Blau,
	Patrick Steinhardt

On Mon, Nov 11, 2024 at 04:22:26PM -0500, Simon Marchi wrote:

> > The fix makes sense. I wondered if this had been broken for a long time,
> > and if so, how we managed not to notice it. But it looks like it is a
> > recent problem, via 7f795a1715 (builtin/difftool: plug several trivial
> > memory leaks, 2024-09-26).
> 
> Are there tests for this specific scenario (no diff between the two
> versions)?

Presumably not, or they'd have been segfaulting. ;) So it may be worth
adding some to t7800.

> > I'm not sure if zero-initialization is being a little too familiar with
> > the hashmap internals, though.
> 
> Up to you.  In other C projects I worked on, it was typical that
> zero-ing an object would get it in a valid initial empty state, properly
> handled by the destruction functions.  This way, a big struct containing
> other objects could be initialized simply by zero-ing it, without having
> to initialize each component explicitly.

We often follow that rule in git.git, too, but have been increasingly
moving to macro initializers. They make it easier if we ever need a
non-zero state as an invariant (e.g., STRBUF_INIT always points to a
dummy string). In this case, it does look like HASHMAP_INIT sets
do_count_items.

It's not a bug in your program since we still hashmap_init() over the
zero'd state, but it does feel a bit weird to me.

> Please let me know if you want a v2 or if you are just going to merge an
> updated version of this patch.

That's up to the maintainer. IMHO it's worth using HASHMAP_INIT, though,
and perhaps adding a test.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] builtin/difftool: intialize some hashmap variables
  2024-11-11 22:09     ` Jeff King
@ 2024-11-11 23:51       ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2024-11-11 23:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Simon Marchi, git, René Scharfe, Taylor Blau,
	Patrick Steinhardt

Jeff King <peff@peff.net> writes:

>> Please let me know if you want a v2 or if you are just going to merge an
>> updated version of this patch.
>
> That's up to the maintainer. IMHO it's worth using HASHMAP_INIT, though,
> and perhaps adding a test.

Yup, please do.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-11-11 23:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-11 16:21 [PATCH] builtin/difftool: intialize some hashmap variables Simon Marchi
2024-11-11 20:54 ` Jeff King
2024-11-11 21:22   ` Simon Marchi
2024-11-11 22:09     ` Jeff King
2024-11-11 23:51       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).