* [PATCH] hashmap: ensure hashmaps are reusable after hashmap_clear()
@ 2025-04-29 15:47 Elijah Newren via GitGitGadget
2025-04-29 16:55 ` Junio C Hamano
0 siblings, 1 reply; 2+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-04-29 15:47 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
In the series merged at bf0a430f70b5 (Merge branch 'en/strmap',
2020-11-21), strmap was built on top of hashmap and hashmap was extended
in a few ways to support strmap and be more generally useful. One of
the extensions was that hashmap_partial_clear() was introduced to allow
reuse of the hashmap without freeing the table. Peff believed that it
also made sense to introduce a hashmap_clear() which freed everything
while allowing reuse.
I added hashmap_clear(), but in doing so, overlooked the fact that for
a hashmap to be reusable, it needs a defined cmpfn and data (the
HASHMAP_INIT macro requires these fields as parameters, for example).
So, if we want the hashmap to be reusable, we shouldn't zero out those
fields. We probably also shouldn't zero out do_count_items. (We could
zero out grow_at and shrink_at, but whether we zero those or not is
irrelevant as they'll be automatically updated whenever a new entry is
inserted.)
Since clearing is associated with freeing map->table, and the only thing
required for consistency after freeing map->table is zeroing tablesize
and private_size, let's only zero those fields out.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
hashmap: ensure hashmaps are reusable after hashmap_clear()
Ran into a NULL pointer dereference of cmpfn a few months ago when
trying to reuse one of {strmap, strset, strintmap} (don't remember which
one) after calling the relevant ${TYPE}_clear() variant, and tracked the
NULL pointer back to hashmap_clear(). Turned out to not be relevant to
those patches because I ended up not needing to reuse the map after all,
but I kept a note to myself to send in a fix.
I was surprised this wasn't a bug we were already hitting somewhere, but
I looked through the codebase and it appears that the only time we
attempt to reuse a hashmap after clearing is when we specifically use
hashmap_partial_clear(). So, this is just a latent bug waiting as a trap
for someone.
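
For readers who want to see the failure mode in isolation, here is a minimal sketch using a toy struct rather than the real struct hashmap. Everything prefixed with toy_ is hypothetical; only the field names table, tablesize, private_size and cmpfn are taken from the patch text, and the layout and the cmpfn_data name are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct hashmap; the layout is assumed, not copied from git. */
typedef int (*toy_cmp_fn)(const void *cmp_data, const char *a, const char *b);

struct toy_map {
	char **table;              /* heap-allocated bucket array */
	toy_cmp_fn cmpfn;          /* configuration: must survive a clear */
	const void *cmpfn_data;    /* configuration: must survive a clear */
	unsigned int private_size; /* number of stored entries */
	unsigned int tablesize;    /* number of buckets */
};

/* Hypothetical comparison callback; 0 means "equal". */
static int toy_streq(const void *cmp_data, const char *a, const char *b)
{
	(void)cmp_data;
	return strcmp(a, b);
}

static void toy_init(struct toy_map *map, toy_cmp_fn cmpfn, const void *data)
{
	memset(map, 0, sizeof(*map));
	map->cmpfn = cmpfn;
	map->cmpfn_data = data;
}

static void toy_alloc_table(struct toy_map *map, unsigned int size)
{
	map->table = calloc(size, sizeof(*map->table));
	if (!map->table)
		abort();
	map->tablesize = size;
}

/* Mirrors the removed lines: frees the table, then wipes the whole struct. */
static void toy_clear_memset(struct toy_map *map)
{
	free(map->table);
	memset(map, 0, sizeof(*map));
}

/* Mirrors the added lines: resets only the storage-related fields. */
static void toy_clear_fields(struct toy_map *map)
{
	free(map->table);
	map->table = NULL;
	map->tablesize = 0;
	map->private_size = 0;
}

/* Returns 1 if the map still has a comparison function to call. */
static int toy_is_reusable(const struct toy_map *map)
{
	return map->cmpfn != NULL;
}
```

After toy_clear_memset() the cmpfn pointer is NULL, so the next lookup that calls it would crash; after toy_clear_fields() the configuration is intact and the map only needs a fresh table.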
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1911%2Fnewren%2Ffix-hashmap-clear-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1911/newren/fix-hashmap-clear-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1911
hashmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/hashmap.c b/hashmap.c
index ee45ef00852..a711377853f 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -205,8 +205,9 @@ void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
 		return;
 	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
 		free_individual_entries(map, entry_offset);
-	free(map->table);
-	memset(map, 0, sizeof(*map));
+	FREE_AND_NULL(map->table);
+	map->tablesize = 0;
+	map->private_size = 0;
 }
 
 struct hashmap_entry *hashmap_get(const struct hashmap *map,
base-commit: f65182a99e545d2f2bc22e6c1c2da192133b16a3
--
gitgitgadget
* Re: [PATCH] hashmap: ensure hashmaps are reusable after hashmap_clear()
2025-04-29 15:47 [PATCH] hashmap: ensure hashmaps are reusable after hashmap_clear() Elijah Newren via GitGitGadget
@ 2025-04-29 16:55 ` Junio C Hamano
0 siblings, 0 replies; 2+ messages in thread
From: Junio C Hamano @ 2025-04-29 16:55 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> In the series merged at bf0a430f70b5 (Merge branch 'en/strmap',
> 2020-11-21), strmap was built on top of hashmap and hashmap was extended
> in a few ways to support strmap and be more generally useful. One of
> the extensions was that hashmap_partial_clear() was introduced to allow
> reuse of the hashmap without freeing the table. Peff believed that it
> also made sense to introduce a hashmap_clear() which freed everything
> while allowing reuse.
>
> I added hashmap_clear(), but in doing so, overlooked the fact that for
> a hashmap to be reusable, it needs a defined cmpfn and data (the
> HASHMAP_INIT macro requires these fields as parameters, for example).
> So, if we want the hashmap to be reusable, we shouldn't zero out those
> fields. We probably also shouldn't zero out do_count_items. (We could
> zero out grow_at and shrink_at, but whether we zero those or not is
> irrelevant as they'll be automatically updated whenever a new entry is
> inserted.)
>
> Since clearing is associated with freeing map->table, and the only thing
> required for consistency after freeing map->table is zeroing tablesize
> and private_size, let's only zero those fields out.
Makes sense. Thanks for finding and fixing.
I do not think we want to patch all the way down to Git 2.30, ...
> diff --git a/hashmap.c b/hashmap.c
> index ee45ef00852..a711377853f 100644
> --- a/hashmap.c
> +++ b/hashmap.c
> @@ -205,8 +205,9 @@ void hashmap_clear_(struct hashmap *map, ssize_t entry_offset)
> 		return;
> 	if (entry_offset >= 0)  /* called by hashmap_clear_and_free */
> 		free_individual_entries(map, entry_offset);
> -	free(map->table);
> -	memset(map, 0, sizeof(*map));
> +	FREE_AND_NULL(map->table);
> +	map->tablesize = 0;
> +	map->private_size = 0;
>  }
>
> struct hashmap_entry *hashmap_get(const struct hashmap *map,
>
> base-commit: f65182a99e545d2f2bc22e6c1c2da192133b16a3
... but this part of the code has been fairly quiet and the patch
applies very cleanly. So I'll apply on top of bf0a430f7 and merge
the result---anybody maintaining Git for their LTS distro can then
merge it to their favorite ancient maintenance track ;-)
Thanks.
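
A side note on the switch from free() to FREE_AND_NULL() in the hunk: once the memset() is gone, nothing else resets map->table, so the pointer has to be nulled explicitly or it would dangle. The sketch below illustrates that with a toy struct. TOY_FREE_AND_NULL is a local stand-in whose definition is assumed, the early-return guard is modeled on the "return;" context line rather than copied from hashmap.c, and the int return value exists only to make the behavior observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for FREE_AND_NULL(); the definition here is an assumption. */
#define TOY_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Toy storage struct; not the real struct hashmap. */
struct toy_store {
	int *table;
	unsigned int tablesize;
};

static void toy_store_alloc(struct toy_store *s, unsigned int size)
{
	s->table = calloc(size, sizeof(*s->table));
	if (!s->table)
		abort();
	s->tablesize = size;
}

/*
 * Returns 1 if storage was released, 0 if there was nothing to release.
 * Because the pointer is nulled, a second call takes the early return
 * instead of freeing the same block twice.
 */
static int toy_store_clear(struct toy_store *s)
{
	if (!s || !s->table)
		return 0;
	TOY_FREE_AND_NULL(s->table);
	s->tablesize = 0;
	return 1;
}
```

With a plain free() and no memset(), the guard would see a stale non-NULL pointer on the second call and free it again.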