* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-28 16:37 [PATCH] last-modified: fix bug caused by inproper initialized memory Toon Claes
@ 2025-11-28 20:55 ` Jeff King
2025-11-28 22:20 ` Anders Kaseorg
2025-12-08 11:47 ` Toon Claes
2025-11-29 2:01 ` Junio C Hamano
2025-12-08 11:46 ` [PATCH v2] last-modified: fix use of uninitialized memory Toon Claes
2 siblings, 2 replies; 14+ messages in thread
From: Jeff King @ 2025-11-28 20:55 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Karthik Nayak, Anders Kaseorg
On Fri, Nov 28, 2025 at 05:37:13PM +0100, Toon Claes wrote:
> git-last-modified(1) uses a scratch bitmap to keep track of paths that
> have been changed between commits. To avoid reallocating a bitmap on
> each call of process_parent(), the scratch bitmap is kept and reused.
> However, it seems an incorrect length is passed to memset(3).
>
> `struct bitmap` uses `eword_t` for internal storage. This type is
> typedef'd to uint64_t. To fully zero the memory used by the bitmap,
> multiply the length (saved in `struct bitmap::word_alloc`) by the size
> of `eword_t`.
Good catch! When I was looking for casts that could be the culprit, I
didn't think about the implicit one we get through the void pointer of
memset().
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index b0ecbdc540..cc5fd2e795 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
> if (!(parent->object.flags & PARENT1))
> active_paths_free(lm, parent);
>
> - memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> + memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
> diff_queue_clear(&diff_queued_diff);
> }
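To make the size mismatch concrete, here is a minimal sketch of the two memset() calls from the diff above. `struct bitmap_sketch` and the helper names are illustrative stand-ins, not git's real definitions; the only assumption carried over from the thread is that `word_alloc` counts `eword_t` elements and that `eword_t` is a `uint64_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t eword_t;

/* Simplified stand-in for git's struct bitmap (hypothetical, for illustration). */
struct bitmap_sketch {
	eword_t *words;
	size_t word_alloc; /* counted in eword_t elements, not in bytes */
};

/* Pre-patch form: clears word_alloc bytes, i.e. one eighth of the storage. */
static void clear_buggy(struct bitmap_sketch *b)
{
	memset(b->words, 0x0, b->word_alloc);
}

/* Patched form: clears word_alloc elements. */
static void clear_fixed(struct bitmap_sketch *b)
{
	memset(b->words, 0x0, b->word_alloc * sizeof(eword_t));
}

static void set_all_bits(struct bitmap_sketch *b)
{
	size_t i;
	for (i = 0; i < b->word_alloc; i++)
		b->words[i] = ~(eword_t)0;
}

/* Number of words that are still nonzero. */
static size_t count_dirty_words(const struct bitmap_sketch *b)
{
	size_t i, dirty = 0;
	for (i = 0; i < b->word_alloc; i++)
		if (b->words[i])
			dirty++;
	return dirty;
}

static void demo_scratch_clear(void)
{
	eword_t storage[8];
	struct bitmap_sketch scratch = { storage, 8 };

	set_all_bits(&scratch);
	clear_buggy(&scratch);
	/* Only 8 bytes (one word) were zeroed; the other 7 words are stale. */
	assert(count_dirty_words(&scratch) == 7);

	set_all_bits(&scratch);
	clear_fixed(&scratch);
	assert(count_dirty_words(&scratch) == 0);
}
```

Calling `demo_scratch_clear()` exercises both variants on an 8-word scratch area.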
I think this patch makes sense as the most obvious and immediate fix.
But thinking on how we might have avoided this bug:

  - We have macros like ALLOC_ARRAY() and COPY_ARRAY() that
    automatically multiply the array length by the size of each element
    (by looking at the type of the array). We could in theory have a
    helper like:

      MEMSET_ARRAY(lm->scratch->words, 0x0, lm->scratch->word_alloc);

    that would have made this hard to get wrong. But that's actually a
    bit of a funny interface, because memset is inherently byte-oriented
    under the hood. So we are not setting each element to 0x0, but
    rather each byte. For a value of 0x0, that is the same thing. But if
    you chose, say "0x1", it is not.

    So it would probably have to be limited to something like:

      CLEAR_ARRAY(lm->scratch->words, lm->scratch->word_alloc);

    which I'd guess would cover most memset cases. But this is getting
    specific enough that maybe the macro is making things more confusing
    rather than less.

  - It's a little gross that we are reaching inside a "struct bitmap" in
    the first place, as it's a mostly opaque type. And the code here has
    to know that the alloc field is sized in eword_t's, not in bytes.
    It feels like there should be a bitmap_clear() function. Its
    implementation would also have to remember to multiply by
    sizeof(eword_t), but at least it would be encapsulated.

    I doubt the leaky abstraction matters that much, though. It seems
    unlikely that we would change it (and if we did, we'd perhaps give
    the field a new name).

    In the same vein, probably using "sizeof(lm->scratch->words)" is
    better than "sizeof(eword_t)". But again, I find it an unlikely
    detail for us to catch under the hood.

-Peff
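
The point about memset() being byte-oriented can be sketched as follows. `CLEAR_ARRAY` here is only a hypothetical definition following the shape suggested in the message above; it is not an existing git macro, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical zero-only helper; the element size comes from the pointer type. */
#define CLEAR_ARRAY(x, n) memset((x), 0, sizeof(*(x)) * (n))

/* Fill one 64-bit word via memset() with the given byte value. */
static uint64_t fill_one_word(int byte_value)
{
	uint64_t w;
	memset(&w, byte_value, sizeof(w));
	return w;
}

static void demo_byte_fill(void)
{
	uint64_t words[4] = { 1, 2, 3, 4 };

	/* For 0x0, "every byte is 0" and "every element is 0" coincide. */
	assert(fill_one_word(0x0) == 0);

	/* For 0x1, every byte becomes 0x01, so the element is not 1. */
	assert(fill_one_word(0x1) == UINT64_C(0x0101010101010101));

	CLEAR_ARRAY(words, 4);
	assert(words[0] == 0 && words[3] == 0);
}
```

This is why a general MEMSET_ARRAY(ptr, value, n) would be misleading for any value other than zero, while a zero-only helper has no such ambiguity.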
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-28 20:55 ` Jeff King
@ 2025-11-28 22:20 ` Anders Kaseorg
2025-11-29 10:50 ` Jeff King
2025-12-08 11:47 ` Toon Claes
1 sibling, 1 reply; 14+ messages in thread
From: Anders Kaseorg @ 2025-11-28 22:20 UTC (permalink / raw)
To: Jeff King, Toon Claes; +Cc: git, Karthik Nayak
On 11/28/25 12:55, Jeff King wrote:
> In the same vein, probably using "sizeof(lm->scratch->words)" is
> better than "sizeof(eword_t)". But again, I find it an unlikely
> detail for us to catch under the hood.
As words is a pointer, you must have meant sizeof *lm->scratch->words or
sizeof lm->scratch->words[0].
Anders
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-28 22:20 ` Anders Kaseorg
@ 2025-11-29 10:50 ` Jeff King
0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2025-11-29 10:50 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: Toon Claes, git, Karthik Nayak
On Fri, Nov 28, 2025 at 02:20:22PM -0800, Anders Kaseorg wrote:
> On 11/28/25 12:55, Jeff King wrote:
> > In the same vein, probably using "sizeof(lm->scratch->words)" is better
> > than "sizeof(eword_t)". But again, I find it an unlikely detail for us
> > to catch under the hood.
>
> As words is a pointer, you must have meant sizeof *lm->scratch->words or
> sizeof lm->scratch->words[0].
Whoops, yes. I prefer sizeof(*var) over sizeof(type) because it tracks
changes to the type of "var" automatically. But the opportunity to
forget the "*" is perhaps a point against it. :)
-Peff
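
A small sketch of the distinction discussed above, assuming a holder struct with a `uint64_t *words` member (the struct and function names are hypothetical). Note that sizeof does not evaluate its operand, so the null pointer in the demo is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder with a pointer member, like the words field in the thread. */
struct words_holder {
	uint64_t *words;
};

static void demo_sizeof(void)
{
	struct words_holder h = { NULL };

	/* Size of the pointer itself: what forgetting the "*" gives you. */
	assert(sizeof(h.words) == sizeof(uint64_t *));

	/* Size of one element; both spellings are equivalent. */
	assert(sizeof(*h.words) == sizeof(uint64_t));
	assert(sizeof(h.words[0]) == sizeof(uint64_t));
}
```

On a typical 64-bit platform the pointer and a uint64_t are both 8 bytes, so a missing "*" in this particular case would go unnoticed there and only misbehave where pointers have a different size.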
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-28 20:55 ` Jeff King
2025-11-28 22:20 ` Anders Kaseorg
@ 2025-12-08 11:47 ` Toon Claes
2025-12-08 20:15 ` Jeff King
1 sibling, 1 reply; 14+ messages in thread
From: Toon Claes @ 2025-12-08 11:47 UTC (permalink / raw)
To: Jeff King; +Cc: git, Karthik Nayak, Anders Kaseorg
Jeff King <peff@peff.net> writes:
> I think this patch makes sense as the most obvious and immediate fix.
> But thinking on how we might have avoided this bug:
>
> - We have macros like ALLOC_ARRAY() and COPY_ARRAY() that
> automatically multiply the array length by the size of each element
> (by looking at the type of the array). We could in theory have a
> helper like:
>
> MEMSET_ARRAY(lm->scratch->words, 0x0, lm->scratch->word_alloc);
>
> that would have made this hard to get wrong. But that's actually a
> bit of a funny interface, because memset is inherently byte-oriented
> under the hood. So we are not setting each element to 0x0, but
> rather each byte. For a value of 0x0, that is the same thing. But if
> you chose, say "0x1", it is not.
>
> So it would probably have to be limited to something like:
>
> CLEAR_ARRAY(lm->scratch->words, lm->scratch->word_alloc);
>
> which I'd guess would cover most memset cases. But this is getting
> specific enough that maybe the macro is making things more confusing
> rather than less.
I've submitted a v2 that introduces MEMZERO_ARRAY(). I'm curious what
the responses to this proposal will be.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-12-08 11:47 ` Toon Claes
@ 2025-12-08 20:15 ` Jeff King
2025-12-08 22:42 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2025-12-08 20:15 UTC (permalink / raw)
To: Toon Claes; +Cc: Junio C Hamano, git, Karthik Nayak, Anders Kaseorg
On Mon, Dec 08, 2025 at 12:47:20PM +0100, Toon Claes wrote:
> > So it would probably have to be limited to something like:
> >
> > CLEAR_ARRAY(lm->scratch->words, lm->scratch->word_alloc);
> >
> > which I'd guess would cover most memset cases. But this is getting
> > specific enough that maybe the macro is making things more confusing
> > rather than less.
>
> I've submitted a v2 that introduces MEMZERO_ARRAY(). I'm curious what
> the responses on this proposal are?
I think it looks fine, though as Junio noted, the original is already in
next so it would have to be a patch on top.
Is such a macro worth it? I guess we'd be able to see if there are other
possible sites with something like:
git grep 'memset(.*0,.*\* \?sizeof'
that's looking for memsets of "0" that also multiply by sizeof. Looks
like there are a few:
add-patch.c: memset(hunk + 1, 0, (splittable_into - 1) * sizeof(*hunk));
builtin/last-modified.c: memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
compat/simple-ipc/ipc-win32.c: memset(ea, 0, NR_EA * sizeof(EXPLICIT_ACCESS));
diff-delta.c: memset(hash, 0, hsize * sizeof(*hash));
hashmap.c: memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
pack-revindex.c: memset(pos, 0, BUCKETS * sizeof(*pos));
The first one is an oddball, but the other five could use it. So if we
were to do a patch adding MEMZERO_ARRAY(), it would probably make sense
to convert those spots. I'd be OK either way.
-Peff
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-12-08 20:15 ` Jeff King
@ 2025-12-08 22:42 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2025-12-08 22:42 UTC (permalink / raw)
To: Jeff King; +Cc: Toon Claes, git, Karthik Nayak, Anders Kaseorg
Jeff King <peff@peff.net> writes:
> git grep 'memset(.*0,.*\* \?sizeof'
>
> that's looking for memsets of "0" that also multiply by sizeof. Looks
> like there are a few:
>
> add-patch.c: memset(hunk + 1, 0, (splittable_into - 1) * sizeof(*hunk));
> builtin/last-modified.c: memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
> compat/simple-ipc/ipc-win32.c: memset(ea, 0, NR_EA * sizeof(EXPLICIT_ACCESS));
> diff-delta.c: memset(hash, 0, hsize * sizeof(*hash));
> hashmap.c: memset(map->table, 0, map->tablesize * sizeof(struct hashmap_entry *));
> pack-revindex.c: memset(pos, 0, BUCKETS * sizeof(*pos));
>
> The first one is an oddball, but the other five could use it. So if we
> were to do a patch adding MEMZERO_ARRAY(), it would probably make sense
> to convert those spots. I'd be OK either way.
Thanks for making an excellent suggestion while I was away from the
keyboard ;-)
Between MEMZERO_ARRAY() and CLEAR_ARRAY(), I am on the fence.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-28 16:37 [PATCH] last-modified: fix bug caused by inproper initialized memory Toon Claes
2025-11-28 20:55 ` Jeff King
@ 2025-11-29 2:01 ` Junio C Hamano
2025-11-29 2:11 ` Junio C Hamano
2025-12-08 11:46 ` [PATCH v2] last-modified: fix use of uninitialized memory Toon Claes
2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2025-11-29 2:01 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Toon Claes <toon@iotcl.com> writes:
> git-last-modified(1) uses a scratch bitmap to keep track of paths that
> have been changed between commits. To avoid reallocating a bitmap on
> each call of process_parent(), the scratch bitmap is kept and reused.
> However, it seems an incorrect length is passed to memset(3).
>
> `struct bitmap` uses `eword_t` for internal storage. This type is
> typedef'd to uint64_t. To fully zero the memory used by the bitmap,
> multiply the length (saved in `struct bitmap::word_alloc`) by the size
> of `eword_t`.
>
> Reported-by: Anders Kaseorg <andersk@mit.edu>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
> It was reported [1] the tests in t8020 fail on s390x. After some
> research, it seems it was related to s390x being big-endian. Well,
> actually, not really. Using big-endian simply uncovered the problem in
> test.
>
> [1]: https://lore.kernel.org/git/4dc4c8cd-c0cc-4784-8fcf-defa3a051087@mit.edu/
> ---
> builtin/last-modified.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
This dates back to v2.52.0~4 and is clearly a maint material.
Thanks for finding and fixing.
>
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index b0ecbdc540..cc5fd2e795 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
> if (!(parent->object.flags & PARENT1))
> active_paths_free(lm, parent);
>
> - memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> + memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
> diff_queue_clear(&diff_queued_diff);
> }
>
>
> ---
> base-commit: 6ab38b7e9cc7adafc304f3204616a4debd49c6e9
> change-id: 20251126-toon-big-endian-ci-fe62bb361974
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-29 2:01 ` Junio C Hamano
@ 2025-11-29 2:11 ` Junio C Hamano
2025-11-29 9:38 ` Toon Claes
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2025-11-29 2:11 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Junio C Hamano <gitster@pobox.com> writes:
> This dates back to v2.52.0~4 and is clearly a maint material.
>
> Thanks for finding and fixing.
> Subject: Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
Let's retitle, as inproper is not a word. Is
Subject: [PATCH] last-modified: fix use of uninitialized memory
good enough?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
2025-11-29 2:11 ` Junio C Hamano
@ 2025-11-29 9:38 ` Toon Claes
0 siblings, 0 replies; 14+ messages in thread
From: Toon Claes @ 2025-11-29 9:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Junio C Hamano <gitster@pobox.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> This dates back to v2.52.0~4 and is clearly a maint material.
Makes sense. I appreciate it.
>> Thanks for finding and fixing.
Yes, I'm happy Anders reported this, although I didn't expect it to have
an impact on all platforms. It would have been a nasty bug to hunt down
if users had complained that "the results are incorrect".
>> Subject: Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
>
> Let's retitle, as inproper is not a word. Is
I wasn't sure about that. But my spell checker didn't pick it up, so I
rolled with it.
> Subject: [PATCH] last-modified: fix use of uninitialized memory
>
> good enough?
Absolutely.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2] last-modified: fix use of uninitialized memory
2025-11-28 16:37 [PATCH] last-modified: fix bug caused by inproper initialized memory Toon Claes
2025-11-28 20:55 ` Jeff King
2025-11-29 2:01 ` Junio C Hamano
@ 2025-12-08 11:46 ` Toon Claes
2025-12-08 13:26 ` Junio C Hamano
2 siblings, 1 reply; 14+ messages in thread
From: Toon Claes @ 2025-12-08 11:46 UTC (permalink / raw)
To: git; +Cc: Jeff King, Karthik Nayak, Anders Kaseorg, Toon Claes
git-last-modified(1) uses a scratch bitmap to keep track of paths that
have been changed between commits. To avoid reallocating a bitmap on
each call of process_parent(), the scratch bitmap is kept and reused.
However, between loops, the memory allocated for the 'scratch' bitmap
isn't correctly wiped.

`struct bitmap` uses `eword_t` for internal storage. This type is
typedef'd to uint64_t. To fully zero the memory used by the bitmap, the
length (saved in `struct bitmap::word_alloc`) should be multiplied by
the size of a single item. To simplify zeroing an array, a macro
MEMZERO_ARRAY() is defined and used.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
It was reported [1] the tests in t8020 fail on s390x. After some
research, it seems it was related to s390x being big-endian. Well,
actually, not really. Using big-endian simply uncovered the problem in
test.
[1]: https://lore.kernel.org/git/4dc4c8cd-c0cc-4784-8fcf-defa3a051087@mit.edu/
---
Changes in v2:
- Defined and used MEMZERO_ARRAY() macro.
- Fixed up the title, which used a nonexistent word
- Link to v1: https://lore.kernel.org/r/20251128-toon-big-endian-ci-v1-1-80da0f629c1e@iotcl.com
---
builtin/last-modified.c | 2 +-
git-compat-util.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index b0ecbdc540..ac5387e861 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
if (!(parent->object.flags & PARENT1))
active_paths_free(lm, parent);
- memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
+ MEMZERO_ARRAY(lm->scratch->words, lm->scratch->word_alloc);
diff_queue_clear(&diff_queued_diff);
}
diff --git a/git-compat-util.h b/git-compat-util.h
index 398e0fac4f..2b8192fd2e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -726,6 +726,7 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
#define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc)))
#define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x)))
#define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc)))
+#define MEMZERO_ARRAY(x, alloc) memset((x), 0x0, st_mult(sizeof(*(x)), (alloc)))
#define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + \
BARF_UNLESS_COPYABLE((dst), (src)))
---
base-commit: bdc5341ff65278a3cc80b2e8a02a2f02aa1fac06
change-id: 20251126-toon-big-endian-ci-fe62bb361974
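
For readers without the git tree at hand, the macro added above can be approximated in a standalone form. This is only a sketch: `checked_mult()` is a stand-in written for this example that aborts on overflow, whereas git's own st_mult() reports the overflow through its usual fatal-error path, and `MEMZERO_ARRAY_SKETCH` is a hypothetical name chosen to avoid implying it is the real macro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for st_mult(): multiply two sizes, refusing to wrap around. */
static size_t checked_mult(size_t a, size_t b)
{
	if (a && b > SIZE_MAX / a)
		abort();
	return a * b;
}

/* Zero "alloc" elements of x; the element size is taken from the type of *x. */
#define MEMZERO_ARRAY_SKETCH(x, alloc) \
	memset((x), 0x0, checked_mult(sizeof(*(x)), (alloc)))

static void demo_memzero_array(void)
{
	uint64_t words[3] = { UINT64_MAX, UINT64_MAX, UINT64_MAX };

	MEMZERO_ARRAY_SKETCH(words, 3);
	assert(words[0] == 0 && words[1] == 0 && words[2] == 0);
}
```

Because the caller passes only the pointer and an element count, there is no place left to forget the multiplication by the element size.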
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2] last-modified: fix use of uninitialized memory
2025-12-08 11:46 ` [PATCH v2] last-modified: fix use of uninitialized memory Toon Claes
@ 2025-12-08 13:26 ` Junio C Hamano
2025-12-09 8:43 ` Toon Claes
0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2025-12-08 13:26 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Toon Claes <toon@iotcl.com> writes:
> Changes in v2:
> - Defined and used MEMZERO_ARRAY() macro.
> - Fixed up the title, which used a nonexistent word
> - Link to v1: https://lore.kernel.org/r/20251128-toon-big-endian-ci-v1-1-80da0f629c1e@iotcl.com
Sorry, but hasn't the old one already been cooking in 'next'?
> ---
> builtin/last-modified.c | 2 +-
> git-compat-util.h | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index b0ecbdc540..ac5387e861 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
> if (!(parent->object.flags & PARENT1))
> active_paths_free(lm, parent);
>
> - memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> + MEMZERO_ARRAY(lm->scratch->words, lm->scratch->word_alloc);
> diff_queue_clear(&diff_queued_diff);
> }
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 398e0fac4f..2b8192fd2e 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -726,6 +726,7 @@ static inline uint64_t u64_add(uint64_t a, uint64_t b)
> #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc)))
> #define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x)))
> #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc)))
> +#define MEMZERO_ARRAY(x, alloc) memset((x), 0x0, st_mult(sizeof(*(x)), (alloc)))
>
> #define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + \
> BARF_UNLESS_COPYABLE((dst), (src)))
>
> ---
> base-commit: bdc5341ff65278a3cc80b2e8a02a2f02aa1fac06
> change-id: 20251126-toon-big-endian-ci-fe62bb361974
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] last-modified: fix use of uninitialized memory
2025-12-08 13:26 ` Junio C Hamano
@ 2025-12-09 8:43 ` Toon Claes
2025-12-09 12:18 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Toon Claes @ 2025-12-09 8:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Junio C Hamano <gitster@pobox.com> writes:
> Sorry, but hasn't the old one already been cooking in 'next'?
Okay, fine by me. Let's abandon this v2 then.
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] last-modified: fix use of uninitialized memory
2025-12-09 8:43 ` Toon Claes
@ 2025-12-09 12:18 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2025-12-09 12:18 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Jeff King, Karthik Nayak, Anders Kaseorg
Toon Claes <toon@iotcl.com> writes:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Sorry, but hasn't the old one already been cooking in 'next'?
>
> Okay, fine by me. Let's abandon this v2 then.
Understood. I however agree with Patrick that rewriting
memset(ptr, '\0', sizeof(*ptr) * nr)
to use CLEAR_ARRAY(), not limited to last-modified but everywhere in
the codebase, may not be a bad idea. It would be a good exercise to
hone our Coccinelle skill ;-)
^ permalink raw reply [flat|nested] 14+ messages in thread