From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Jacob Keller <jacob.e.keller@intel.com>,
git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH 4/4] Makefile: turn on NO_MMAP when building with LSan
Date: Fri, 06 Mar 2026 17:14:28 -0800 [thread overview]
Message-ID: <xmqqqzpwv3t7.fsf@gitster.g> (raw)
In-Reply-To: <20260305231305.GD2901305@coredump.intra.peff.net> (Jeff King's message of "Thu, 5 Mar 2026 18:13:05 -0500")
Jeff King <peff@peff.net> writes:
> @@ -1600,6 +1600,7 @@ BASIC_CFLAGS += -DSHA1DC_FORCE_ALIGNED_ACCESS
> endif
> ifneq ($(filter leak,$(SANITIZERS)),)
> BASIC_CFLAGS += -O0
> +NO_MMAP = CatchMapLeaks
> SANITIZE_LEAK = YesCompiledWithIt
> endif
> ifneq ($(filter address,$(SANITIZERS)),)
And of course, this "breaks" the leaks job at CI without being the
true culprit.
https://github.com/git/git/actions/runs/22786105918/job/66103114142
My bisection between v2.52.0 and v2.53.0 with the following
$ git bisect start v2.53.0 v2.52.0
$ git bisect run sh :doit
where :doit has the shell script attached at the end of this message
blames this commit. I didn't dig further than that.
commit 4c89d31494bff4bde6079a0e0821f1437e37d07b
Author: Patrick Steinhardt <ps@pks.im>
Date: Sun Nov 23 19:59:37 2025 +0100
streaming: rely on object sources to create object stream
When creating an object stream we first look up the object info and, if
it's present, we call into the respective backend that contains the
object to create a new stream for it.
This has the consequence that, for loose object source, we basically
iterate through the object sources twice: we first discover that the
file exists as a loose object in the first place by iterating through
all sources. And, once we have discovered it, we again walk through all
sources to try and map the object. The same issue will eventually also
surface once the packfile store becomes per-object-source.
Furthermore, it feels rather pointless to first look up the object only
to then try and read it.
Refactor the logic to be centered around sources instead. Instead of
first reading the object, we immediately ask the source to create the
object stream for us. If the object exists we get stream, otherwise
we'll try the next source.
Like this we only have to iterate through sources once. But even more
importantly, this change also helps us to make the whole logic
pluggable. The object read stream subsystem does not need to be aware of
the different source backends anymore, but eventually it'll only have to
call the source's callback function.
Note that at the current point in time we aren't fully there yet:
- The packfile store still sits on the object database level and is
thus agnostic of the sources.
- We still have to call into both the packfile store and the loose
object source.
But both of these issues will soon be addressed.
This refactoring results in a slight change to semantics: previously, it
was `odb_read_object_info_extended()` that picked the source for us, and
it would have favored packed (non-deltified) objects over loose objects.
And while we still favor packed over loose objects for a single source
with the new logic, we'll now favor a loose object from an earlier
source over a packed object from a later source.
Ultimately this shouldn't matter though: the stream doesn't indicate to
the caller which source it is from and whether it was created from a
packed or loose object, so such details are opaque to the caller. And
other than that we should be able to assume that two objects with the
same object ID should refer to the same content, so the streamed data
would be the same, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
streaming.c | 65 +++++++++++++++++++++++--------------------------------------
1 file changed, 24 insertions(+), 41 deletions(-)
---- >8 ----
#!/bin/sh
git apply -3 <"$0" || {
git reset --hard
exit 125
}
(
export SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true
make NO_MMAP=CatchMapLeaks CC=clang &&
cd t && sh t1060-object-corruption.sh
)
status=$?
make distclean
git reset --hard
exit $status
diff --git i/compat/mmap.c w/compat/mmap.c
index 2fe1c7732e..1a118711f7 100644
--- i/compat/mmap.c
+++ w/compat/mmap.c
@@ -38,7 +38,7 @@ void *git_mmap(void *start, size_t length, int prot, int flags, int fd, off_t of
return start;
}
-int git_munmap(void *start, size_t length)
+int git_munmap(void *start, size_t length UNUSED)
{
free(start);
return 0;
next prev parent reply other threads:[~2026-03-07 1:14 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-05 20:51 memory leak when cloning a repository Jacob Keller
2026-03-05 22:02 ` Jeff King
2026-03-05 23:03 ` [PATCH 0/4] plugging some mmap() leaks Jeff King
2026-03-05 23:08 ` [PATCH 1/4] check_connected(): delay opening new_pack Jeff King
2026-03-05 23:18 ` Jacob Keller
2026-03-05 23:09 ` [PATCH 2/4] check_connected(): fix leak of pack-index mmap Jeff King
2026-03-05 23:20 ` Jacob Keller
2026-03-05 23:12 ` [PATCH 3/4] pack-revindex: avoid double-loading .rev files Jeff King
2026-03-05 23:13 ` [PATCH 4/4] Makefile: turn on NO_MMAP when building with LSan Jeff King
2026-03-06 9:17 ` Jacob Keller
2026-03-06 16:25 ` [PATCH 5/4] meson: " Jeff King
2026-03-06 18:00 ` Ramsay Jones
2026-03-07 1:14 ` Junio C Hamano [this message]
2026-03-07 2:24 ` [PATCH 3.5/4] object-file: fix mmap() leak in odb_source_loose_read_object_stream() Jeff King
2026-03-07 5:35 ` Junio C Hamano
2026-03-10 12:23 ` Patrick Steinhardt
2026-03-06 4:37 ` [PATCH 0/4] plugging some mmap() leaks Ramsay Jones
2026-03-06 16:21 ` Jeff King
2026-03-06 17:49 ` Ramsay Jones
2026-03-06 18:37 ` Junio C Hamano
2026-03-06 18:55 ` Ramsay Jones
2026-03-06 22:05 ` Junio C Hamano
2026-03-06 23:25 ` Ramsay Jones
2026-03-07 1:15 ` Junio C Hamano
2026-03-05 23:16 ` memory leak when cloning a repository Jacob Keller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqqzpwv3t7.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=jacob.e.keller@intel.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox