From: Collin Funk <collin.funk1@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, correctmost <cmlists@sent.com>,
Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan
Date: Wed, 12 Nov 2025 12:06:47 -0800 [thread overview]
Message-ID: <87qzu32ey0.fsf@gmail.com> (raw)
In-Reply-To: <20251112103158.GA983233@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> On Wed, Nov 12, 2025 at 12:17:24AM -0800, Collin Funk wrote:
>
>> I see that an interceptor was added in 2023 [1]. Maybe your compiler is
>> older than that?
>
> No, I'm using gcc 15.2.0 (from Debian unstable).
>
> But I'm not sure if the linked code does anything useful for us.
>
> One, it's not clear to me if it is even kicking in or not. It only does
> anything if the region is "sanitizer managed", according to the details
> at https://reviews.llvm.org/D154659. I'm not sure what that means
> exactly, because I'm fuzzy on how the shadow map works.
>
> But even when it does do something, it seems to round up to the nearest
> page size. But we really want to know if we go even one byte over the
> requested length, because if we touch the 1235th byte of a 1234-byte
> buffer (which is going to be a NUL because of mmap rounding up the
> pages), then there's probably another test case somewhere where we
> access the 4097th byte of a 4096-byte buffer (which is going to
> segfault).
The glibc docs say that the length is rounded up to the nearest page
size [1]:
Thus, addresses for mapping must be page-aligned, and length values
will be rounded up.
This wording in POSIX makes me think that all systems will round up to
the nearest page size [2]:
Thus, while the parameter len need not meet a size or alignment
constraint, the system shall include, in any mapping operation, any
partial page specified by the address range starting at pa and
continuing for len bytes.
>> char *ptr = mmap (NULL, getpagesize (), PROT_READ | PROT_WRITE,
>> MAP_ANONYMOUS, -1, 0);
>> if (ptr == NULL)
>> abort ();
>
> I think you want to check for MAP_FAILED here, not NULL. And I think we
> always get that, because MAP_ANONYMOUS needs to be OR-ed into MAP_SHARED
> or MAP_PRIVATE. So here:
>
>> $ gcc -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>> SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x400554) (BuildId: 1b7a82189bfffb3f73d420e138b9859add25901a) in main
>> $ clang -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>> SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x4e9ee6) (BuildId: aca1d168eacebaa239082d8a45ab74c8470f4b31) in main
>
> I don't think this is ASan finding a problem. It is just telling us that
> we segfaulted for other reasons. And the fault here is because the
> broken mmap() invocation returned MAP_FAILED, and we tried to access
> that garbage pointer.
>
>> ptr[getpagesize () + 1] = 'a';
>
> This is also making a map that is a multiple of the page size, and then
> touching a byte that's on the next page. That's the easy-ish case that
> we can often already find, even without ASan (though it depends on what
> comes after the mapped memory; it might be a valid page).
Oops. I clearly don't use mmap much. :)
> A more interesting test for Git is to actually map a file, like:
>
> $ cat main.c
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <stdio.h>
> static void die(const char *msg)
> {
> perror(msg);
> exit(1);
> }
> int main (int argc, const char **argv)
> {
> struct stat st;
> int fd;
> char *ptr;
>
> fd = open(argv[1], O_RDONLY);
> if (fd < 0)
> die("open");
> if (fstat(fd, &st) < 0)
> die("fstat");
> ptr = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
> if (ptr == MAP_FAILED)
> die("mmap");
> printf("last byte: %d\n", ptr[st.st_size-1]);
> printf("one byte after: %d\n", ptr[st.st_size]);
> return 0;
> }
> $ yes | head -c 4096 >big
> $ yes | head -c 372 >small
>
> And ASan does often detect the problem for the "big" page-sized file,
> but not consistently! If I do:
>
> gcc -fsanitize=address main.c
> while ./a.out big; do echo ok; done
>
> I may get output like:
>
> last byte: 10
> one byte after: 127
> ok
> last byte: 10
> one byte after: 0
> ok
> last byte: 10
> one byte after: 0
> ok
> last byte: 10
> =================================================================
> ==988617==ERROR: AddressSanitizer: unknown-crash on address 0x7efd40b9f000 at pc 0x564fe77b64eb bp 0x7ffff49e8160 sp 0x7ffff49e8158
> READ of size 1 at 0x7efd40b9f000 thread T0
> #0 0x564fe77b64ea in main (/home/peff/a.out+0x14ea) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
> #1 0x7efd41233ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> #2 0x7efd41233d64 in __libc_start_main_impl ../csu/libc-start.c:360
> #3 0x564fe77b6150 in _start (/home/peff/a.out+0x1150) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
>
> Address 0x7efd40b9f000 is a wild pointer inside of access range of size 0x000000000001.
>
> So it worked three times without ASan noticing the problem (producing
> two different outputs), and then ASan finally crashed. But it didn't
> give us the usual information we get for a malloc overflow. It's just an
> "unknown crash" from a "wild pointer". So I'm not sure if it's even
> finding these through its own poisoning, and not just catching an
> unlucky segfault.
>
> If we switch to the small file, then ASan never reports anything! The OS
> gives us a page-sized chunk, so we consistently read a "0" in from the
> byte after our requested size.
>
> If we swap out the mmap for:
>
> ptr = malloc(st.st_size);
> read(fd, ptr, st.st_size);
>
> (which is roughly what our NO_MMAP wrapper is doing behind the scenes),
> then ASan does catch it consistently, even for the "small" file:
>
> $ ./a.out small
> =================================================================
> ==1008630==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7d11fe01b4 at pc 0x55cf89b1b4c8 bp 0x7fff471b84e0 sp 0x7fff471b84d8
> READ of size 1 at 0x7c7d11fe01b4 thread T0
> #0 0x55cf89b1b4c7 in main (/home/peff/a.out+0x14c7) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
> #1 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> #2 0x7f4d12e33d64 in __libc_start_main_impl ../csu/libc-start.c:360
> #3 0x55cf89b1b150 in _start (/home/peff/a.out+0x1150) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
>
> 0x7c7d11fe01b4 is located 0 bytes after 372-byte region [0x7c7d11fe0040,0x7c7d11fe01b4)
> allocated by thread T0 here:
> #0 0x7f4d1311a0ab in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:67
> #1 0x55cf89b1b3a0 in main (/home/peff/a.out+0x13a0) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
> #2 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Cool, thanks for the actual working example. Your patch makes perfect
sense now.
Collin
[1] https://www.gnu.org/software/libc/manual/html_node/Memory_002dmapped-I_002fO.html
[2] https://pubs.opengroup.org/onlinepubs/9799919799/functions/mmap.html
next prev parent reply other threads:[~2025-11-12 20:06 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-12 7:55 [PATCH 0/9] asan bonanza Jeff King
2025-11-12 7:56 ` [PATCH 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-12 8:01 ` [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-13 2:55 ` Taylor Blau
2025-11-18 8:59 ` Jeff King
2025-11-12 8:02 ` [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-12 8:17 ` Collin Funk
2025-11-12 10:31 ` Jeff King
2025-11-12 20:06 ` Collin Funk [this message]
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:12 ` Taylor Blau
2025-11-13 6:34 ` Patrick Steinhardt
2025-11-18 8:49 ` Jeff King
2025-11-13 16:30 ` Junio C Hamano
2025-11-14 7:00 ` Patrick Steinhardt
2025-11-15 2:13 ` Jeff King
2025-11-12 8:05 ` [PATCH 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:09 ` Taylor Blau
2025-11-18 8:40 ` Jeff King
2025-11-18 8:38 ` Jeff King
2025-11-12 8:06 ` [PATCH 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-12 8:06 ` [PATCH 6/9] fsck: avoid strcspn() " Jeff King
2025-11-12 8:06 ` [PATCH 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-12 8:10 ` [PATCH 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-12 19:36 ` Junio C Hamano
2025-11-15 2:12 ` Jeff King
2025-11-12 8:10 ` [PATCH 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-13 3:17 ` [PATCH 0/9] asan bonanza Taylor Blau
2025-11-18 9:11 ` [PATCH v2 " Jeff King
2025-11-18 9:11 ` [PATCH v2 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-18 9:12 ` [PATCH v2 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-18 9:12 ` [PATCH v2 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-18 9:12 ` [PATCH v2 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-18 14:30 ` Phillip Wood
2025-11-23 6:19 ` Junio C Hamano
2025-11-23 15:51 ` Phillip Wood
2025-11-23 18:06 ` Junio C Hamano
2025-11-24 22:30 ` Jeff King
2025-11-24 23:09 ` Junio C Hamano
2025-11-26 15:09 ` Jeff King
2025-11-26 17:22 ` Junio C Hamano
2025-11-30 13:13 ` [PATCH 0/4] more robust functions for parsing int from buf Jeff King
2025-11-30 13:14 ` [PATCH 1/4] parse: prefer bool to int for boolean returns Jeff King
2025-12-04 11:23 ` Patrick Steinhardt
2025-11-30 13:15 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Jeff King
2025-11-30 13:46 ` my complaints with clar Jeff King
2025-12-01 14:16 ` Phillip Wood
2025-12-04 11:09 ` Patrick Steinhardt
2025-12-05 18:30 ` Jeff King
2025-12-04 11:23 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Patrick Steinhardt
2025-12-05 16:11 ` Phillip Wood
2026-01-20 20:54 ` Junio C Hamano
2026-01-21 5:27 ` Jeff King
2025-11-30 13:15 ` [PATCH 3/4] cache-tree: use parse_int_from_buf() Jeff King
2025-11-30 13:16 ` [PATCH 4/4] fsck: use parse_unsigned_from_buf() for parsing timestamp Jeff King
2025-11-18 9:12 ` [PATCH v2 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-18 9:12 ` [PATCH v2 6/9] fsck: avoid strcspn() " Jeff King
2025-11-18 9:12 ` [PATCH v2 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-18 9:12 ` [PATCH v2 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-18 9:12 ` [PATCH v2 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-23 5:49 ` [PATCH v2 0/9] asan bonanza Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87qzu32ey0.fsf@gmail.com \
--to=collin.funk1@gmail.com \
--cc=cmlists@sent.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.