From: Collin Funk <collin.funk1@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, correctmost <cmlists@sent.com>,
Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan
Date: Wed, 12 Nov 2025 12:06:47 -0800 [thread overview]
Message-ID: <87qzu32ey0.fsf@gmail.com> (raw)
In-Reply-To: <20251112103158.GA983233@coredump.intra.peff.net>
Jeff King <peff@peff.net> writes:
> On Wed, Nov 12, 2025 at 12:17:24AM -0800, Collin Funk wrote:
>
>> I see that an interceptor was added in 2023 [1]. Maybe your compiler is
>> older than that?
>
> No, I'm using gcc 15.2.0 (from Debian unstable).
>
> But I'm not sure if the linked code does anything useful for us.
>
> One, it's not clear to me if it is even kicking in or not. It only does
> anything if the region is "sanitizer managed", according to the details
> at https://reviews.llvm.org/D154659. I'm not sure what that means
> exactly, because I'm fuzzy on how the shadow map works.
>
> But even when it does do something, it seems to round up to the nearest
> page size. But we really want to know if we go even one byte over the
> requested length, because if we touch the 1235th byte of a 1234-byte
> buffer (which is going to be a NUL because of mmap rounding up the
> pages), then there's probably another test case somewhere where we
> access the 4097th byte of a 4096-byte buffer (which is going to
> segfault).
The glibc docs say that the length is rounded up to the nearest page
size [1]:
Thus, addresses for mapping must be page-aligned, and length values
will be rounded up.
This wording in POSIX makes me think that all systems will round up to
the nearest page size [2]:
Thus, while the parameter len need not meet a size or alignment
constraint, the system shall include, in any mapping operation, any
partial page specified by the address range starting at pa and
continuing for len bytes.
>> char *ptr = mmap (NULL, getpagesize (), PROT_READ | PROT_WRITE,
>> MAP_ANONYMOUS, -1, 0);
>> if (ptr == NULL)
>> abort ();
>
> I think you want to check for MAP_FAILED here, not NULL. And I think we
> always get that, because MAP_ANONYMOUS needs to be OR-ed into MAP_SHARED
> or MAP_PRIVATE. So here:
>
>> $ gcc -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>> SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x400554) (BuildId: 1b7a82189bfffb3f73d420e138b9859add25901a) in main
>> $ clang -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>> SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x4e9ee6) (BuildId: aca1d168eacebaa239082d8a45ab74c8470f4b31) in main
>
> I don't think this is ASan finding a problem. It is just telling us that
> we segfaulted for other reasons. And the fault here is because the
> broken mmap() invocation returned MAP_FAILED, and we tried to access
> that garbage pointer.
>
>> ptr[getpagesize () + 1] = 'a';
>
> This is also making a map that is a multiple of the page size, and then
> touching a byte that's on the next page. That's the easy-ish case that
> we can often already find, even without ASan (though it depends on what
> comes after the mapped memory; it might be a valid page).
Oops. I clearly don't use mmap much. :)
> A more interesting test for Git is to actually map a file, like:
>
> $ cat main.c
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <stdio.h>
> static void die(const char *msg)
> {
> perror(msg);
> exit(1);
> }
> int main (int argc, const char **argv)
> {
> struct stat st;
> int fd;
> char *ptr;
>
> fd = open(argv[1], O_RDONLY);
> if (fd < 0)
> die("open");
> if (fstat(fd, &st) < 0)
> die("fstat");
> ptr = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
> if (ptr == MAP_FAILED)
> die("mmap");
> printf("last byte: %d\n", ptr[st.st_size-1]);
> printf("one byte after: %d\n", ptr[st.st_size]);
> return 0;
> }
> $ yes | head -c 4096 >big
> $ yes | head -c 372 >small
>
> And ASan does often detect the problem for the "big" page-sized file,
> but not consistently! If I do:
>
> gcc -fsanitize=address main.c
> while ./a.out big; do echo ok; done
>
> I may get output like:
>
> last byte: 10
> one byte after: 127
> ok
> last byte: 10
> one byte after: 0
> ok
> last byte: 10
> one byte after: 0
> ok
> last byte: 10
> =================================================================
> ==988617==ERROR: AddressSanitizer: unknown-crash on address 0x7efd40b9f000 at pc 0x564fe77b64eb bp 0x7ffff49e8160 sp 0x7ffff49e8158
> READ of size 1 at 0x7efd40b9f000 thread T0
> #0 0x564fe77b64ea in main (/home/peff/a.out+0x14ea) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
> #1 0x7efd41233ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> #2 0x7efd41233d64 in __libc_start_main_impl ../csu/libc-start.c:360
> #3 0x564fe77b6150 in _start (/home/peff/a.out+0x1150) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
>
> Address 0x7efd40b9f000 is a wild pointer inside of access range of size 0x000000000001.
>
> So it worked three times without ASan noticing the problem (producing
> two different outputs), and then ASan finally crashed. But it didn't
> give us the usual information we get for a malloc overflow. It's just an
> "unknown crash" from a "wild pointer". So I'm not sure if it's even
> finding these through its own poisoning, and not just catching an
> unlucky segfault.
>
> If we switch to the small file, then ASan never reports anything! The OS
> gives us a page-sized chunk, so we consistently read a "0" in from the
> byte after our requested size.
>
> If we swap out the mmap for:
>
> ptr = malloc(st.st_size);
> read(fd, ptr, st.st_size);
>
> (which is roughly what our NO_MMAP wrapper is doing behind the scenes),
> then ASan does catch it consistently, even for the "small" file:
>
> $ ./a.out small
> =================================================================
> ==1008630==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7d11fe01b4 at pc 0x55cf89b1b4c8 bp 0x7fff471b84e0 sp 0x7fff471b84d8
> READ of size 1 at 0x7c7d11fe01b4 thread T0
> #0 0x55cf89b1b4c7 in main (/home/peff/a.out+0x14c7) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
> #1 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> #2 0x7f4d12e33d64 in __libc_start_main_impl ../csu/libc-start.c:360
> #3 0x55cf89b1b150 in _start (/home/peff/a.out+0x1150) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
>
> 0x7c7d11fe01b4 is located 0 bytes after 372-byte region [0x7c7d11fe0040,0x7c7d11fe01b4)
> allocated by thread T0 here:
> #0 0x7f4d1311a0ab in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:67
> #1 0x55cf89b1b3a0 in main (/home/peff/a.out+0x13a0) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
> #2 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Cool, thanks for the actual working example. Your patch makes perfect
sense now.
Collin
[1] https://www.gnu.org/software/libc/manual/html_node/Memory_002dmapped-I_002fO.html
[2] https://pubs.opengroup.org/onlinepubs/9799919799/functions/mmap.html
next prev parent reply other threads:[~2025-11-12 20:06 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-12 7:55 [PATCH 0/9] asan bonanza Jeff King
2025-11-12 7:56 ` [PATCH 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-12 8:01 ` [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-13 2:55 ` Taylor Blau
2025-11-18 8:59 ` Jeff King
2025-11-12 8:02 ` [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-12 8:17 ` Collin Funk
2025-11-12 10:31 ` Jeff King
2025-11-12 20:06 ` Collin Funk [this message]
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:12 ` Taylor Blau
2025-11-13 6:34 ` Patrick Steinhardt
2025-11-18 8:49 ` Jeff King
2025-11-13 16:30 ` Junio C Hamano
2025-11-14 7:00 ` Patrick Steinhardt
2025-11-15 2:13 ` Jeff King
2025-11-12 8:05 ` [PATCH 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:09 ` Taylor Blau
2025-11-18 8:40 ` Jeff King
2025-11-18 8:38 ` Jeff King
2025-11-12 8:06 ` [PATCH 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-12 8:06 ` [PATCH 6/9] fsck: avoid strcspn() " Jeff King
2025-11-12 8:06 ` [PATCH 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-12 8:10 ` [PATCH 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-12 19:36 ` Junio C Hamano
2025-11-15 2:12 ` Jeff King
2025-11-12 8:10 ` [PATCH 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-13 3:17 ` [PATCH 0/9] asan bonanza Taylor Blau
2025-11-18 9:11 ` [PATCH v2 " Jeff King
2025-11-18 9:11 ` [PATCH v2 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-18 9:12 ` [PATCH v2 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-18 9:12 ` [PATCH v2 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-18 9:12 ` [PATCH v2 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-18 14:30 ` Phillip Wood
2025-11-23 6:19 ` Junio C Hamano
2025-11-23 15:51 ` Phillip Wood
2025-11-23 18:06 ` Junio C Hamano
2025-11-24 22:30 ` Jeff King
2025-11-24 23:09 ` Junio C Hamano
2025-11-26 15:09 ` Jeff King
2025-11-26 17:22 ` Junio C Hamano
2025-11-30 13:13 ` [PATCH 0/4] more robust functions for parsing int from buf Jeff King
2025-11-30 13:14 ` [PATCH 1/4] parse: prefer bool to int for boolean returns Jeff King
2025-12-04 11:23 ` Patrick Steinhardt
2025-11-30 13:15 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Jeff King
2025-11-30 13:46 ` my complaints with clar Jeff King
2025-12-01 14:16 ` Phillip Wood
2025-12-04 11:09 ` Patrick Steinhardt
2025-12-05 18:30 ` Jeff King
2025-12-04 11:23 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Patrick Steinhardt
2025-12-05 16:11 ` Phillip Wood
2025-11-30 13:15 ` [PATCH 3/4] cache-tree: use parse_int_from_buf() Jeff King
2025-11-30 13:16 ` [PATCH 4/4] fsck: use parse_unsigned_from_buf() for parsing timestamp Jeff King
2025-11-18 9:12 ` [PATCH v2 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-18 9:12 ` [PATCH v2 6/9] fsck: avoid strcspn() " Jeff King
2025-11-18 9:12 ` [PATCH v2 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-18 9:12 ` [PATCH v2 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-18 9:12 ` [PATCH v2 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-23 5:49 ` [PATCH v2 0/9] asan bonanza Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87qzu32ey0.fsf@gmail.com \
--to=collin.funk1@gmail.com \
--cc=cmlists@sent.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).