git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Collin Funk <collin.funk1@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org,  correctmost <cmlists@sent.com>,
	 Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan
Date: Wed, 12 Nov 2025 12:06:47 -0800	[thread overview]
Message-ID: <87qzu32ey0.fsf@gmail.com> (raw)
In-Reply-To: <20251112103158.GA983233@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Nov 12, 2025 at 12:17:24AM -0800, Collin Funk wrote:
>
>> I see that an interceptor was added in 2023 [1]. Maybe your compiler is
>> older than that?
>
> No, I'm using gcc 15.2.0 (from Debian unstable).
>
> But I'm not sure if the linked code does anything useful for us.
>
> One, it's not clear to me if it is even kicking in or not. It only does
> anything if the region is "sanitizer managed", according to the details
> at https://reviews.llvm.org/D154659. I'm not sure what that means
> exactly, because I'm fuzzy on how the shadow map works.
>
> But even when it does do something, it seems to round up to the nearest
> page size. But we really want to know if we go even one byte over the
> requested length, because if we touch the 1235th byte of a 1234-byte
> buffer (which is going to be a NUL because of mmap rounding up the
> pages), then there's probably another test case somewhere where we
> access the 4097th byte of a 4096-byte buffer (which is going to
> segfault).

The glibc docs say that the length is rounded up to the nearest page
size [1]:

    Thus, addresses for mapping must be page-aligned, and length values
    will be rounded up.

This wording in POSIX makes me think that all systems will round up to
the nearest page size [2]:

    Thus, while the parameter len need not meet a size or alignment
    constraint, the system shall include, in any mapping operation, any
    partial page specified by the address range starting at pa and
    continuing for len bytes.

>>       char *ptr = mmap (NULL, getpagesize (), PROT_READ | PROT_WRITE,
>>     		    MAP_ANONYMOUS, -1, 0);
>>       if (ptr == NULL)
>>         abort ();
>
> I think you want to check for MAP_FAILED here, not NULL. And I think we
> always get that, because MAP_ANONYMOUS needs to be OR-ed into MAP_SHARED
> or MAP_PRIVATE. So here:
>
>>     $ gcc -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>>     SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x400554) (BuildId: 1b7a82189bfffb3f73d420e138b9859add25901a) in main
>>     $ clang -fsanitize=address main.c && ./a.out 2>&1 | grep ^SUMMARY:
>>     SUMMARY: AddressSanitizer: SEGV (/home/collin/a.out+0x4e9ee6) (BuildId: aca1d168eacebaa239082d8a45ab74c8470f4b31) in main
>
> I don't think this is ASan finding a problem. It is just telling us that
> we segfaulted for other reasons. And the fault here is because the
> broken mmap() invocation returned MAP_FAILED, and we tried to access
> that garbage pointer.
>
>>       ptr[getpagesize () + 1] = 'a';
>
> This is also making a map that is a multiple of the page size, and then
> touching a byte that's on the next page. That's the easy-ish case that
> we can often already find, even without ASan (though it depends on what
> comes after the mapped memory; it might be a valid page).

Oops. I clearly don't use mmap much. :)

> A more interesting test for Git is to actually map a file, like:
>
>   $ cat main.c
>   #include <unistd.h>
>   #include <fcntl.h>
>   #include <sys/mman.h>
>   #include <sys/stat.h>
>   #include <stdio.h>
>   static void die(const char *msg)
>   {
>   	perror(msg);
>   	exit(1);
>   }
>   int main (int argc, const char **argv)
>   {
>   	struct stat st;
>   	int fd;
>   	char *ptr;
>   
>   	fd = open(argv[1], O_RDONLY);
>   	if (fd < 0)
>   		die("open");
>   	if (fstat(fd, &st) < 0)
>   		die("fstat");
>   	ptr = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
>   	if (ptr == MAP_FAILED)
>   		die("mmap");
>   	printf("last byte: %d\n", ptr[st.st_size-1]);
>   	printf("one byte after: %d\n", ptr[st.st_size]);
>   	return 0;
>   }
>   $ yes | head -c 4096 >big
>   $ yes | head -c 372 >small
>
> And ASan does often detect the problem for the "big" page-sized file,
> but not consistently! If I do:
>
>   gcc -fsanitize=address main.c
>   while ./a.out big; do echo ok; done
>
> I may get output like:
>
>   last byte: 10
>   one byte after: 127
>   ok
>   last byte: 10
>   one byte after: 0
>   ok
>   last byte: 10
>   one byte after: 0
>   ok
>   last byte: 10
>   =================================================================
>   ==988617==ERROR: AddressSanitizer: unknown-crash on address 0x7efd40b9f000 at pc 0x564fe77b64eb bp 0x7ffff49e8160 sp 0x7ffff49e8158
>   READ of size 1 at 0x7efd40b9f000 thread T0
>       #0 0x564fe77b64ea in main (/home/peff/a.out+0x14ea) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
>       #1 0x7efd41233ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>       #2 0x7efd41233d64 in __libc_start_main_impl ../csu/libc-start.c:360
>       #3 0x564fe77b6150 in _start (/home/peff/a.out+0x1150) (BuildId: 8db121bb5c048cb336f8be729e8cefebd6f059a3)
>   
>   Address 0x7efd40b9f000 is a wild pointer inside of access range of size 0x000000000001.
>
> So it worked three times without ASan noticing the problem (producing
> two different outputs), and then ASan finally crashed. But it didn't
> give us the usual information we get for a malloc overflow. It's just an
> "unknown crash" from a "wild pointer". So I'm not sure if it's even
> finding these through its own poisoning, and not just catching an
> unlucky segfault.
>
> If we switch to the small file, then ASan never reports anything! The OS
> gives us a page-sized chunk, so we consistently read a "0" in from the
> byte after our requested size.
>
> If we swap out the mmap for:
>
>   ptr = malloc(st.st_size);
>   read(fd, ptr, st.st_size);
>
> (which is roughly what our NO_MMAP wrapper is doing behind the scenes),
> then ASan does catch it consistently, even for the "small" file:
>
>   $ ./a.out small
>   =================================================================
>   ==1008630==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7d11fe01b4 at pc 0x55cf89b1b4c8 bp 0x7fff471b84e0 sp 0x7fff471b84d8
>   READ of size 1 at 0x7c7d11fe01b4 thread T0
>       #0 0x55cf89b1b4c7 in main (/home/peff/a.out+0x14c7) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
>       #1 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>       #2 0x7f4d12e33d64 in __libc_start_main_impl ../csu/libc-start.c:360
>       #3 0x55cf89b1b150 in _start (/home/peff/a.out+0x1150) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
>   
>   0x7c7d11fe01b4 is located 0 bytes after 372-byte region [0x7c7d11fe0040,0x7c7d11fe01b4)
>   allocated by thread T0 here:
>       #0 0x7f4d1311a0ab in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:67
>       #1 0x55cf89b1b3a0 in main (/home/peff/a.out+0x13a0) (BuildId: 2cebfcd0a00064eaaed750af010fcecdae2f5666)
>       #2 0x7f4d12e33ca7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Cool, thanks for the actual working example. Your patch makes perfect
sense now.

Collin

[1] https://www.gnu.org/software/libc/manual/html_node/Memory_002dmapped-I_002fO.html
[2] https://pubs.opengroup.org/onlinepubs/9799919799/functions/mmap.html

  reply	other threads:[~2025-11-12 20:06 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-12  7:55 [PATCH 0/9] asan bonanza Jeff King
2025-11-12  7:56 ` [PATCH 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-12  8:01 ` [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-12 11:25   ` Patrick Steinhardt
2025-11-13  2:55   ` Taylor Blau
2025-11-18  8:59     ` Jeff King
2025-11-12  8:02 ` [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-12  8:17   ` Collin Funk
2025-11-12 10:31     ` Jeff King
2025-11-12 20:06       ` Collin Funk [this message]
2025-11-12 11:26   ` Patrick Steinhardt
2025-11-13  3:12     ` Taylor Blau
2025-11-13  6:34       ` Patrick Steinhardt
2025-11-18  8:49       ` Jeff King
2025-11-13 16:30     ` Junio C Hamano
2025-11-14  7:00       ` Patrick Steinhardt
2025-11-15  2:13         ` Jeff King
2025-11-12  8:05 ` [PATCH 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-12 11:26   ` Patrick Steinhardt
2025-11-13  3:09     ` Taylor Blau
2025-11-18  8:40       ` Jeff King
2025-11-18  8:38     ` Jeff King
2025-11-12  8:06 ` [PATCH 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-12  8:06 ` [PATCH 6/9] fsck: avoid strcspn() " Jeff King
2025-11-12  8:06 ` [PATCH 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-12  8:10 ` [PATCH 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-12 11:25   ` Patrick Steinhardt
2025-11-12 19:36     ` Junio C Hamano
2025-11-15  2:12     ` Jeff King
2025-11-12  8:10 ` [PATCH 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-13  3:17 ` [PATCH 0/9] asan bonanza Taylor Blau
2025-11-18  9:11 ` [PATCH v2 " Jeff King
2025-11-18  9:11   ` [PATCH v2 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-18  9:12   ` [PATCH v2 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-18  9:12   ` [PATCH v2 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-18  9:12   ` [PATCH v2 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-18 14:30     ` Phillip Wood
2025-11-23  6:19       ` Junio C Hamano
2025-11-23 15:51         ` Phillip Wood
2025-11-23 18:06           ` Junio C Hamano
2025-11-24 22:30         ` Jeff King
2025-11-24 23:09           ` Junio C Hamano
2025-11-26 15:09             ` Jeff King
2025-11-26 17:22               ` Junio C Hamano
2025-11-30 13:13                 ` [PATCH 0/4] more robust functions for parsing int from buf Jeff King
2025-11-30 13:14                   ` [PATCH 1/4] parse: prefer bool to int for boolean returns Jeff King
2025-12-04 11:23                     ` Patrick Steinhardt
2025-11-30 13:15                   ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Jeff King
2025-11-30 13:46                     ` my complaints with clar Jeff King
2025-12-01 14:16                       ` Phillip Wood
2025-12-04 11:09                         ` Patrick Steinhardt
2025-12-05 18:30                           ` Jeff King
2025-12-04 11:23                     ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Patrick Steinhardt
2025-12-05 16:11                     ` Phillip Wood
2025-11-30 13:15                   ` [PATCH 3/4] cache-tree: use parse_int_from_buf() Jeff King
2025-11-30 13:16                   ` [PATCH 4/4] fsck: use parse_unsigned_from_buf() for parsing timestamp Jeff King
2025-11-18  9:12   ` [PATCH v2 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-18  9:12   ` [PATCH v2 6/9] fsck: avoid strcspn() " Jeff King
2025-11-18  9:12   ` [PATCH v2 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-18  9:12   ` [PATCH v2 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-18  9:12   ` [PATCH v2 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-23  5:49   ` [PATCH v2 0/9] asan bonanza Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87qzu32ey0.fsf@gmail.com \
    --to=collin.funk1@gmail.com \
    --cc=cmlists@sent.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).