All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Oberndorfer <kumbayo84@arcor.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Mon, 29 Oct 2012 21:19:48 +0100	[thread overview]
Message-ID: <508EE4E4.1080407@arcor.de> (raw)
In-Reply-To: <20121029060524.GB4457@sigill.intra.peff.net>

On 2012-10-29 07:05, Jeff King wrote:
> On Sun, Oct 28, 2012 at 08:56:39PM +0100, Peter Oberndorfer wrote:
>
>>> The patch below should fix it. I added tests, but please try your
>>> real-world test case on it to double-check.
>> I tested your patch, but now it crashes for another reason :-)
> Well, that's progress, right? :)
Sure :-)
>
>> i have a file with exactly 12288(0x3000) bytes in the repository.
>> When the file is loaded, the data is placed luckily so the data end
>> falls at a page boundary.
>> Later diff_grep() calls regexec() which calls strlen() on the loaded buffer
>> and ends up reading beyond the actual data into the next page
>> which is not allocated and causes a pagefault.
>> Or it could possibly (randomly) match the regex on data that is not
>> actually part of a file...
> Yuck. For the most part, we treat blob content (and generally most
> object content) as a sized buffer. However, there are some spots which,
> either through laziness or because a code interface expects a string, we
> pass the value as a string. This works because the object-reading code
> puts an extra NUL at the end of our buffer to handle just such an
> instance. So we might prematurely end if the object contains embedded
> NULs, but we would never read past the end.
>
> The code to read the output of a textconv filter does not do this
> explicitly. I would think it would get it for free by virtue of reading
> into a strbuf, though. I'll try to investigate.
I could reproduce with my 0x3000 bytes file on linux. The buffer is not
read with a trailing null byte it is mapped by mmap in
diff_populate_filespec...
So i think we will not get away with expecting a trailing null :-/

For me the key to reproduce the problem was to have 2 commits.
Adding the file in the root commit it did not work. [1]

Greetings Peter
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

[1]
kumbayo@home:~/src$ mkdir git_mmap_crash2
kumbayo@home:~/src$ cd git_mmap_crash2
kumbayo@home:~/src/git_mmap_crash2$ git init
kumbayo@home:~/src/git_mmap_crash2$ echo blah>blah
kumbayo@home:~/src/git_mmap_crash2$ git add blah
kumbayo@home:~/src/git_mmap_crash2$ git commit -m blah
[master (Basis-Version) 3458422] blah
diff_populate_filespec -> xmmap for blah size:0x5 returned: 0xb7206000
 1 file changed, 1 insertion(+)
 create mode 100644 blah
kumbayo@home:~/src/git_mmap_crash2$ perl -e 'print "-" x 0x3000 '> asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git add asdf.txt
kumbayo@home:~/src/git_mmap_crash2$ git commit -m crashy
[master 5cf2c5f] crashy
diff_populate_filespec -> xmmap for asdf.txt size:0x3000 returned:
0xb771e000
 1 file changed, 1 insertion(+)
 create mode 100644 asdf.txt

kumbayo@soybean:~/src/git_mmap_crash2$ valgrind git diff-tree -Ganything HEAD
==8388== Memcheck, a memory error detector
==8388== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==8388== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==8388== Command: git diff-tree -Ganything HEAD
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==    by 0xA0: ???
==8388==
==8388== Conditional jump or move depends on uninitialised value(s)
==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)
==8388==    by 0x7F: ???
==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==    by 0x30: ???


==8388==


==8388== Conditional jump or move depends on uninitialised value(s)


==8388==    at 0x405ADD8: inflateReset2 (in /lib/i386-linux-gnu/libz.so.1.2.3.4)


==8388==    by 0x50: ???


==8388==


diffcore_pickaxe_grep


diff_populate_filespec -> xmmap for asdf.txt size:0x3000 returned: 0x4035000


==8388== Invalid read of size 1


==8388==    at 0x402C683: __GI_strlen (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)


==8388==    by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)


==8388==    by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==    by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
==8388==    by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155)
==8388==    by 0x80DCE64: diffcore_std (diff.c:4638)
==8388==    by 0x80F0B20: log_tree_diff_flush (log-tree.c:696)
==8388==  Address 0x4038000 is not stack'd, malloc'd or (recently) free'd
==8388==
==8388==
==8388== Process terminating with default action of signal 11 (SIGSEGV)
==8388==  Access not within mapped region at address 0x4038000
==8388==    at 0x402C683: __GI_strlen (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==8388==    by 0x430581F: regexec@@GLIBC_2.3.4 (regexec.c:245)
==8388==    by 0x814489D: diff_grep (diffcore-pickaxe.c:110)
==8388==    by 0x8144B89: pickaxe.constprop.6 (diffcore-pickaxe.c:40)
==8388==    by 0x8144DCD: diffcore_pickaxe_grep (diffcore-pickaxe.c:155)
==8388==    by 0x80DCE64: diffcore_std (diff.c:4638)
==8388==    by 0x80F0B20: log_tree_diff_flush (log-tree.c:696)
==8388==  If you believe this happened as a result of a stack
==8388==  overflow in your program's main thread (unlikely but
==8388==  possible), you can try to increase the size of the
==8388==  main thread stack using the --main-stacksize= flag.
==8388==  The main thread stack size used in this run was 8388608.
==8388==
==8388== HEAP SUMMARY:
==8388==     in use at exit: 86,229 bytes in 69 blocks
==8388==   total heap usage: 193 allocs, 124 frees, 259,991 bytes allocated
==8388==
==8388== LEAK SUMMARY:
==8388==    definitely lost: 65 bytes in 1 blocks
==8388==    indirectly lost: 0 bytes in 0 blocks
==8388==      possibly lost: 0 bytes in 0 blocks
==8388==    still reachable: 86,164 bytes in 68 blocks
==8388==         suppressed: 0 bytes in 0 blocks
==8388== Rerun with --leak-check=full to see details of leaked memory
==8388==
==8388== For counts of detected and suppressed errors, rerun with: -v
==8388== Use --track-origins=yes to see where uninitialised values come from
==8388== ERROR SUMMARY: 7 errors from 5 contexts (suppressed: 0 from 0)

  parent reply	other threads:[~2012-10-29 20:20 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-27 18:37 crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-28 12:01 ` Jeff King
2012-10-28 12:45   ` [PATCH 0/2] textconv support for "log -S" Jeff King
2012-10-28 12:46     ` [PATCH 1/2] pickaxe: hoist empty needle check Jeff King
2012-10-28 12:47     ` [PATCH 2/2] pickaxe: use textconv for -S counting Jeff King
2012-11-13 23:13       ` Junio C Hamano
2012-11-15  1:21         ` Jeff King
2012-11-20  0:31           ` Junio C Hamano
2012-11-20  0:48             ` Junio C Hamano
2012-11-21 20:27               ` Jeff King
2012-10-28 19:56   ` crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-29  6:05     ` Jeff King
2012-10-29  6:18       ` Jeff King
2012-10-29 20:19       ` Peter Oberndorfer [this message]
2012-10-29 22:35         ` Jeff King
2012-10-29 22:47           ` Jeff King
2012-10-30 12:17             ` Jeff King
2012-10-30 12:46               ` Junio C Hamano
2012-10-30 13:12                 ` Jeff King
2012-11-01 19:19               ` Ramsay Jones
2012-11-07 21:10           ` Peter Oberndorfer
2012-11-07 21:13             ` Jeff King
2013-06-03 17:25               ` Peter Oberndorfer
2013-06-03 22:17                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=508EE4E4.1080407@arcor.de \
    --to=kumbayo84@arcor.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.