"git fsck" fails on malloc of 80 G

* "git fsck" fails on malloc of 80 G
From: Dale R. Worley @ 2013-12-16 16:05 UTC
  To: git

I have a large repository (17 GiB of disk used), although no single
file in the repository is over 1 GiB.  (I have pack.packSizeLimit set
to "1g".)  I don't know exactly how many files are in the repository,
but there should be at most several tens of commits, each containing
several tens of thousands of files.

Because Git crashed while performing an operation, I want to verify
that the repository is consistent.  However, "git fsck" fails,
apparently because it is trying to allocate 80 GB of memory.  (I
can still do adds, commits, etc.)

# git fsck
Checking object directories: 100% (256/256), done.
fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes)
#

I don't know if this is due to an outright bug or not.  But it seems
to me that "git fsck" should not need to allocate any more memory than
the size (1 GiB) of a single pack file.  And given its purpose, "git
fsck" should be one of the *most* robust Git tools!

Dale

* Re: "git fsck" fails on malloc of 80 G
From: Jeff King @ 2013-12-16 19:15 UTC
  To: Dale R. Worley; +Cc: git

On Mon, Dec 16, 2013 at 11:05:32AM -0500, Dale R. Worley wrote:

> # git fsck
> Checking object directories: 100% (256/256), done.
> fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes)
> #

Can you give us a backtrace from the die() call? It would help
to know what it was trying to allocate 80G for.

> I don't know if this is due to an outright bug or not.  But it seems
> to me that "git fsck" should not need to allocate any more memory than
> the size (1 GiB) of a single pack file.  And given its purpose, "git
> fsck" should be one of the *most* robust Git tools!

Agreed. Fsck tends to be more robust, but there are still many code
paths that can die(). One of the problems I ran into recently is that
corrupt data can cause it to make a large allocation; we notice the
bogus data as soon as we try to start filling the buffer, but sometimes
the bogus allocation is large enough to kill the process.

That was fixed by b039718, which is in master but not yet in any
released version. You might see whether that helps.

-Peff

* Re: "git fsck" fails on malloc of 80 G
From: Dale R. Worley @ 2013-12-18  3:06 UTC
  To: Jeff King; +Cc: git

> From: Jeff King <peff@peff.net>
> 
> On Mon, Dec 16, 2013 at 11:05:32AM -0500, Dale R. Worley wrote:
> 
> > # git fsck
> > Checking object directories: 100% (256/256), done.
> > fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes)
> > #
> 
> Can you give us a backtrace from the die() call? It would help
> to know what it was trying to allocate 80G for.

Further information:

    # git --version
    git version 1.8.3.1
    #

Here's the basic backtrace information, and the values of the "size"
variables, which seem to be the immediate culprits:

    # gdb
    GNU gdb (GDB) Fedora 7.6.1-46.fc19
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    (gdb) file /usr/bin/git
    Reading symbols from /usr/bin/git...Reading symbols from /usr/lib/debug/usr/bin/git.debug...done.
    done.
    (gdb) break wrapper.c:59
    Breakpoint 1 at 0x4f35ef: file wrapper.c, line 59.
    (gdb) break die_child
    Breakpoint 2 at 0x4d0ca0: file run-command.c, line 211.
    (gdb) break die_async
    Breakpoint 3 at 0x4d1020: file run-command.c, line 604.
    (gdb) run fsck
    Starting program: /usr/bin/git fsck
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Checking object directories: 100% (256/256), done.
    Checking objects:   0% (0/526211)   
    Breakpoint 1, xmalloc (size=size@entry=80530636801) at wrapper.c:59
    59				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
    (gdb) bt
    #0  xmalloc (size=size@entry=80530636801) at wrapper.c:59
    #1  0x00000000004f3633 in xmallocz (size=size@entry=80530636800)
	at wrapper.c:73
    #2  0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, 
	w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800)
	at sha1_file.c:1797
    #3  0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, 
	obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, 
	final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072
    #4  0x00000000004b1e3f in verify_packfile (base_count=0, progress=0x9bdd80, 
	fn=0x42fc00 <fsck_obj_buffer>, w_curs=0x7fffffffd090, p=0x7e4020)
	at pack-check.c:119
    #5  verify_pack (p=p@entry=0x7e4020, fn=fn@entry=0x42fc00 <fsck_obj_buffer>, 
	progress=0x9bdd80, base_count=base_count@entry=0) at pack-check.c:177
    #6  0x0000000000430724 in cmd_fsck (argc=0, argv=0x7fffffffe400, 
	prefix=<optimized out>) at builtin/fsck.c:678
    #7  0x0000000000405cfd in run_builtin (argv=0x7fffffffe400, argc=1, 
	p=0x75fa68 <commands.23748+840>) at git.c:284
    #8  handle_internal_command (argc=1, argv=0x7fffffffe400) at git.c:446
    #9  0x000000000040511f in run_argv (argv=0x7fffffffe2a0, argcp=0x7fffffffe2ac)
	at git.c:492
    #10 main (argc=1, argv=0x7fffffffe400) at git.c:567
    (gdb) frame 2
    #2  0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, 
	w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800)
	at sha1_file.c:1797
    1797		buffer = xmallocz(size);
    (gdb) p size
    $29 = 80530636800
    (gdb) p/x size
    $30 = 0x12c0000000
    (gdb) frame 3
    #3  0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, 
	obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, 
	final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072
    2072				data = unpack_compressed_entry(p, &w_curs, curpos, size);
    (gdb) p size
    $31 = 80530636800
    (gdb) p/x size
    $32 = 0x12c0000000
    (gdb) 
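
Note the off-by-one between the frames: xmallocz() receives size=80530636800 in frame #1, but xmalloc() in frame #0, and the error message, show 80530636801. The "z" variant asks for one extra byte so it can NUL-terminate the buffer. The sketch below illustrates that behaviour. It is modeled on what the backtrace shows, not copied from Git's wrapper.c. The names zalloc_request_size() and sketch_xmallocz() are hypothetical, and unlike Git it returns NULL instead of calling die().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: how many bytes a NUL-terminating allocator must
 * request for a "size"-byte payload. Sets *overflow (and returns 0) if
 * size + 1 would wrap around SIZE_MAX.
 */
static size_t zalloc_request_size(size_t size, int *overflow)
{
	*overflow = (size == SIZE_MAX);
	return *overflow ? 0 : size + 1;
}

/*
 * Sketch of an xmallocz()-style wrapper: allocate size + 1 bytes and
 * store a NUL just past the payload, so the buffer can also be used as
 * a C string. Returns NULL on overflow or allocation failure.
 */
static void *sketch_xmallocz(size_t size)
{
	int overflow;
	size_t request = zalloc_request_size(size, &overflow);
	char *buf;

	if (overflow)
		return NULL;
	buf = malloc(request);
	if (!buf) {
		fprintf(stderr,
			"Out of memory, malloc failed (tried to allocate %zu bytes)\n",
			request);
		return NULL;
	}
	buf[size] = '\0';
	return buf;
}
```

On a 64-bit platform, zalloc_request_size(80530636800, &overflow) yields 80530636801, the figure in the error message.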

I did a further test to see where the value of "size" came from:

    (gdb) break sha1_file.c:2023
    Breakpoint 4 at 0x4db073: file sha1_file.c, line 2023.
    (gdb) cond 4 size == 0x12c0000000
    (gdb) break sha1_file.c:2029
    Breakpoint 5 at 0x4daee7: file sha1_file.c, line 2029.
    (gdb) cond 5 size == 0x12c0000000
    (gdb) break sha1_file.c:2072
    Breakpoint 6 at 0x4db4b4: file sha1_file.c, line 2072.
    (gdb) cond 6 size == 0x12c0000000
    (gdb) break unpack_object_header_buffer
    Breakpoint 7 at 0x4d9ea0: file sha1_file.c, line 1399.
    (gdb) comm 7
    Type commands for breakpoint(s) 7, one per line.
    End with a line saying just "end".
    >continue
    >end
    (gdb) run
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /usr/bin/git fsck
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Checking object directories: 100% (256/256), done.

    Breakpoint 7, unpack_object_header_buffer (
	buf=0x7fffc4d3e00c "\265\334\352\277\023x\234", len=733530087, 
	type=type@entry=0x7fffffffc984, sizep=sizep@entry=0x7fffffffca00)
	at sha1_file.c:1399
    1399	{
    Checking objects:   0% (0/526211)   
    Breakpoint 7, unpack_object_header_buffer (
	buf=0x7fffebd26620 "\260\200\200\200\340\022x\234\354\301\001\001", 
	len=79315411, type=type@entry=0x7fffffffc984, 
	sizep=sizep@entry=0x7fffffffca00) at sha1_file.c:1399
    1399	{

    Breakpoint 5, unpack_entry (p=p@entry=0x7e4020, obj_offset=654214688, 
	final_type=final_type@entry=0x7fffffffd088, 
	final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2029
    2029			if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA)
    (gdb) 

If I understand the code correctly, the object header buffer
\260\200\200\200\340\022x\234\354\301\001\001
really does encode the size value 0x12c0000000.
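
The decoding can be checked by hand. In the packfile format, the first header byte holds a continuation bit, a 3-bit object type and the low 4 bits of the size; each following byte holds a continuation bit and the next 7 size bits, least significant group first. The sketch below follows that layout. decode_pack_header() and the PACK_OBJ_* names are hypothetical, and its overflow check is deliberately stricter than necessary. Applied to the bytes above, it yields type 3 (a blob) with size 0x12c0000000, i.e. 80530636800 bytes or exactly 75 GiB; the bytes that follow, x\234 (0x78 0x9c), are the usual start of a zlib stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Object type numbers used in pack entry headers. */
enum pack_obj_type {
	PACK_OBJ_COMMIT = 1,
	PACK_OBJ_TREE = 2,
	PACK_OBJ_BLOB = 3,
	PACK_OBJ_TAG = 4,
	PACK_OBJ_OFS_DELTA = 6,
	PACK_OBJ_REF_DELTA = 7
};

/*
 * Hypothetical decoder for a pack entry header. Returns the number of
 * header bytes consumed, or 0 if the buffer ends mid-header or the size
 * might not fit in 64 bits.
 */
static size_t decode_pack_header(const unsigned char *buf, size_t len,
				 int *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	if (len == 0)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;   /* bits 6..4: object type */
	*size = c & 0x0f;       /* bits 3..0: low 4 bits of the size */
	while (c & 0x80) {      /* bit 7 set: another size byte follows */
		/* Reject truncated input, and any 7-bit group that could
		 * spill past bit 63 (stricter than strictly necessary). */
		if (used >= len || shift > 64 - 7)
			return 0;
		c = buf[used++];
		*size += (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```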

I will see if I can experiment with the new version you mention.

Dale

* Re: "git fsck" fails on malloc of 80 G
From: Dale R. Worley @ 2013-12-18 21:08 UTC
  To: Jeff King; +Cc: git

> From: Jeff King <peff@peff.net>

> One of the problems I ran into recently is that
> corrupt data can cause it to make a large allocation

One thing I notice is that in unpack_compressed_entry() in
sha1_file.c, there is an xmallocz() of "size" bytes.  It appears that
"size" is the size of the object that is being unpacked.  If so, this
code cannot be correct, because it assumes that any file that is
stored in the repository can be put into a buffer allocated in RAM.

Dale

* Re: "git fsck" fails on malloc of 80 G
From: Jeff King @ 2013-12-18 21:58 UTC
  To: Dale R. Worley; +Cc: git

On Tue, Dec 17, 2013 at 10:06:20PM -0500, Dale R. Worley wrote:

> Here's the basic backtrace information, and the values of the "size"
> variables, which seem to be the immediate culprits:
> [...]
>     #1  0x00000000004f3633 in xmallocz (size=size@entry=80530636800)
> 	at wrapper.c:73
>     #2  0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, 
> 	w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800)
> 	at sha1_file.c:1797
>     #3  0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, 
> 	obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, 
> 	final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072
>     #4  0x00000000004b1e3f in verify_packfile (base_count=0, progress=0x9bdd80, 
> 	fn=0x42fc00 <fsck_obj_buffer>, w_curs=0x7fffffffd090, p=0x7e4020)
> 	at pack-check.c:119

Thanks, that's helpful. Unfortunately the patch I mentioned before won't
help you. The packfile format (like the experimental loose format that my patch
dropped) stores the size outside of what the zlib checksum covers. So it
has the same problem: we want to allocate the buffer up front to store
the zlib results.

The pack index does store a crc (calculated when we made or received
the pack) over each object's on-disk representation. So we could check
that, though doing it on every access has performance implications.

The pack data itself also has a SHA-1 checksum over the whole thing. We
should probably do a better job in verify-pack of:

  1. Check the whole sha1 checksum before doing anything else.

  2. In the uncommon case that it fails, check each individual object
     crc to find the broken object (and if none, assume either the
     header or the checksum itself is what got munged).
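
Step 2 can be sketched as follows, assuming the whole .pack file is already in memory and the index's per-object offsets and CRC32 values have been read and sorted by offset. Each object's on-disk bytes run from its offset to the next object's offset; the last one ends where the trailing 20-byte pack checksum begins. The names crc32_buf(), struct idx_entry and find_bad_objects() are hypothetical, not Git functions. crc32_buf() is a plain bitwise CRC-32, which should produce the same values as zlib's crc32().

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow but
 * dependency-free. */
static uint32_t crc32_buf(const unsigned char *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical in-memory view of one index entry: where the object
 * starts in the .pack and the CRC32 recorded for its on-disk bytes. */
struct idx_entry {
	uint64_t offset;
	uint32_t crc;
};

/*
 * Recompute each object's CRC32 over [offset, next offset), where the
 * last object ends just before the 20-byte pack checksum. Writes the
 * indices of mismatching entries to "bad" and returns how many there
 * were, or -1 if the offsets are inconsistent with the pack length.
 */
static long find_bad_objects(const unsigned char *pack, size_t pack_len,
			     const struct idx_entry *ent, size_t n,
			     size_t *bad)
{
	long nbad = 0;

	if (pack_len < 20)
		return -1;
	size_t data_end = pack_len - 20;
	for (size_t i = 0; i < n; i++) {
		uint64_t start = ent[i].offset;
		uint64_t end = (i + 1 < n) ? ent[i + 1].offset : data_end;
		if (start > end || end > data_end)
			return -1;
		if (crc32_buf(pack + start, (size_t)(end - start)) != ent[i].crc)
			bad[nbad++] = i;
	}
	return nbad;
}
```

If the whole-pack SHA-1 check fails but find_bad_objects() reports nothing, that matches the last case in the list above: the pack header or the trailing checksum itself is the likely damage.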

In the meantime, you should be able to do step 1 manually like:

  # Run this from the repository's .git directory (or a bare repo).
  # Check the first N-20 bytes of each packfile against the checksum
  # in the final 20 bytes. NB: pretty sure this use of "head" is a
  # GNU-ism, and of course you need openssl.
  for i in objects/pack/*.pack; do
    tail -c 20 "$i" >want.tmp &&
    head -c -20 "$i" | openssl sha1 -binary >have.tmp &&
    cmp want.tmp have.tmp ||
    echo >&2 "broken: $i"
  done
  rm -f want.tmp have.tmp

git-fsck should be doing this check itself, but I wonder if you are not
making it that far.

> If I understand the code correctly, the object header buffer
> \260\200\200\200\340\022x\234\354\301\001\001
> really does encode the size value 0x12c0000000.

If it does, and you do not have an 80G file, then it sounds like you may
have a corrupt packfile.

-Peff

* Re: "git fsck" fails on malloc of 80 G
From: Jeff King @ 2013-12-18 22:09 UTC
  To: Dale R. Worley; +Cc: git

On Wed, Dec 18, 2013 at 04:08:47PM -0500, Dale R. Worley wrote:

> > From: Jeff King <peff@peff.net>
> 
> > One of the problems I ran into recently is that
> > corrupt data can cause it to make a large allocation
> 
> One thing I notice is that in unpack_compressed_entry() in
> sha1_file.c, there is an xmallocz() of "size" bytes.  It appears that
> "size" is the size of the object that is being unpacked.  If so, this
> code cannot be correct, because it assumes that any file that is
> stored in the repository can be put into a buffer allocated in RAM.

For some definition of correct. Git does load whole blobs into memory in
several places. Some code paths _can_ stream, but they do not stream
deltas, and the diff engine definitely wants the whole thing in-core.

So you are reading it right. If you want to work on changing it, be my
guest, but it's a non-trivial fix. ;)

-Peff
