* "git fsck" fails on malloc of 80 G @ 2013-12-16 16:05 Dale R. Worley 2013-12-16 19:15 ` Jeff King 0 siblings, 1 reply; 6+ messages in thread From: Dale R. Worley @ 2013-12-16 16:05 UTC (permalink / raw) To: git I have a large repository (17 GiB of disk used), although no single file in the repository is over 1 GiB. (I have pack.packSizeLimit set to "1g".) I don't know how many files are in the repository, but it shouldn't exceed several tens of commits each containing several tens of thousands of files. Due to Git crashing while performing an operation, I want to verify that the repository is consistent. However, when I run "git fsck" it fails, apparently because it is trying to allocate 80 G of memory. (I can still do adds, commits, etc.) # git fsck Checking object directories: 100% (256/256), done. fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes) # I don't know if this is due to an outright bug or not. But it seems to me that "git fsck" should not need to allocate any more memory than the size (1 GiB) of a single pack file. And given its purpose, "git fsck" should be one of the *most* robust Git tools! Dale ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: "git fsck" fails on malloc of 80 G 2013-12-16 16:05 "git fsck" fails on malloc of 80 G Dale R. Worley @ 2013-12-16 19:15 ` Jeff King 2013-12-18 3:06 ` Dale R. Worley 2013-12-18 21:08 ` Dale R. Worley 0 siblings, 2 replies; 6+ messages in thread From: Jeff King @ 2013-12-16 19:15 UTC (permalink / raw) To: Dale R. Worley; +Cc: git On Mon, Dec 16, 2013 at 11:05:32AM -0500, Dale R. Worley wrote: > # git fsck > Checking object directories: 100% (256/256), done. > fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes) > # Can you give you give us a backtrace from the die() call? It would help to know what it was trying to allocate 80G for. > I don't know if this is due to an outright bug or not. But it seems > to me that "git fsck" should not need to allocate any more memory than > the size (1 GiB) of a single pack file. And given its purpose, "git > fsck" should be one of the *most* robust Git tools! Agreed. Fsck tends to be more robust, but there are still many code paths that can die(). One of the problems I ran into recently is that corrupt data can cause it to make a large allocation; we notice the bogus data as soon as we try to start filling the buffer, but sometimes the bogus allocation is large enough to kill the process. That was fixed by b039718, which is in master but not yet any released version. You might see whether that helps. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: "git fsck" fails on malloc of 80 G 2013-12-16 19:15 ` Jeff King @ 2013-12-18 3:06 ` Dale R. Worley 2013-12-18 21:58 ` Jeff King 2013-12-18 21:08 ` Dale R. Worley 1 sibling, 1 reply; 6+ messages in thread From: Dale R. Worley @ 2013-12-18 3:06 UTC (permalink / raw) To: Jeff King; +Cc: git > From: Jeff King <peff@peff.net> > > On Mon, Dec 16, 2013 at 11:05:32AM -0500, Dale R. Worley wrote: > > > # git fsck > > Checking object directories: 100% (256/256), done. > > fatal: Out of memory, malloc failed (tried to allocate 80530636801 bytes) > > # > > Can you give you give us a backtrace from the die() call? It would help > to know what it was trying to allocate 80G for. Further information: # git --version git version 1.8.3.1 # Here's the basic backtrace information, and the values of the "size" variables, which seem to be the immediate culprits: # gdb GNU gdb (GDB) Fedora 7.6.1-46.fc19 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. (gdb) file /usr/bin/git Reading symbols from /usr/bin/git...Reading symbols from /usr/lib/debug/usr/bin/git.debug...done. done. (gdb) break wrapper.c:59 Breakpoint 1 at 0x4f35ef: file wrapper.c, line 59. (gdb) break die_child Breakpoint 2 at 0x4d0ca0: file run-command.c, line 211. (gdb) break die_async Breakpoint 3 at 0x4d1020: file run-command.c, line 604. (gdb) run fsck Starting program: /usr/bin/git fsck [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Checking object directories: 100% (256/256), done. 
Checking objects: 0% (0/526211) Breakpoint 1, xmalloc (size=size@entry=80530636801) at wrapper.c:59 59 die("Out of memory, malloc failed (tried to allocate %lu bytes)", (gdb) bt #0 xmalloc (size=size@entry=80530636801) at wrapper.c:59 #1 0x00000000004f3633 in xmallocz (size=size@entry=80530636800) at wrapper.c:73 #2 0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800) at sha1_file.c:1797 #3 0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072 #4 0x00000000004b1e3f in verify_packfile (base_count=0, progress=0x9bdd80, fn=0x42fc00 <fsck_obj_buffer>, w_curs=0x7fffffffd090, p=0x7e4020) at pack-check.c:119 #5 verify_pack (p=p@entry=0x7e4020, fn=fn@entry=0x42fc00 <fsck_obj_buffer>, progress=0x9bdd80, base_count=base_count@entry=0) at pack-check.c:177 #6 0x0000000000430724 in cmd_fsck (argc=0, argv=0x7fffffffe400, prefix=<optimized out>) at builtin/fsck.c:678 #7 0x0000000000405cfd in run_builtin (argv=0x7fffffffe400, argc=1, p=0x75fa68 <commands.23748+840>) at git.c:284 #8 handle_internal_command (argc=1, argv=0x7fffffffe400) at git.c:446 #9 0x000000000040511f in run_argv (argv=0x7fffffffe2a0, argcp=0x7fffffffe2ac) at git.c:492 #10 main (argc=1, argv=0x7fffffffe400) at git.c:567 (gdb) frame 2 #2 0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800) at sha1_file.c:1797 1797 buffer = xmallocz(size); (gdb) p size $29 = 80530636800 (gdb) p/x size $30 = 0x12c0000000 (gdb) frame 3 #3 0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072 2072 data = unpack_compressed_entry(p, &w_curs, curpos, size); (gdb) p size $31 = 80530636800 (gdb) p/x size $32 = 0x12c0000000 (gdb) I did a further test to see where the value of "size" came from: (gdb) break sha1_file.c:2023 Breakpoint 4 at 0x4db073: file sha1_file.c, line 2023. (gdb) cond 4 size == 0x12c0000000 (gdb) break sha1_file.c:2029 Breakpoint 5 at 0x4daee7: file sha1_file.c, line 2029. (gdb) cond 5 size == 0x12c0000000 (gdb) break sha1_file.c:2072 Breakpoint 6 at 0x4db4b4: file sha1_file.c, line 2072. (gdb) cond 6 size == 0x12c0000000 (gdb) break unpack_object_header_buffer Breakpoint 7 at 0x4d9ea0: file sha1_file.c, line 1399. (gdb) comm 7 Type commands for breakpoint(s) 7, one per line. End with a line saying just "end". >continue >end (gdb) run The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /usr/bin/git fsck [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Checking object directories: 100% (256/256), done. 
Breakpoint 7, unpack_object_header_buffer ( buf=0x7fffc4d3e00c "\265\334\352\277\023x\234", len=733530087, type=type@entry=0x7fffffffc984, sizep=sizep@entry=0x7fffffffca00) at sha1_file.c:1399 1399 { Checking objects: 0% (0/526211) Breakpoint 7, unpack_object_header_buffer ( buf=0x7fffebd26620 "\260\200\200\200\340\022x\234\354\301\001\001", len=79315411, type=type@entry=0x7fffffffc984, sizep=sizep@entry=0x7fffffffca00) at sha1_file.c:1399 1399 { Breakpoint 5, unpack_entry (p=p@entry=0x7e4020, obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2029 2029 if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA) (gdb) If I understand the code correctly, the object header buffer \260\200\200\200\340\022x\234\354\301\001\001 really does encode the size value 0x12c0000000. I will see if I can experiment with the new version you mention. Dale ^ permalink raw reply [flat|nested] 6+ messages in thread
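
(For reference, a standalone sketch of the pack object header decoding
Dale describes -- it mirrors the pack varint layout (type and low four
size bits in the first byte, seven more size bits per continuation
byte), not the sha1_file.c source itself.  Fed the header bytes quoted
above, it prints type 3 (a blob in git's numbering) and size
80530636800, i.e. 0x12c0000000, so the header really does claim an
~80 GB object -- and xmallocz adds the +1 seen in the die() message:)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* the header bytes Dale quotes, before the "x\234" zlib stream */
            const unsigned char buf[] = { 0xb0, 0x80, 0x80, 0x80, 0xe0, 0x12 };
            uint64_t size;
            unsigned type, shift, i = 0;
            unsigned char c;

            c = buf[i++];
            type = (c >> 4) & 7;    /* bits 6-4: object type */
            size = c & 15;          /* bits 3-0: low bits of size */
            shift = 4;
            while (c & 0x80) {      /* high bit set: more size bytes follow */
                    c = buf[i++];
                    size += (uint64_t)(c & 0x7f) << shift;
                    shift += 7;
            }
            printf("type %u, size %llu (0x%llx)\n", type,
                   (unsigned long long)size, (unsigned long long)size);
            return 0;
    }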
* Re: "git fsck" fails on malloc of 80 G 2013-12-18 3:06 ` Dale R. Worley @ 2013-12-18 21:58 ` Jeff King 0 siblings, 0 replies; 6+ messages in thread From: Jeff King @ 2013-12-18 21:58 UTC (permalink / raw) To: Dale R. Worley; +Cc: git On Tue, Dec 17, 2013 at 10:06:20PM -0500, Dale R. Worley wrote: > Here's the basic backtrace information, and the values of the "size" > variables, which seem to be the immediate culprits: > [...] > #1 0x00000000004f3633 in xmallocz (size=size@entry=80530636800) > at wrapper.c:73 > #2 0x00000000004d922f in unpack_compressed_entry (p=p@entry=0x7e4020, > w_curs=w_curs@entry=0x7fffffffc9f0, curpos=654214694, size=80530636800) > at sha1_file.c:1797 > #3 0x00000000004db4cb in unpack_entry (p=p@entry=0x7e4020, > obj_offset=654214688, final_type=final_type@entry=0x7fffffffd088, > final_size=final_size@entry=0x7fffffffd098) at sha1_file.c:2072 > #4 0x00000000004b1e3f in verify_packfile (base_count=0, progress=0x9bdd80, > fn=0x42fc00 <fsck_obj_buffer>, w_curs=0x7fffffffd090, p=0x7e4020) > at pack-check.c:119 Thanks, that's helpful. Unfortunately the patch I mentioned before won't help you. The packfile format (like the experimental loose format that my patch dropped) stores the size outside of the zlib crc. So it has the same problem: we want to allocate the buffer up front to store the zlib results. The pack index does store a crc (calculated when we made or received the pack) over each object's on-disk representation. So we could check that, though doing it on every access has performance implications. The pack data itself also has a SHA-1 checksum over the whole thing. We should probably do a better job in verify-pack of: 1. Check the whole sha1 checksum before doing anything else. 2. In the uncommon case that it fails, check each individual object crc to find the broken object (and if none, assume either the header or the checksum itself is what got munged). In the meantime, you should be able to do step 1 manually like: # check first N-20 bytes of packfile against the checksum in the # final 20 bytes. NB: pretty sure this use of "head" is a GNU-ism, # and of course you need openssl for i in objects/pack/*.pack; do tail -c 20 "$i" >want.tmp && head -c -20 "$i" | openssl sha1 -binary >have.tmp && cmp want.tmp have.tmp || echo >&2 "broken: $i" done git-fsck should be doing this check itself, but I wonder if you are not making it that far. > If I understand the code correctly, the object header buffer > \260\200\200\200\340\022x\234\354\301\001\001 > really does encode the size value 0x12c0000000. If it does, and you do not have an 80G file, then it sounds like you may have a corrupt packfile. -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: "git fsck" fails on malloc of 80 G 2013-12-16 19:15 ` Jeff King 2013-12-18 3:06 ` Dale R. Worley @ 2013-12-18 21:08 ` Dale R. Worley 2013-12-18 22:09 ` Jeff King 1 sibling, 1 reply; 6+ messages in thread From: Dale R. Worley @ 2013-12-18 21:08 UTC (permalink / raw) To: Jeff King; +Cc: git > From: Jeff King <peff@peff.net> > One of the problems I ran into recently is that > corrupt data can cause it to make a large allocation One thing I notice is that in unpack_compressed_entry() in sha1_file.c, there is a mallocz of "size" bytes. It appears that "size" is the size of the object that is being unpacked. If so, this code cannot be correct, because it assumes that any file that is stored in the repository can be put into a buffer allocated in RAM. Dale ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: "git fsck" fails on malloc of 80 G 2013-12-18 21:08 ` Dale R. Worley @ 2013-12-18 22:09 ` Jeff King 0 siblings, 0 replies; 6+ messages in thread From: Jeff King @ 2013-12-18 22:09 UTC (permalink / raw) To: Dale R. Worley; +Cc: git On Wed, Dec 18, 2013 at 04:08:47PM -0500, Dale R. Worley wrote: > > From: Jeff King <peff@peff.net> > > > One of the problems I ran into recently is that > > corrupt data can cause it to make a large allocation > > One thing I notice is that in unpack_compressed_entry() in > sha1_file.c, there is a mallocz of "size" bytes. It appears that > "size" is the size of the object that is being unpacked. If so, this > code cannot be correct, because it assumes that any file that is > stored in the repository can be put into a buffer allocated in RAM. For some definition of correct. Git does load whole-blobs into memory in several places. Some code paths _can_ stream, but they do not stream deltas, and the diff engine definitely wants the whole thing in-core. So you are reading it right. If you want to work on changing it, be my guest, but it's a non-trivial fix. ;) -Peff ^ permalink raw reply [flat|nested] 6+ messages in thread