From: Michael Haggerty <mhagger@alum.mit.edu>
To: Uri Moszkowicz <uri@4refs.com>
Cc: git@vger.kernel.org
Subject: Re: error: git-fast-import died of signal 11
Date: Tue, 16 Oct 2012 09:18:43 +0200
Message-ID: <507D0A53.1030707@alum.mit.edu>
In-Reply-To: <CAMJd5AQ_vsQBGnMRrZUUqztjYjaHkU0_FOteNpEvE8NTrPPvQQ@mail.gmail.com>

On 10/15/2012 05:53 PM, Uri Moszkowicz wrote:
> I'm trying to convert a CVS repository to Git using cvs2git. I was able to
> generate the dump file without problem but am unable to get Git to
> fast-import it. The dump file is 328GB and I ran git fast-import on a
> machine with 512GB of RAM.
> 
> fatal: Out of memory? mmap failed: Cannot allocate memory
> fast-import: dumping crash report to fast_import_crash_18192
> error: git-fast-import died of signal 11
> 
> How can I import the repository?

What versions of git and of cvs2git are you using?  If not the current
versions, please try with the current versions.

What is the nature of your repository (i.e., why is it so big)?  Does it
consist of extremely large files?  A very deep history?  Extremely many
branches/tags?  Extremely many files?

Did you check whether the RAM usage of the git-fast-import process was
growing gradually to fill RAM while it was running, or whether the usage
seemed reasonable until it suddenly crashed?
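
One way to answer that question is to sample the resident memory of the
running process at intervals.  The following is only a minimal sketch,
assuming a Linux machine where /proc/<pid>/status is available; the
function names and the sampling interval are hypothetical choices, not
part of git or cvs2git.

```python
import re
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, interval_s=10.0, samples=6):
    """Sample the resident set size of a running process (Linux only)."""
    readings = []
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as status_file:
            readings.append(parse_vm_rss_kib(status_file.read()))
        time.sleep(interval_s)
    return readings
```

Readings that climb steadily until the crash would point towards
accumulated state or a leak; a flat profile followed by a sudden failure
would point towards a single oversized allocation.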

There are a few obvious possibilities:

0. There is some reason that too little of your computer's RAM is
available to git-fast-import (e.g., ulimit, other processes running at
the same time, much RAM being used as a ramdisk, etc).

1. Your import is simply too big for git-fast-import to hold in memory
the accumulated things that it has to remember.  I'm not familiar with
the internals of git-fast-import, but I believe that the main thing that
it has to keep in RAM is the list of "marks" (references to git objects
that can be referred to later in the import).  From your crash file, it
looks like there were about 350k marks loaded at the time of the crash.
Supposing each mark takes about 100 bytes, this would amount to only
35 MB (350,000 x 100 bytes), which should not be a problem (*if* my
assumptions are correct).

2. Your import contains a gigantic object which individually is so big
that it overflows some component of the import.  (I don't know whether
large objects are streamed; they might be read into memory in full at
some point.)  But since your computer has so much RAM, this is hard to
imagine.

3. git-fast-import has a memory leak and the accumulated memory leakage
is exhausting your RAM.

4. git-fast-import has some other kind of bug.

5. The contents of the dumpfile are corrupt in a way that is triggering
the problem.  This could be either plainly invalid input (e.g., an
object that is reported to be quaggabytes large) or input that happens
to trigger a bug in git-fast-import.
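
Possibility (0) is the cheapest to rule out.  As a rough sketch (the
function names are hypothetical, and this assumes a Unix-like system
where Python's standard resource module is available), the following
reports the memory limits that a process started from the same
environment would inherit:

```python
import resource


def describe_limit(value):
    """Render an rlimit value; RLIM_INFINITY means no limit is set."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memory_limits():
    """Return the (soft, hard) limits that can make mmap or malloc fail early."""
    names = {
        "address space (ulimit -v)": resource.RLIMIT_AS,
        "data segment (ulimit -d)": resource.RLIMIT_DATA,
    }
    report = {}
    for label, res in names.items():
        soft, hard = resource.getrlimit(res)
        report[label] = (describe_limit(soft), describe_limit(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in memory_limits().items():
        print(f"{label}: soft={soft}, hard={hard}")
```

Limits are inherited by child processes, so this is only informative
when run from the same shell or job environment that launches
git-fast-import.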

If (1), then you either need a bigger machine or git-fast-import needs
architectural changes.

If (2), then you either need a bigger machine or git-fast-import and/or
git need architectural changes.

If (3), then it would be good to get more information about the problem
so that the leak can be fixed.  If this is the case, it might be
possible to work around the problem by splitting the dumpfile into
several parts and loading them one after the other (outputting the marks
from one run and loading them into the next).
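
A minimal sketch of that workaround, using git fast-import's
--export-marks and --import-marks options.  It assumes the dumpfile has
already been split at command boundaries into separate files (the
splitting itself is not shown and is not trivial); the part and marks
file names are hypothetical.

```python
import subprocess


def chained_import_commands(parts, marks_prefix="marks"):
    """Build one fast-import invocation per part, chaining the marks files."""
    commands = []
    previous_marks = None
    for index, part in enumerate(parts):
        marks_out = f"{marks_prefix}-{index}.dat"
        argv = ["git", "fast-import", f"--export-marks={marks_out}"]
        if previous_marks is not None:
            # Make the marks defined by the previous run resolvable in this one.
            argv.append(f"--import-marks={previous_marks}")
        commands.append((argv, part))
        previous_marks = marks_out
    return commands


def run_chained_import(parts, repo_dir):
    """Feed each part to git fast-import on stdin, stopping at the first failure."""
    for argv, part in chained_import_commands(parts):
        with open(part, "rb") as stream:
            subprocess.run(argv, stdin=stream, cwd=repo_dir, check=True)
```

Each run starts a fresh process, so any memory leaked while loading one
part is released before the next part begins.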

If (4) or (5), then it would be helpful to narrow down the problem.  It
might be possible to do so by following the instructions in the cvs2svn
FAQ [1] for systematically shrinking a test case to smaller size using
destroy_repository.py and shrink_test_case.py.  If you can create a
small repository that triggers the same problem, then there is a good
chance that it is easy to fix.
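
Before shrinking the repository, a quick scan of the stream might
already reveal an absurd declared object size.  The following is a
sketch only: it assumes a seekable file and that every "data" command
uses the exact-byte-count form ("data <count>"), not the delimited
"data <<DELIM" form, which it skips without parsing.  The function name
is hypothetical.

```python
import io


def largest_declared_data(stream):
    """Return (largest_count, header_offset) over exact-count 'data' commands."""
    largest = (0, None)
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"data ") and not line.startswith(b"data <<"):
            count = int(line[5:].strip())
            if count > largest[0]:
                largest = (count, offset)
            # Skip the payload without reading it into memory.
            stream.seek(count, io.SEEK_CUR)
    return largest
```

Opening the file in binary mode and calling largest_declared_data on it
gives the biggest declared size and the byte offset of its header, which
can then be inspected by hand.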

Michael
(the cvs2git maintainer)

[1] http://cvs2svn.tigris.org/faq.html#testcase

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
