git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Corey Thompson <cmtptr@gmail.com>
To: Pete Wyckoff <pw@padd.com>
Cc: Luke Diamand <luke@diamand.org>, git@vger.kernel.org
Subject: Re: git-p4 out of memory for very large repository
Date: Mon, 26 Aug 2013 09:47:56 -0400	[thread overview]
Message-ID: <20130826134756.GA1335@jerec> (raw)
In-Reply-To: <20130825155001.GA875@padd.com>

On Sun, Aug 25, 2013 at 11:50:01AM -0400, Pete Wyckoff wrote:
> Modern git, including your version, do "streaming" reads from p4,
> so the git-p4 python process never even holds a whole file's
> worth of data.  You're seeing git-fast-import die, it seems.  It
> will hold onto the entire file contents.  But just one, not the
> entire repo.  How big is the single largest file?
> 
> You can import in pieces.  See the change numbers like this:
> 
>     p4 changes -m 1000 //depot/big/...
>     p4 changes -m 1000 //depot/big/...@<some-old-change>
> 
> Import something far enough back in history so that it seems
> to work:
> 
>     git p4 clone --destination=big //depot/big@60602
>     cd big
> 
> Sync up a bit at a time:
> 
>     git p4 sync @60700
>     git p4 sync @60800
>     ...
> 
> I don't expect this to get around the problem you describe,
> however.  Sounds like there is one gigantic file that is causing
> git-fast-import to fill all of memory.  You will at least isolate
> the change.
> 
> There are options to git-fast-import to limit max pack size
> and to cause it to skip importing files that are too big, if
> that would help.
> 
> You can also use a client spec to hide the offending files
> from git.
> 
> Can you watch with "top"?  Hit "M" to sort by memory usage, and
> see how big the processes get before falling over.
> 
> 		-- Pete

You are correct that git-fast-import is killed by the OOM killer, but I
was unclear about which process was malloc()ing so much memory that the
OOM killer got invoked (as other completely unrelated processes usually
also get killed when this happens).

Unless there's one gigantic file in one change that gets removed by
another change, I don't think that's the problem; as I mentioned in
another email, the machine has 32GB physical memory and the largest
single file in the current head is only 118MB.  Even if there is a very
large transient file somewhere in the history, I seriously doubt it's
tens of gigabytes in size.

I have tried watching it with top before, but it takes several hours
before it dies.  I haven't been able to see any explosion of memory
usage, even within the final hour, but I've never caught it just before
it dies, either.  I suspect that whatever the issue is here, it happens
very quickly.

If I'm unable to get through this today using the incremental p4 sync
method you described, I'll try running a full-blown clone overnight with
top in batch mode writing to a log file to see whether it catches
anything.

Thanks again,
Corey

  reply	other threads:[~2013-08-26 13:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-23  1:12 git-p4 out of memory for very large repository Corey Thompson
2013-08-23  7:16 ` Luke Diamand
2013-08-23 11:48   ` Corey Thompson
2013-08-23 11:59     ` Corey Thompson
2013-08-23 19:42       ` Luke Diamand
2013-08-24  0:56         ` Corey Thompson
2013-08-25 15:50     ` Pete Wyckoff
2013-08-26 13:47       ` Corey Thompson [this message]
2013-08-28 15:41         ` Corey Thompson
2013-08-29 22:46           ` Pete Wyckoff
2013-09-02 19:42             ` Luke Diamand
2013-09-06 19:03               ` Corey Thompson
2013-09-07  8:19                 ` Pete Wyckoff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130826134756.GA1335@jerec \
    --to=cmtptr@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=luke@diamand.org \
    --cc=pw@padd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).