Re: Memory issue with fast-import, why track branches?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Shawn O. Pearce" <spearce@spearce.org>
To: Felipe Contreras <felipe.contreras@gmail.com>
Cc: git list <git@vger.kernel.org>
Subject: Re: Memory issue with fast-import, why track branches?
Date: Sun, 21 Dec 2008 14:17:02 -0800	[thread overview]
Message-ID: <20081221221702.GC17355@spearce.org> (raw)
In-Reply-To: <94a0d4530812202154l26dfe0dfm49397c63dbfdfdf9@mail.gmail.com>

Felipe Contreras <felipe.contreras@gmail.com> wrote:
> I tracked down an issue I have when importing a big repository. For
> some reason memory usage keeps increasing until there is no more
> memory.
> 
> After looking at the code my guess is that I have a humongous amount
> of branches.
> 
> Actually they are not really branches, but refs. For each git commit
> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
> every commit.
> 
> I guess there are many ways to fix the issue, but for starters I
> wonder why is fast-import keeping track of all the branches? In my
> case I would like fast-import to work exactly the same if I specify
> branches or not (I'll update them later).

Because fast-import has to buffer them until the pack file is done.
The objects aren't available to the repository until after a
checkpoint is sent or until the stream ends.  Either way until
then fast-import has to buffer the refs so they don't get exposed
to other git processes reading that same repository, because they
would point to objects that the process cannot find.

I guess it could release the brnach memory after it dumps the
branches in a checkpoint, but its memory allocators work under an
assumption that strings (like branch and file names) will be reused
heavily by the frontend and thus they are poooled inside of a string
pool.  The branch objects are also pooled inside of a common alloc
pool, to ammortize the cost of malloc's block headers out over the
data used.

IOW, fast-import was designed for ~5k branches, not ~1 million
unique branches.

-- 
Shawn.

next prev parent reply	other threads:[~2008-12-21 22:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-21  5:54 Memory issue with fast-import, why track branches? Felipe Contreras
2008-12-21  8:10 ` John Chapman
2008-12-21 11:23   ` Felipe Contreras
2008-12-21 22:17 ` Shawn O. Pearce [this message]
2008-12-22  2:36   ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081221221702.GC17355@spearce.org \
    --to=spearce@spearce.org \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).