From: "Shawn O. Pearce" <spearce@spearce.org>
To: Felipe Contreras <felipe.contreras@gmail.com>
Cc: git list <git@vger.kernel.org>
Subject: Re: Memory issue with fast-import, why track branches?
Date: Sun, 21 Dec 2008 14:17:02 -0800
Message-ID: <20081221221702.GC17355@spearce.org>
In-Reply-To: <94a0d4530812202154l26dfe0dfm49397c63dbfdfdf9@mail.gmail.com>

Felipe Contreras <felipe.contreras@gmail.com> wrote:
> I tracked down an issue I have when importing a big repository. For
> some reason memory usage keeps increasing until there is no more
> memory.
>
> After looking at the code my guess is that I have a humongous amount
> of branches.
>
> Actually they are not really branches, but refs. For each git commit
> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
> every commit.
>
> I guess there are many ways to fix the issue, but for starters I
> wonder why fast-import keeps track of all the branches. In my
> case I would like fast-import to work exactly the same whether or
> not I specify branches (I'll update them later).

Because fast-import has to buffer them until the pack file is done.
The objects aren't available to the repository until after a
checkpoint is sent or until the stream ends. Either way, until
then fast-import has to buffer the refs so they aren't exposed
to other git processes reading the same repository; those refs
would point to objects that the other processes cannot find.
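The ordering described above can be modeled with a small C sketch. Everything in it is hypothetical (the struct and function names, the 256-byte name buffer, and the no-op finish_pack() and write_ref() placeholders); it is not fast-import's code and only illustrates the rule that ref updates are queued in memory and published after the pack is finished, at a checkpoint or at end of stream.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model, not fast-import's source: ref updates from
 * 'commit <ref>' commands are queued in memory and only published
 * after the pack holding their objects has been finished. */
struct pending_ref {
	char name[256];
	char sha1_hex[41];
	struct pending_ref *next;
};

struct ref_queue {
	struct pending_ref *head;
	size_t queued;    /* updates waiting for the pack to be finished */
	size_t published; /* updates already visible to other processes */
};

/* A commit only records the new tip in memory; no ref is written yet. */
static void queue_ref(struct ref_queue *q, const char *name,
		      const char *sha1_hex)
{
	struct pending_ref *r = calloc(1, sizeof(*r));
	if (!r)
		abort();
	snprintf(r->name, sizeof(r->name), "%s", name);
	snprintf(r->sha1_hex, sizeof(r->sha1_hex), "%s", sha1_hex);
	r->next = q->head;
	q->head = r;
	q->queued++;
}

/* Placeholder (hypothetical): close the pack and write its index so
 * that other git processes can find the objects. */
static void finish_pack(void)
{
}

/* Placeholder (hypothetical): write one ref; a real tool would lock
 * and update it atomically. */
static void write_ref(const struct pending_ref *r)
{
	(void)r;
}

/* On checkpoint or end of input: finish the pack first, then expose
 * the refs, so no reader sees a ref to an object it cannot read.
 * This sketch frees each entry once written; fast-import's own branch
 * objects stay in its pools. */
static void checkpoint(struct ref_queue *q)
{
	finish_pack();
	while (q->head) {
		struct pending_ref *r = q->head;
		q->head = r->next;
		write_ref(r);
		free(r);
		q->published++;
	}
	q->queued = 0;
}
```

Queuing two refs leaves published at 0; only checkpoint() makes them visible, and it does so only after finish_pack() has run.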

I guess it could release the branch memory after it dumps the
branches in a checkpoint, but its memory allocators work under the
assumption that strings (like branch and file names) will be reused
heavily by the frontend, so they are pooled inside a string pool.
The branch objects are also pooled inside a common alloc pool, to
amortize the cost of malloc's block headers over the data stored.
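The pooling trade-off can be illustrated with a minimal C sketch: a bump allocator that carves small objects out of large malloc'd blocks, plus a string pool that stores each distinct name once. The 64 KiB block size, 8-byte alignment, bucket count and hash are assumptions, and the names (pool_alloc, intern, struct atom) are illustrative; this is not fast-import's actual code. Interning pays off when a frontend repeats names such as refs/heads/master; with a million unique ref names every string is stored once and never reused, and because the pool has no per-object free, that memory cannot be handed back at a checkpoint.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (64 * 1024) /* assumed block size */
#define ATOM_BUCKETS 4096      /* assumed hash table size */

/* One malloc'd block; many small objects are carved out of it. */
struct mem_block {
	struct mem_block *next;
	size_t used;
	char data[BLOCK_SIZE];
};

static struct mem_block *blocks;

/* Bump allocator: one malloc (and one malloc header) per block, shared
 * by every object in it. There is no way to free a single object. */
static void *pool_alloc(size_t len)
{
	void *p;

	len = (len + 7) & ~(size_t)7; /* keep 8-byte alignment */
	if (len > BLOCK_SIZE)
		abort(); /* oversized objects are out of scope here */
	if (!blocks || blocks->used + len > BLOCK_SIZE) {
		struct mem_block *b = malloc(sizeof(*b));
		if (!b)
			abort();
		b->next = blocks;
		b->used = 0;
		blocks = b;
	}
	p = blocks->data + blocks->used;
	blocks->used += len;
	return p;
}

/* An interned string, stored once in the pool. */
struct atom {
	struct atom *next;
	size_t len;
	char str[];
};

static struct atom *atom_table[ATOM_BUCKETS];
static size_t atom_count;

static unsigned hash_str(const char *s, size_t len)
{
	unsigned h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 31 + (unsigned char)s[i];
	return h % ATOM_BUCKETS;
}

/* Return the pooled copy of s, adding it on first use. Repeated names
 * cost nothing extra; unique names each take pool space for good. */
static const char *intern(const char *s)
{
	size_t len = strlen(s);
	unsigned h = hash_str(s, len);
	struct atom *a;

	for (a = atom_table[h]; a; a = a->next)
		if (a->len == len && !memcmp(a->str, s, len))
			return a->str;
	a = pool_alloc(sizeof(*a) + len + 1);
	a->len = len;
	memcpy(a->str, s, len + 1);
	a->next = atom_table[h];
	atom_table[h] = a;
	atom_count++;
	return a->str;
}
```

Interning the same branch name twice returns the same pointer, while every new unique name grows the pool; with one ref per commit, the second case is the only one that ever happens.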

IOW, fast-import was designed for ~5k branches, not ~1 million
unique branches.
--
Shawn.