From: "Felipe Contreras" <felipe.contreras@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: "git list" <git@vger.kernel.org>
Subject: Re: Memory issue with fast-import, why track branches?
Date: Mon, 22 Dec 2008 04:36:11 +0200
Message-ID: <94a0d4530812211836w16933c7dv52ffa099ca18f731@mail.gmail.com>
In-Reply-To: <20081221221702.GC17355@spearce.org>
On Mon, Dec 22, 2008 at 12:17 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> wrote:
>> I tracked down an issue I have when importing a big repository. For
>> some reason memory usage keeps increasing until there is no more
>> memory.
>>
>> After looking at the code my guess is that I have a humongous amount
>> of branches.
>>
>> Actually they are not really branches, but refs. For each git commit
>> there's an original mtn ref that I store in 'refs/mtn/sha1', but since
>> I'm using 'commit refs/mtn/sha1' to store it, a branch is created for
>> every commit.
>>
>> I guess there are many ways to fix the issue, but for starters I
>> wonder why is fast-import keeping track of all the branches? In my
>> case I would like fast-import to work exactly the same if I specify
>> branches or not (I'll update them later).
>
> Because fast-import has to buffer them until the pack file is done.
> The objects aren't available to the repository until after a
> checkpoint is sent or until the stream ends. Either way until
> then fast-import has to buffer the refs so they don't get exposed
> to other git processes reading that same repository, because they
> would point to objects that the process cannot find.
>
> I guess it could release the branch memory after it dumps the
> branches in a checkpoint, but its memory allocators work under an
> assumption that strings (like branch and file names) will be reused
> heavily by the frontend and thus they are pooled inside of a string
> pool. The branch objects are also pooled inside of a common alloc
> pool, to amortize the cost of malloc's block headers out over the
> data used.
>
> IOW, fast-import was designed for ~5k branches, not ~1 million
> unique branches.

My point is: why is it not designed for 0 branches? In many places in
the code there's the assumption that tree = branch, but that's not
always the case. You can specify a 'from sha1' and then the branch
becomes irrelevant.

In fact, in monotone some commits are not part of any branch, and many
are part of multiple branches. Those cases can't be handled by
fast-import right now. Not to mention random refs like 'refs/mtn/foo',
which would come in handy for my script.
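
To make the 'from' point concrete, here is a rough sketch of a frontend
that emits every commit on a single scratch ref and names parents by
mark, so no per-commit 'refs/mtn/sha1' branch is ever created. The ref
name, the committer identity, and the helper names are made up for
illustration; only the 'commit', 'mark', 'committer', 'data' and 'from'
commands come from the fast-import stream format. The mtn id to mark
mapping is kept on the frontend side so the refs can be created later.

```python
# Hypothetical scratch ref: every commit goes here, never to refs/mtn/<sha1>.
SCRATCH_REF = "refs/heads/mtn-import"


def commit(ref, mark, message, parent_mark=None, when="1229913371 +0200"):
    """Return one fast-import 'commit' command as bytes."""
    msg = message.encode("utf-8")
    header = [
        b"commit " + ref.encode(),
        b"mark :%d" % mark,
        # Placeholder identity, not taken from any real repository.
        b"committer Importer <importer@example.com> " + when.encode(),
        b"data %d" % len(msg),  # byte count of the commit message
    ]
    out = b"\n".join(header) + b"\n" + msg + b"\n"
    if parent_mark is not None:
        # An explicit 'from' picks the parent regardless of the ref's tip.
        out += b"from :%d\n" % parent_mark
    return out + b"\n"


def build_stream(revisions):
    """revisions: iterable of (mtn_id, parent_mtn_id or None, message).

    Parents must appear before their children. Returns the stream and
    the mtn_id -> mark mapping the frontend can use afterwards.
    """
    marks = {}
    chunks = []
    for n, (mtn_id, parent_id, message) in enumerate(revisions, start=1):
        parent_mark = marks[parent_id] if parent_id is not None else None
        chunks.append(commit(SCRATCH_REF, n, message, parent_mark))
        marks[mtn_id] = n
    return b"".join(chunks), marks
```

The resulting bytes would be piped to 'git fast-import' on stdin. This
sketch only covers a single root with single parents and no file
changes; a commit without 'from' on an already-populated ref would
inherit that ref's current tip, so extra roots and merges need more
care.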

Now my question is: would it be possible to get rid of the notion of
branches in fast-import and go for refs instead?

On the other hand, if branch memory is freed after a checkpoint then
there's no limit to how many 'branches' can be handled.
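
A toy model of that last idea, just to show the effect on peak memory.
This is not fast-import's code: the class, its fields, and the
simulate() helper are all invented here, and "memory" is approximated
by the number of refs held in the table.

```python
class BranchTable:
    """Toy stand-in for refs buffered in memory until a checkpoint."""

    def __init__(self, free_on_checkpoint):
        self.free_on_checkpoint = free_on_checkpoint
        self.pending = {}   # ref name -> commit id, not yet visible on disk
        self.written = {}   # stand-in for refs written to the repository
        self.peak = 0       # largest number of refs held at once

    def update(self, ref, commit_id):
        self.pending[ref] = commit_id
        self.peak = max(self.peak, len(self.pending))

    def checkpoint(self):
        # The refs become visible; optionally drop them from memory.
        self.written.update(self.pending)
        if self.free_on_checkpoint:
            self.pending.clear()


def simulate(n_commits, checkpoint_every, free_on_checkpoint):
    """One unique ref per commit, as with 'commit refs/mtn/<sha1>'."""
    table = BranchTable(free_on_checkpoint)
    for i in range(n_commits):
        table.update("refs/mtn/%040x" % i, "commit-%d" % i)
        if (i + 1) % checkpoint_every == 0:
            table.checkpoint()
    table.checkpoint()  # end of stream
    return table.peak, len(table.written)
```

With 10000 commits and a checkpoint every 1000, the peak is 10000 refs
when nothing is freed and 1000 when the table is cleared at each
checkpoint; all 10000 refs are written either way. A real change would
presumably also have to reload a branch that is touched again after
being freed, which this model ignores.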
--
Felipe Contreras
Thread overview: 5+ messages
2008-12-21 5:54 Memory issue with fast-import, why track branches? Felipe Contreras
2008-12-21 8:10 ` John Chapman
2008-12-21 11:23 ` Felipe Contreras
2008-12-21 22:17 ` Shawn O. Pearce
2008-12-22 2:36 ` Felipe Contreras [this message]