From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: git <git@vger.kernel.org>
Subject: Re: fast-import and unique objects.
Date: Mon, 7 Aug 2006 01:04:23 -0400
Message-ID: <20060807050422.GD20514@spearce.org>
In-Reply-To: <9e4733910608062148u4341dabag451c3f49f1a792a1@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> On 8/6/06, Shawn Pearce <spearce@spearce.org> wrote:
> >So the new version should take about 20 MB of memory and should
> >produce a valid pack and index in the same time as it does only
> >the pack now. Plus it won't generate duplicates.
>
> I did a run with this and it works great.
Good. :-) On my drive in to work this afternoon I realized
that making you specify the size of the object table is stupid;
I could easily allocate a thousand objects at a time rather than
preallocating the whole thing. Oh well. So far fast-import
hasn't been meant as production code for inclusion in core GIT,
but maybe it will get cleaned up and submitted as such if your
conversion efforts go well and produce a better CVS importer.
> I'm staring at the cvs2svn code now trying to figure out how to modify
> it without rewriting everything. I may just leave it all alone and
> build a table with cvs_file:rev to sha-1 mappings. It would be much
> more efficient to carry sha-1 throughout the stages but that may
> require significant rework.
Does it matter? How long does the cvs2svn processing take,
excluding the GIT blob processing that's now known to take 2 hours?
What's your target for an acceptable conversion time on the system
you are working on?
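
For reference, the cvs_file:rev to sha-1 table could be quite small. The sketch below is only an illustration, not anything cvs2svn or fast-import provides: RevisionMap and its methods are hypothetical names. It relies on the fact that a blob's ID is the SHA-1 of "blob <size>\0" followed by the content.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the 40 character hex ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class RevisionMap:
    """Hypothetical side table mapping (cvs_file, rev) to a blob sha-1."""

    def __init__(self):
        self._ids = {}

    def record(self, cvs_file: str, rev: str, data: bytes) -> str:
        # Hash the revision content and remember it under its CVS identity.
        sha1 = git_blob_sha1(data)
        self._ids[(cvs_file, rev)] = sha1
        return sha1

    def lookup(self, cvs_file: str, rev: str) -> str:
        # Raises KeyError if the revision was never recorded.
        return self._ids[(cvs_file, rev)]
```

Later stages could then call lookup() when they build tree entries, without carrying the sha-1 through every stage.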
Any thoughts yet on how you might want to feed trees and commits
to a fast pack writer? I was thinking about doing a stream into
fast-import such as:

    <4 byte length of commit><commit><treeent>*<null>

where <commit> is the raw commit minus the first "tree nnn\n" line, and
<treeent> is:

    <type><sp><sha1><sp><path><null>

where <type> is one of 'B' (normal blob), 'L' (symlink), or 'X'
(executable blob); <sha1> is the 40 byte hex object ID; <path> is the
file's path from the root of the repository ("src/module/foo.c"); and
<sp> and <null> are the obvious values. You would feed all tree entries
and the pack writer would split the stream up into the individual
tree objects. fast-import would generate the tree(s), delta'ing each
against the prior tree of the same path, prefix "tree nnn\n" to the
commit blob you supplied, generate the commit, and print out its ID.
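
On the Python side, producing one record of that stream might look roughly like the sketch below. It is a guess at the details rather than a settled format: the byte order of the 4 byte length is not specified above, so network byte order is assumed here, and the function names are made up for illustration.

```python
import struct

HEX_DIGITS = set("0123456789abcdef")
ENTRY_TYPES = {"B", "L", "X"}  # normal blob, symlink, executable blob


def encode_tree_entry(kind: str, sha1_hex: str, path: str) -> bytes:
    """Encode one <type><sp><sha1><sp><path><null> entry."""
    if kind not in ENTRY_TYPES:
        raise ValueError("unknown entry type: %r" % kind)
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX_DIGITS:
        raise ValueError("sha1 must be 40 lowercase hex digits")
    return (kind.encode("ascii") + b" " + sha1_hex.encode("ascii")
            + b" " + path.encode("utf-8") + b"\0")


def encode_commit_record(commit_body: bytes, entries) -> bytes:
    """Encode <4 byte length of commit><commit><treeent>*<null>.

    commit_body is the raw commit without its leading "tree nnn\\n" line.
    entries is an iterable of (kind, sha1_hex, path) tuples.
    Assumption: the length is an unsigned 32-bit big-endian integer.
    """
    parts = [struct.pack(">I", len(commit_body)), commit_body]
    parts.extend(encode_tree_entry(*entry) for entry in entries)
    parts.append(b"\0")  # lone null terminates the entry list
    return b"".join(parts)
```

A caller would pass every file in the commit as a flat list of entries, for example ("B", sha1, "src/module/foo.c"), and leave the splitting into per-directory trees to the pack writer.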
By working from the first commit up to the most recent, each tree
delta would use the older tree as its base, which may not be
ideal if a large number of items get added to a tree but should be
effective enough to generate a reasonably sized initial pack.
It would however mean you need to monitor the output pipe from
fast-import to get back the commit ID so you can use it to prep
the next commit's parent(s), as you can't produce that in Python.
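
The round trip over the pipe could be sketched as follows. This assumes fast-import prints each commit ID as 40 hex digits followed by a newline, which is not decided above, and the helper names are hypothetical. The two stream arguments would be the stdin and stdout pipes of the fast-import process.

```python
HEX_DIGITS = set("0123456789abcdef")


def commit_body(parents, author_line: str, committer_line: str,
                message: str) -> bytes:
    """Build the raw commit minus its "tree nnn\\n" line."""
    lines = ["parent %s" % p for p in parents]
    lines += [author_line, committer_line, "", message]
    return "\n".join(lines).encode("utf-8")


def read_commit_id(from_importer) -> str:
    """Read one commit ID line back from the importer's output pipe.

    Assumption: the ID arrives as 40 hex digits plus a newline.
    """
    commit_id = from_importer.readline().strip().decode("ascii")
    if len(commit_id) != 40 or not set(commit_id) <= HEX_DIGITS:
        raise ValueError("unexpected reply from importer: %r" % commit_id)
    return commit_id


def import_commit(to_importer, from_importer, record: bytes) -> str:
    """Send one encoded commit record and wait for its ID."""
    to_importer.write(record)
    # Flush before reading, otherwise both sides can end up waiting
    # on each other with the record still sitting in a buffer.
    to_importer.flush()
    return read_commit_id(from_importer)
```

The ID returned for one commit would then be passed as a parent when building the body of the next.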
--
Shawn.