From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
Linus Torvalds <torvalds@osdl.org>, git <git@vger.kernel.org>
Subject: Re: Creating objects manually and repack
Date: Sat, 5 Aug 2006 01:21:36 -0400
Message-ID: <20060805052135.GA18679@spearce.org>
In-Reply-To: <9e4733910608042212p6bf56224ye0ecf3f06b2840cf@mail.gmail.com>

Jon Smirl <jonsmirl@gmail.com> wrote:
> On 8/5/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> >On 8/5/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> >> On 8/4/06, Linus Torvalds <torvalds@osdl.org> wrote:
> >> > and you're basically all done. The above would turn each *,v file into a
> >> > *-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
> >> > pack-files as you have *,v files.
> >>
> >> I'll end up with 110,000 pack files.
> >
> >Then just do it every 100 files, and you'll only have 1,100 pack
> >files, and it'll be fine.
>
> This is something that has to be tuned. If you wait too long
> everything spills out of RAM and you go totally IO bound for days. If
> you do it too often you end up with too many packs and it takes a day
> to repack them.
>
> If I had a way to pipe all of the objects into repack one at a
> time without repack doing multiple passes none of this tuning would be
> necessary. In this model the standalone objects never get created in
> the first place. The fastest IO is IO that has been eliminated.

I'm almost done with what I'm calling `git-fast-import`. It takes
a stream of blobs on STDIN and writes the pack to a file, printing
SHA1s in hex format to STDOUT. The basic format for STDIN is a 4-byte
length (native format) followed by that many bytes of blob data.
It prints the SHA1 for that blob to STDOUT, then waits for another
length.
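
The STDIN/STDOUT protocol described above can be sketched in Python as follows. This is a minimal sketch that assumes the framing exactly as stated in this message (a 4-byte length in host byte order, then the blob bytes) and assumes the printed SHA1 is git's usual blob object ID, i.e. the SHA1 of `blob <size>\0` followed by the content. `frame_blob`, `read_blobs` and `blob_sha1` are hypothetical helper names, not part of git.

```python
import hashlib
import io
import struct

# Assumption: "native format" means a 4-byte unsigned int in host byte order.
LENGTH = struct.Struct("=I")


def frame_blob(data: bytes) -> bytes:
    """Encode one blob for the writer side: length prefix, then the raw bytes."""
    return LENGTH.pack(len(data)) + data


def read_blobs(stream: io.BufferedIOBase):
    """Yield blobs from a framed stream until the writer closes it (EOF)."""
    while True:
        header = stream.read(LENGTH.size)
        if not header:
            return  # clean EOF: the writer closed STDIN
        if len(header) != LENGTH.size:
            raise EOFError("truncated length prefix")
        (size,) = LENGTH.unpack(header)
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("truncated blob body")
        yield data


def blob_sha1(data: bytes) -> str:
    """Git's object ID for a blob: SHA1 over a 'blob <size>\\0' header plus content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
```

A writer would send `frame_blob(...)` for each revision and close the pipe when done; a reader could pass `sys.stdin.buffer`, whose `read(n)` blocks until `n` bytes arrive or EOF is reached, and print `blob_sha1(blob)` for each blob it yields. Because the length is in native byte order, both ends have to agree on endianness.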

It naively deltas each object against the prior object, thus it
would be best to feed it one ,v file at a time working from the most
recent revision back to the oldest revision. This works well for
an RCS file as that's the natural order to process the file in. :-)
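
Why the input order matters can be illustrated with a toy model. The sketch below assumes each object is deltified only against the object immediately before it, as described above, and uses `difflib.SequenceMatcher` as a crude stand-in for delta size (bytes of the new object not copied from its base); git's real delta encoder works differently. The function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def toy_delta_cost(base, target: bytes) -> int:
    """Bytes of `target` not covered by blocks copied from `base` (toy stand-in)."""
    if base is None:
        return len(target)  # first object in the stream has no base: stored whole
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    copied = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - copied


def grouped_order(rcs_files):
    """One ,v file at a time, newest revision first (each inner list is newest-first)."""
    return [rev for revisions in rcs_files for rev in revisions]


def interleaved_order(rcs_files):
    """Round-robin across files: a poor order for a prior-object delta scheme."""
    out = []
    for depth in range(max(len(revisions) for revisions in rcs_files)):
        for revisions in rcs_files:
            if depth < len(revisions):
                out.append(revisions[depth])
    return out


def stream_cost(objects) -> int:
    """Total toy cost when every object is deltified against the one before it."""
    total, prior = 0, None
    for obj in objects:
        total += toy_delta_cost(prior, obj)
        prior = obj
    return total
```

For example, with two unrelated files `file_a = [b"a" * 40 + b"3", b"a" * 40 + b"2", b"a" * 40 + b"1"]` and a `file_b` built the same way from `b"b"`, `stream_cost(grouped_order([file_a, file_b]))` is much smaller than `stream_cost(interleaved_order([file_a, file_b]))`: in the grouped order only the first revision of each file lacks a similar predecessor.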

When done you close STDIN and it'll rip through and update the pack
object count and the trailing checksum. This should let you pack
the entire repository in delta format using only two passes over the
data: one to write out the pack file and one to compute its checksum.
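
The finishing step can be sketched as follows, assuming the standard version 2 pack layout: a 12-byte header ("PACK", version, object count, each field 4 bytes in network byte order), the object entries, then a 20-byte SHA1 of everything before it. Entry bytes are treated as opaque here (real entries carry a type/size header and zlib-compressed data or a delta), and `begin_pack`, `append_entry` and `finish_pack` are hypothetical names. The count is written as a placeholder, patched once the input ends, and the whole file is then re-read to compute the trailer, which corresponds to the second pass mentioned above.

```python
import hashlib
import io
import struct

PACK_SIGNATURE = b"PACK"
PACK_VERSION = 2


def begin_pack(f) -> None:
    """First pass starts: write the header with a placeholder object count of 0."""
    f.write(struct.pack(">4sII", PACK_SIGNATURE, PACK_VERSION, 0))


def append_entry(f, entry: bytes) -> None:
    """Append one already-encoded object entry (opaque bytes in this sketch)."""
    f.write(entry)


def finish_pack(f, object_count: int) -> str:
    """Patch the count, re-read the file to hash it, append the SHA1 trailer."""
    f.seek(8, io.SEEK_SET)  # count field follows the 4-byte signature and version
    f.write(struct.pack(">I", object_count))

    # Second pass: SHA1 over every byte written so far, header included.
    f.seek(0, io.SEEK_SET)
    digest = hashlib.sha1()
    for chunk in iter(lambda: f.read(64 * 1024), b""):
        digest.update(chunk)

    f.seek(0, io.SEEK_END)
    f.write(digest.digest())
    return digest.hexdigest()
```

The same functions work on an `io.BytesIO` or on a real file opened with mode `"w+b"`; the explicit `seek` calls keep reads and writes on the same file object well-defined.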

I'll post the code in a couple of hours.
--
Shawn.