From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
    Linus Torvalds <torvalds@osdl.org>, git <git@vger.kernel.org>
Subject: Re: Creating objects manually and repack
Date: Sat, 5 Aug 2006 01:52:03 -0400
Message-ID: <20060805055203.GC18679@spearce.org>
In-Reply-To: <9e4733910608042240u581dd23q3859ebcfe4268ce2@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> How about adding a flag to repack that simply says delete the objects
> when done with them? I'd still create all of the objects on disk.
> Repack would assume that they have at least been sorted by filename.
> So repack could read in object names until it sees a change in the
> file name, sort them by size, deltafy, write out the pack and then
> delete the objects from that batch. Then repeat this process for the
> next file name on stdin.
>
> I'm making two assumptions: first, that blocks from a deleted file
> don't get written to disk; and second, that by deleting the file the
> file system will reuse the same blocks over and over. If those
> assumptions are close to being true then the cache shouldn't thrash.
> They don't have to be totally true; close is good enough.
>
> Of course eliminating the files altogether will be even faster.
See the email I just sent you. The only file being written is the
pack file that's being generated. No temporary files, no temporary
inodes, no temporary blocks. Only two passes over the data: one to
write it out and a second to generate the SHA1. I do two passes
rather than keeping it all in memory so that the program doesn't
blow up on extremely large inputs.
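As a rough illustration of the two-pass approach described above (not the
actual git-fast-import code), here is a minimal Python sketch. It streams
the data to the output file, then re-reads that file in fixed-size blocks
to compute the SHA-1 and appends the digest. The function name
`write_then_hash`, the 64 KiB block size, and appending the digest as a
trailer are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed read size for the second pass


def write_then_hash(chunks, path):
    """Stream chunks to path, then hash the file and append the SHA-1.

    Memory use is bounded by the largest chunk plus one read block,
    no matter how large the total input is.
    """
    # Pass 1: write the data out as it arrives, keeping nothing in memory.
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)

    # Pass 2: re-read the file block by block to compute the SHA-1.
    sha = hashlib.sha1()
    with open(path, "rb") as src:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            sha.update(block)

    # Append the 20-byte digest as a trailer (an assumption of this sketch).
    digest = sha.digest()
    with open(path, "ab") as out:
        out.write(digest)
    return digest.hex()
```

Because `chunks` can be any iterator, the caller can feed it from a pipe
or a generator, and only the output file itself ever touches the disk.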
It may be possible to tweak git-pack-objects to get what you propose
above, but to be honest I think the git-fast-import I just sent
was easier, especially as it avoids the temporary loose object stage.
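For comparison, the batching proposed in the quoted text could be sketched
as follows. This is only an illustration in Python, not something
git-pack-objects is known to do: the `(filename, object_name, size)`
record layout, the function name, and the largest-first ordering are all
assumptions, and the deltify, write, and delete steps are left to the
caller.

```python
from itertools import groupby


def batches_by_filename(records):
    """Yield (filename, object_names) batches from filename-sorted records.

    Each record is a hypothetical (filename, object_name, size) tuple.
    The input must already be sorted by filename, as the proposal assumes.
    Within a batch, object names are ordered by size, largest first.
    """
    for filename, group in groupby(records, key=lambda rec: rec[0]):
        ordered = sorted(group, key=lambda rec: rec[2], reverse=True)
        yield filename, [name for _, name, _ in ordered]
```

A caller would deltify and write each batch to the pack, then delete that
batch's loose objects before asking for the next one, so only one file's
worth of objects is live at a time.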
--
Shawn.