From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
Linus Torvalds <torvalds@osdl.org>, git <git@vger.kernel.org>
Subject: Re: Creating objects manually and repack
Date: Sat, 5 Aug 2006 01:52:03 -0400
Message-ID: <20060805055203.GC18679@spearce.org>
In-Reply-To: <9e4733910608042240u581dd23q3859ebcfe4268ce2@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> How about adding a flag to repack that simply says delete the objects
> when done with them? I'd still create all of the objects on disk.
> Repack would assume that they have at least been sorted by filename.
> So repack could read in object names until it sees a change in the
> file name, sort them by size, deltafy, write out the pack and then
> delete the objects from that batch. Then repeat this process for the
> next file name on stdin.
>
> I'm making two assumptions, first that blocks from a deleted file
> don't get written to disk. And that by deleting the file the file
> system will use the same blocks over and over. If those assumptions
> are close to being true then the cache shouldn't thrash. They don't
> have to be totally true, close is good enough.
>
> Of course eliminating the files altogether will be even faster.
See the email I just sent you. The only file being written is the
pack file that's being generated. No temporary files, no temporary
inodes, no temporary blocks. Only two passes over the data: one to
write it out and a second to generate the SHA1. I make two passes
rather than keeping everything in memory so that the program does
not blow up on extremely large inputs.
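
As a rough illustration of the write-then-hash idea (not the actual
git-fast-import code, and not the real pack format), a two-pass writer
might be sketched as below. The function name, the block size, and the
choice to append the digest as a trailer are all assumptions made for
this example.

```python
import hashlib


def write_then_checksum(chunks, path, block_size=65536):
    """Stream chunks to `path`, then re-read the file to compute its SHA-1.

    Hypothetical helper: only one chunk (pass 1) or one block (pass 2)
    is held in memory at a time, regardless of the total input size.
    """
    # Pass 1: write the data straight to the single output file.
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)

    # Pass 2: read the file back in fixed-size blocks and hash it.
    sha = hashlib.sha1()
    with open(path, "rb") as src:
        for block in iter(lambda: src.read(block_size), b""):
            sha.update(block)
    digest = sha.digest()

    # Assumption for this sketch: store the digest as a trailer.
    with open(path, "ab") as out:
        out.write(digest)
    return digest
```

Memory use here is bounded by the largest chunk and the block size, at
the cost of reading the output file once more.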
It may be possible to tweak git-pack-objects to get what you propose
above, but to be honest I think the git-fast-import I just sent
was easier, especially as it avoids the temporary loose object stage.
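
For comparison, the per-filename batching proposed in the quoted text
could be sketched roughly as follows. This is only an illustration of
the grouping step: the tuple layout, the function name, and the
largest-first ordering are assumptions, and the deltify, write, and
delete steps are left to the caller.

```python
from itertools import groupby


def batches_by_filename(entries):
    """Yield (filename, batch) pairs from entries pre-sorted by filename.

    `entries` is a hypothetical iterable of (filename, object_name, size)
    tuples. Each batch is sorted by size, largest first (an assumption;
    the proposal only says "sort them by size").
    """
    for filename, group in groupby(entries, key=lambda e: e[0]):
        yield filename, sorted(group, key=lambda e: e[2], reverse=True)
```

A caller would deltify and write each batch to the pack, then delete
that batch's loose objects before moving on to the next filename.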
--
Shawn.