From: Nicolas Pitre <nico@cam.org>
To: Dana How <danahow@gmail.com>
Cc: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Prevent megablobs from gunking up git packs
Date: Tue, 22 May 2007 13:38:03 -0400 (EDT)
Message-ID: <alpine.LFD.0.99.0705221329420.3366@xanadu.home>
In-Reply-To: <46528A48.9050903@gmail.com>

On Mon, 21 May 2007, Dana How wrote:
>
> Using fast-import and repack with the max-pack-size patch,
> 3628 commits were imported from Perforce comprising
> 100.35GB (uncompressed) in 38829 blobs, and saved in
> 7 packfiles of 12.5GB total (--window=0 and --depth=0 were
> used due to runtime limits). When using these packfiles,
> several git commands showed very large process sizes,
> and some slowdowns (compared to comparable operations
> on the linux kernel repo) were also apparent.
>
> git stores data in loose blobs or in packfiles. The former
> has essentially now become an exception mechanism, to store
> exceptionally *young* blobs. Why not use this to store
> exceptionally *large* blobs as well? This allows us to
> re-use all the "exception" machinery with only a small change.
>
> Repacking the entire repository with a max-blob-size of 256KB
> resulted in a single 13.1MB packfile, as well as 2853 loose
> objects totaling 15.4GB compressed and 100.08GB uncompressed,
> 11 files per objects/xx directory on average. All was created
> in half the runtime of the previous yet with standard
> --window=10 and --depth=50 parameters. The data in the
> packfile was 270MB uncompressed in 35976 blobs. Operations
> such as "git-log --pretty=oneline" were about 30X faster
> on a cold cache and 2 to 3X faster otherwise. Process sizes
> remained reasonable.
>
> This patch implements the following:
> 1. git pack-objects takes a new --max-blob-size=N flag,
> with the effect that only blobs less than N KB are written
> to the packfile(s). If a blob was in a pack but violates
> this limit (perhaps the packs were created by fast-import
> or max-blob-size was reduced), then a new loose object
> is written out if needed so the data is not lost.
> 2. git repack inspects repack.maxblobsize . If set, its
> value is passed to git pack-objects on the command line.
> The user should change repack.maxblobsize , NOT specify
> --max-blob-size=N .
> 3. No other caller of git pack-objects supplies this new flag,
> so other callers see no change.
>
> This patch is on top of the earlier max-pack-size patch,
> because I thought I needed some behavior it supplied,
> but could be rebased on master if desired.

I think what this patch is missing is a test, run after all options have
been parsed, to prevent --stdout and --max-blob-size from being used together.
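
As a rough, untested sketch of such a check: the struct, the function
names and the error text below are illustrative assumptions, not taken
from the patch or from builtin-pack-objects.c. The point is only that
the test runs once parsing is finished, so the order of the two options
on the command line does not matter.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option state; these names are assumptions for illustration. */
struct pack_opts {
	int pack_to_stdout;           /* set by --stdout */
	unsigned long max_blob_size;  /* set by --max-blob-size=N (KB), 0 = no limit */
};

/*
 * Parse argv into opts without judging combinations yet.
 * Returns 0 on success, -1 on an unknown or malformed option.
 */
static int parse_pack_opts(int argc, const char **argv, struct pack_opts *opts)
{
	static const char blob_opt[] = "--max-blob-size=";
	int i;

	memset(opts, 0, sizeof(*opts));
	for (i = 1; i < argc; i++) {
		const char *arg = argv[i];

		if (!strcmp(arg, "--stdout")) {
			opts->pack_to_stdout = 1;
		} else if (!strncmp(arg, blob_opt, strlen(blob_opt))) {
			const char *val = arg + strlen(blob_opt);
			char *end;

			opts->max_blob_size = strtoul(val, &end, 0);
			if (!*val || *end)
				return -1;	/* empty or non-numeric value */
		} else {
			return -1;
		}
	}
	return 0;
}

/*
 * Called only after all options have been parsed.
 * Returns NULL if the combination is acceptable, else an error message.
 */
static const char *check_pack_opts(const struct pack_opts *opts)
{
	if (opts->pack_to_stdout && opts->max_blob_size)
		return "--max-blob-size cannot be used with --stdout";
	return NULL;
}
```

A caller would run parse_pack_opts() first and then die with the message
returned by check_pack_opts() if it is non-NULL.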

Nicolas