From: Jakub Narebski <jnareb@gmail.com>
To: "Dana How" <danahow@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <junkio@cox.net>
Subject: Re: [PATCH] Prevent megablobs from gunking up git packs
Date: Wed, 23 May 2007 01:44:37 +0200
Message-ID: <200705230144.38290.jnareb@gmail.com>
In-Reply-To: <56b7f5510705220959x1b37a4adk537cc0cba1a27530@mail.gmail.com>
Dana How wrote:
> On 5/22/07, Jakub Narebski <jnareb@gmail.com> wrote:
>> Dana How wrote:
>>> There's actually an even more extreme example from my day job.
>>> The software team has a project whose files/revisions would be
>>> similar to those in the linux kernel (larger commits, I'm sure).
>>> But they have *ONE* 500MB file they check in because it takes
>>> 2 or 3 days to generate and different people use different versions of it.
>>> I'm sure it has 50+ revisions now. If they converted to git and included
>>> these blobs in their packfile, that's a 25GB uncompressed increase!
>>> *Every* git operation must wade through 10X -- 100X more packfile.
>>> Or it could be kept in 50+ loose objects in objects/xx ,
>>> requiring a few extra syscalls by each user to get a new version.
>>
>> Or keeping those large objects in separate, _kept_ packfile, containing
>> only those objects (which can delta well, even if they are large).
>
> Yes, I experimented with various changes to git-repack and
> having it create .keep files just before coming up with the maxblobsize
> approach. The problem with a 12GB+ repo is not only the large
> repack time, but the fact that the repack time keeps growing with
> the repo size. So, with split packs, I had repack create .keep
> files for all new packs except the last (fragmentary) one. The next
> repack would then only repack new stuff plus the single fragmentary
> pack, keeping repack time from growing (until you deleted the .keep
> files [just the ones with "repack" in them] to start over from scratch).
> But this approach is not going to distribute commits and trees all that well.
No, I was thinking about a separate _kept_ pack (so it would not be
repacked unless the -f option is given) containing _only_ the large blobs.
The only difference between this and your proposal is that the megablobs
would be in their own megablobs pack, rather than loose.
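
To make the idea concrete, here is a rough sketch of the selection step
only, not a tested recipe. It assumes the input lines are in the default
`git cat-file --batch-check` format ("<object name> <type> <size>"); the
function name megablob_ids and the 100 MB threshold are made up for
illustration.

```python
def megablob_ids(batch_check_lines, min_size):
    """Return the names of blobs whose size is at least min_size bytes.

    Each line is expected to look like "<object name> <type> <size>",
    the default output format of `git cat-file --batch-check`.
    """
    ids = []
    for line in batch_check_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything else, e.g. "<name> missing"
        name, obj_type, size = parts
        if obj_type == "blob" and int(size) >= min_size:
            ids.append(name)
    return ids


if __name__ == "__main__":
    # Hypothetical object names and sizes, for illustration only.
    sample = [
        "aaa blob 600000000",
        "bbb tree 120",
        "ccc blob 42",
    ]
    for name in megablob_ids(sample, 100_000_000):
        print(name)
```

The printed names could then be fed on stdin to git-pack-objects to write
the megablobs-only pack, and a .keep file created next to the resulting
.pack would mark it as kept.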
--
Jakub Narebski
Poland