From: Jakub Narebski <jnareb@gmail.com>
To: "Dana How" <danahow@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <junkio@cox.net>
Subject: Re: [PATCH] Prevent megablobs from gunking up git packs
Date: Wed, 23 May 2007 01:44:37 +0200
Message-ID: <200705230144.38290.jnareb@gmail.com>
In-Reply-To: <56b7f5510705220959x1b37a4adk537cc0cba1a27530@mail.gmail.com>
Dana How wrote:
> On 5/22/07, Jakub Narebski <jnareb@gmail.com> wrote:
>> Dana How wrote:
>>> There's actually an even more extreme example from my day job.
>>> The software team has a project whose files/revisions would be
>>> similar to those in the linux kernel (larger commits, I'm sure).
>>> But they have *ONE* 500MB file they check in because it takes
>>> 2 or 3 days to generate and different people use different versions of it.
>>> I'm sure it has 50+ revisions now. If they converted to git and included
>>> these blobs in their packfile, that's a 25GB uncompressed increase!
>>> *Every* git operation must wade through 10X -- 100X more packfile.
>>> Or it could be kept in 50+ loose objects in objects/xx ,
>>> requiring a few extra syscalls by each user to get a new version.
>>
>> Or keeping those large objects in separate, _kept_ packfile, containing
>> only those objects (which can delta well, even if they are large).
>
> Yes, I experimented with various changes to git-repack and
> having it create .keep files just before coming up with the maxblobsize
> approach. The problem with a 12GB+ repo is not only the large
> repack time, but the fact that the repack time keeps growing with
> the repo size. So, with split packs, I had repack create .keep
> files for all new packs except the last (fragmentary) one. The next
> repack would then only repack new stuff plus the single fragmentary
> pack, keeping repack time from growing (until you deleted the .keep
> files [just the ones with "repack" in them] to start over from scratch).
> But this approach is not going to distribute commits and trees all that well.
No, I was thinking about a separate _kept_ pack (so it would not be
repacked unless the -f option is given) containing _only_ the large blobs.
The only difference between this and your proposal is that the megablobs
would be in their own megablobs pack rather than loose.
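
To illustrate the split I have in mind, here is a minimal Python sketch
(not git code). The names Obj and split_megablobs are made up for this
illustration, and max_blob_size stands in for the maxblobsize threshold
from your patch. How the megablob list would then be packed and marked
with a .keep file is deliberately left out.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Obj(NamedTuple):
    """Hypothetical description of one object in the repository."""
    name: str  # object ID (placeholder strings in the example below)
    kind: str  # "blob", "tree", "commit" or "tag"
    size: int  # uncompressed size in bytes


def split_megablobs(objects: Iterable[Obj],
                    max_blob_size: int) -> Tuple[List[str], List[str]]:
    """Split object names into (megablobs, everything else).

    Only blobs strictly larger than max_blob_size count as megablobs;
    commits, trees and tags always stay with the regular objects.
    """
    mega: List[str] = []
    regular: List[str] = []
    for obj in objects:
        if obj.kind == "blob" and obj.size > max_blob_size:
            mega.append(obj.name)      # would go into the kept pack
        else:
            regular.append(obj.name)   # repacked as usual
    return mega, regular


if __name__ == "__main__":
    objs = [
        Obj("big-data-file", "blob", 500 * 1024 * 1024),
        Obj("Makefile", "blob", 4096),
        Obj("root-tree", "tree", 300),
    ]
    mega, regular = split_megablobs(objs, max_blob_size=10 * 1024 * 1024)
    print("kept megablob pack:", mega)
    print("regular packs:", regular)
```

The point is only that the choice is made per object by size, so commits
and trees are still distributed among the normal packs.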
--
Jakub Narebski
Poland