From: Nicolas Pitre <nico@cam.org>
To: Geert Bosch <bosch@adacore.com>
Cc: Andi Kleen <andi@firstfloor.org>, Ken Pratt <ken@kenpratt.net>,
"Shawn O. Pearce" <spearce@spearce.org>,
git@vger.kernel.org
Subject: Re: pack operation is thrashing my server
Date: Wed, 13 Aug 2008 13:26:14 -0400 (EDT) [thread overview]
Message-ID: <alpine.LFD.1.10.0808131304560.4352@xanadu.home> (raw)
In-Reply-To: <3E057C8D-FF72-47A2-BBA8-27A22AD67167@adacore.com>
On Wed, 13 Aug 2008, Geert Bosch wrote:
> On Aug 13, 2008, at 10:35, Nicolas Pitre wrote:
> > On Tue, 12 Aug 2008, Geert Bosch wrote:
> >
> > > I've always felt that keeping largish objects (say anything >1MB)
> > > loose makes perfect sense. These objects are accessed infrequently,
> > > often binary or otherwise poor candidates for the delta algorithm.
> >
> > Or, as I suggested in the past, they can be grouped into a separate
> > pack, or even occupy a pack of their own.
>
> This is fine, as long as we're not trying to create deltas
> of the large objects, or do other things that requires keeping
> the inflated data in memory.
First, there is the delta attribute:
|commit a74db82e15cd8a2c53a4a83e9a36dc7bf7a4c750
|Author: Junio C Hamano <junkio@cox.net>
|Date: Sat May 19 00:39:31 2007 -0700
|
| Teach "delta" attribute to pack-objects.
|
| This teaches pack-objects to use .gitattributes mechanism so
| that the user can specify certain blobs are not worth spending
| CPU cycles to attempt deltification.
|
| The name of the attrbute is "delta", and when it is set to
| false, like this:
|
| == .gitattributes ==
| *.jpg -delta
|
| they are always stored in the plain-compressed base object
| representation.
This could probably be extended to take a size limit argument as well.
> > As soon as you have more than
> > one revision of such largish objects then you lose again by keeping them
> > loose.
>
> Yes, you lose potentially in terms of disk space, but you avoid the
> large memory footprint during pack generation. For very large blobs,
> it is best to degenerate to having each revision of each file on
> its own (whether we call it a single-file pack, loose object or whatever).
> That way, the large file can stay immutable on disk, and will only
> need to be accessed during checkout. GIT will then scale with good
> performance until we run out of disk space.
Loose objects, though, will always be selected for potential delta
generation. Packed objects, deltified or not, are always streamed as is
when serving pull requests. And by default delta compression is not
(re)attempted between objects which are part of the same pack, the
reason being that if they were not deltified on the first packing
attempt then there is no point trying again when streaming them over the
net. So you always benefit from having your large objects packed with
the rest. This, plus the delta prevention mechanism above should cover
most cases.
> > You'll have memory usage issues whenever such objects are accessed,
> > loose or not.
> Why? The only time we'd need to access their contents for checkout
> or when pushing across the network. These should all be steaming operations
> with small memory footprint.
Pushing across the network, or repacking without -f, is streamed.
Checking out currently isn't (although it probably could). Repacking
with -f definitely isn't and probably shouldn't because of complexity
issues.
> > However, once those big objects are packed once, they can
> > be repacked (or streamed over the net) without really "accessing" them.
> > Packed object data is simply copied into a new pack in that case which
> > is less of an issue on memory usage, irrespective of the original pack
> > size.
> Agreed, but still, at least very large objects. If I have a 600MB
> file in my repository, it should just not get in the way. If it gets
> copied around during each repack, that just wastes I/O time for no
> good reason. Even worse, it causes incremental backups or filesystem
> checkpoints to become way more expensive. Just leaving large files
> alone as immutable objects on disk avoids all these issues.
Pack them in a pack of their own and stick a .keep file along with it.
At that point they will never be rewritten.
Nicolas
next prev parent reply other threads:[~2008-08-13 17:27 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-10 19:47 pack operation is thrashing my server Ken Pratt
2008-08-10 23:06 ` Martin Langhoff
2008-08-10 23:12 ` Ken Pratt
2008-08-10 23:30 ` Martin Langhoff
2008-08-10 23:34 ` Ken Pratt
2008-08-11 3:04 ` Shawn O. Pearce
2008-08-11 7:43 ` Ken Pratt
2008-08-11 15:01 ` Shawn O. Pearce
2008-08-11 15:40 ` Avery Pennarun
2008-08-11 15:59 ` Shawn O. Pearce
2008-08-11 19:13 ` Ken Pratt
2008-08-11 19:10 ` Andi Kleen
2008-08-11 19:15 ` Ken Pratt
2008-08-13 2:38 ` Nicolas Pitre
2008-08-13 2:50 ` Andi Kleen
2008-08-13 2:57 ` Shawn O. Pearce
2008-08-11 19:22 ` Shawn O. Pearce
2008-08-11 19:29 ` Ken Pratt
2008-08-11 19:34 ` Shawn O. Pearce
2008-08-11 20:10 ` Andi Kleen
2008-08-13 3:12 ` Geert Bosch
2008-08-13 3:15 ` Shawn O. Pearce
2008-08-13 3:58 ` Geert Bosch
2008-08-13 14:37 ` Nicolas Pitre
2008-08-13 14:56 ` Jakub Narebski
2008-08-13 15:04 ` Shawn O. Pearce
2008-08-13 15:26 ` David Tweed
2008-08-13 23:54 ` Martin Langhoff
2008-08-14 9:04 ` David Tweed
2008-08-13 16:10 ` Johan Herland
2008-08-13 17:38 ` Ken Pratt
2008-08-13 17:57 ` Nicolas Pitre
2008-08-13 14:35 ` Nicolas Pitre
2008-08-13 14:59 ` Shawn O. Pearce
2008-08-13 15:43 ` Nicolas Pitre
2008-08-13 15:50 ` Shawn O. Pearce
2008-08-13 17:04 ` Nicolas Pitre
2008-08-13 17:19 ` Shawn O. Pearce
2008-08-14 6:33 ` Andreas Ericsson
2008-08-14 10:04 ` Thomas Rast
2008-08-14 10:15 ` Andreas Ericsson
2008-08-14 22:33 ` Shawn O. Pearce
2008-08-15 1:46 ` Nicolas Pitre
2008-08-14 14:01 ` Nicolas Pitre
2008-08-14 17:21 ` Linus Torvalds
2008-08-14 17:58 ` Linus Torvalds
2008-08-14 19:04 ` Nicolas Pitre
2008-08-14 19:44 ` Linus Torvalds
2008-08-14 21:30 ` Andi Kleen
2008-08-15 16:15 ` Linus Torvalds
2008-08-14 21:50 ` Nicolas Pitre
2008-08-14 23:14 ` Linus Torvalds
2008-08-14 23:39 ` Björn Steinbrink
2008-08-15 0:06 ` Linus Torvalds
2008-08-15 0:25 ` Linus Torvalds
2008-08-16 12:47 ` Björn Steinbrink
2008-08-16 0:34 ` Linus Torvalds
2008-09-07 1:03 ` Junio C Hamano
2008-09-07 1:46 ` Linus Torvalds
2008-09-07 2:33 ` Junio C Hamano
2008-09-07 17:11 ` Nicolas Pitre
2008-09-07 17:41 ` Junio C Hamano
2008-09-07 2:50 ` Jon Smirl
2008-09-07 3:07 ` Linus Torvalds
2008-09-07 3:43 ` Jon Smirl
2008-09-07 4:50 ` Linus Torvalds
2008-09-07 13:58 ` Jon Smirl
2008-09-07 17:08 ` Nicolas Pitre
2008-09-07 20:33 ` Jon Smirl
2008-09-08 14:17 ` Nicolas Pitre
2008-09-08 15:12 ` Jon Smirl
2008-09-08 16:01 ` Jon Smirl
2008-09-07 8:18 ` Andreas Ericsson
2008-09-07 7:45 ` Mike Hommey
2008-08-14 18:38 ` Nicolas Pitre
2008-08-14 18:55 ` Linus Torvalds
2008-08-13 16:01 ` Geert Bosch
2008-08-13 17:13 ` Dana How
2008-08-13 17:26 ` Nicolas Pitre [this message]
2008-08-13 12:43 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.1.10.0808131304560.4352@xanadu.home \
--to=nico@cam.org \
--cc=andi@firstfloor.org \
--cc=bosch@adacore.com \
--cc=git@vger.kernel.org \
--cc=ken@kenpratt.net \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).