From: Johan Herland <johan@herland.net>
To: git@vger.kernel.org
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Jakub Narebski <jnareb@gmail.com>, Nicolas Pitre <nico@cam.org>,
Geert Bosch <bosch@adacore.com>, Andi Kleen <andi@firstfloor.org>,
Ken Pratt <ken@kenpratt.net>
Subject: Re: pack operation is thrashing my server
Date: Wed, 13 Aug 2008 18:10:19 +0200
Message-ID: <200808131810.19914.johan@herland.net>
In-Reply-To: <20080813150425.GC3782@spearce.org>

On Wednesday 13 August 2008, Shawn O. Pearce wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
> > Nicolas Pitre <nico@cam.org> writes:
> > > On Tue, 12 Aug 2008, Geert Bosch wrote:
> > > > One nice optimization we could do for those pesky binary large
> > > > objects (like PDF, JPG and GZIP-ed data), is to detect such
> > > > files and revert to compression level 0. This should be
> > > > especially beneficial since already compressed data takes most
> > > > time to compress again.
> > >
> > > That would be a good thing indeed.
> >
> > Perhaps take a sample of some given size and calculate entropy in
> > it? Or just simply add gitattribute for per file compression
> > ratio...
>
> Estimating the entropy would make it "just magic". Most of Git is
> "just magic" so that's a good direction to take. I'm not familiar
> enough with the PDF/JPG/GZIP/ZIP stream formats to know what the
> first 4-8k looks like to know if it would give a good indication
> of being already compressed.
>
> Though I'd imagine looking at the first 4k should be sufficient
> for any compressed file. Having a header composed of 4k of _text_
> before binary compressed data would be nuts. Or a git-bundle with
> a large refs listing. ;-)
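
For reference, the sampled-entropy idea quoted above could be sketched
roughly as follows. This is only an illustration in Python, not anything
that exists in Git: the names sample_entropy and looks_compressed, the
4096-byte sample size and the 7.5 bits/byte threshold are all assumptions
made up for the example.

```python
import math
from collections import Counter


def sample_entropy(data: bytes, sample_size: int = 4096) -> float:
    """Shannon entropy, in bits per byte, of the first sample_size bytes."""
    sample = data[:sample_size]
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)  # byte value -> number of occurrences
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_compressed(data: bytes, threshold: float = 7.5) -> bool:
    """Guess whether data is already compressed.

    The threshold is a hypothetical cut-off: 8.0 bits/byte is the maximum
    for byte data, and plain text usually sits well below that.
    """
    return sample_entropy(data) >= threshold
```

A byte histogram like this only sees how evenly the byte values are
distributed; it does not notice repeated strings, so it is a cruder
measure than actually running the compressor.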
As for how to estimate entropy, isn't that just a matter of feeding the
data through zlib and comparing the output size to the input size?
Especially if we're already about to feed it through zlib anyway... In
other words, feed (an initial part of) the data through zlib, and if the
compression ratio so far looks good, keep going and write out the
compressed object; otherwise abort zlib and write out the original
object with compression level 0.
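
A minimal sketch of that probe-then-decide approach, using Python's zlib
module for brevity. It is not how Git's C code does or would do it; the
function name deflate_with_probe, the 4096-byte probe and the 10% minimum
saving are assumptions chosen for illustration.

```python
import zlib


def deflate_with_probe(data: bytes, probe_size: int = 4096,
                       min_saving: float = 0.10, level: int = 6) -> bytes:
    """Deflate data, falling back to level 0 if the first bytes barely shrink."""
    probe, rest = data[:probe_size], data[probe_size:]
    comp = zlib.compressobj(level)
    # Z_SYNC_FLUSH emits all pending output without ending the stream, so
    # the compressed size of the probe can be measured and compression can
    # still continue afterwards.
    head = comp.compress(probe) + comp.flush(zlib.Z_SYNC_FLUSH)
    if len(head) > len(probe) * (1.0 - min_saving):
        # Poor ratio so far: abandon this stream and store the data
        # uncompressed (still a valid zlib stream, just level 0).
        return zlib.compress(data, 0)
    # Good ratio so far: keep going with the same stream.
    return head + comp.compress(rest) + comp.flush()
```

Either way the result can be read back with zlib.decompress(), so a
reader does not need to know which path was taken. The sync flush costs
a few extra bytes in the output, which is the price of measuring the
ratio mid-stream in this sketch.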
> Hence, "just magic" is probably the better route.
Agreed.
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net