From: "David Tweed" <david.tweed@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: "Jakub Narebski" <jnareb@gmail.com>,
"Nicolas Pitre" <nico@cam.org>, "Geert Bosch" <bosch@adacore.com>,
"Andi Kleen" <andi@firstfloor.org>,
"Ken Pratt" <ken@kenpratt.net>,
git@vger.kernel.org
Subject: Re: pack operation is thrashing my server
Date: Wed, 13 Aug 2008 16:26:48 +0100 [thread overview]
Message-ID: <e1dab3980808130826m4870df3ctf09ecf0062ef6e7c@mail.gmail.com> (raw)
In-Reply-To: <20080813150425.GC3782@spearce.org>
On Wed, Aug 13, 2008 at 4:04 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Jakub Narebski <jnareb@gmail.com> wrote:
>> Nicolas Pitre <nico@cam.org> writes:
>> > On Tue, 12 Aug 2008, Geert Bosch wrote:
>> >
>> > > One nice optimization we could do for those pesky binary large objects
>> > > (like PDF, JPG and GZIP-ed data), is to detect such files and revert
>> > > to compression level 0. This should be especially beneficial
>> > > since already compressed data takes most time to compress again.
>> >
>> > That would be a good thing indeed.
>>
>> Perhaps take a sample of some given size and calculate entropy in it?
>> Or just simply add gitattribute for per file compression ratio...
>
> Estimating the entropy would make it "just magic". Most of Git is
> "just magic" so that's a good direction to take. I'm not familiar
> enough with the PDF/JPG/GZIP/ZIP stream formats to know what the
> first 4-8k looks like to know if it would give a good indication
> of being already compressed.
>
> Though I'd imagine looking at the first 4k should be sufficient
> for any compressed file. Having a header composed of 4k of _text_
> before binary compressed data would be nuts. Or a git-bundle with
> a large refs listing. ;-)
FWIW, PDF format is a mix of sections of uncompressed higher level
ASCII notation and sections of compressed actual glyph/location data
for individual pages, and I don't think the rules are very strict
about what goes where. Looking at some academic papers some contain
compressed data within the first hundred characters whilst I've got a
couple with the first compressed byte 1968 and 12304; I'm sure if I
had a longer pdf to look at I'd find one where compression data first
occurred even later. I leave discussions of whether this is nuts to
others ;-) .
JPG is pretty much guaranteed to contain compressed data after a
couple of metadata lines.
--
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot
next prev parent reply other threads:[~2008-08-13 15:27 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-10 19:47 pack operation is thrashing my server Ken Pratt
2008-08-10 23:06 ` Martin Langhoff
2008-08-10 23:12 ` Ken Pratt
2008-08-10 23:30 ` Martin Langhoff
2008-08-10 23:34 ` Ken Pratt
2008-08-11 3:04 ` Shawn O. Pearce
2008-08-11 7:43 ` Ken Pratt
2008-08-11 15:01 ` Shawn O. Pearce
2008-08-11 15:40 ` Avery Pennarun
2008-08-11 15:59 ` Shawn O. Pearce
2008-08-11 19:13 ` Ken Pratt
2008-08-11 19:10 ` Andi Kleen
2008-08-11 19:15 ` Ken Pratt
2008-08-13 2:38 ` Nicolas Pitre
2008-08-13 2:50 ` Andi Kleen
2008-08-13 2:57 ` Shawn O. Pearce
2008-08-11 19:22 ` Shawn O. Pearce
2008-08-11 19:29 ` Ken Pratt
2008-08-11 19:34 ` Shawn O. Pearce
2008-08-11 20:10 ` Andi Kleen
2008-08-13 3:12 ` Geert Bosch
2008-08-13 3:15 ` Shawn O. Pearce
2008-08-13 3:58 ` Geert Bosch
2008-08-13 14:37 ` Nicolas Pitre
2008-08-13 14:56 ` Jakub Narebski
2008-08-13 15:04 ` Shawn O. Pearce
2008-08-13 15:26 ` David Tweed [this message]
2008-08-13 23:54 ` Martin Langhoff
2008-08-14 9:04 ` David Tweed
2008-08-13 16:10 ` Johan Herland
2008-08-13 17:38 ` Ken Pratt
2008-08-13 17:57 ` Nicolas Pitre
2008-08-13 14:35 ` Nicolas Pitre
2008-08-13 14:59 ` Shawn O. Pearce
2008-08-13 15:43 ` Nicolas Pitre
2008-08-13 15:50 ` Shawn O. Pearce
2008-08-13 17:04 ` Nicolas Pitre
2008-08-13 17:19 ` Shawn O. Pearce
2008-08-14 6:33 ` Andreas Ericsson
2008-08-14 10:04 ` Thomas Rast
2008-08-14 10:15 ` Andreas Ericsson
2008-08-14 22:33 ` Shawn O. Pearce
2008-08-15 1:46 ` Nicolas Pitre
2008-08-14 14:01 ` Nicolas Pitre
2008-08-14 17:21 ` Linus Torvalds
2008-08-14 17:58 ` Linus Torvalds
2008-08-14 19:04 ` Nicolas Pitre
2008-08-14 19:44 ` Linus Torvalds
2008-08-14 21:30 ` Andi Kleen
2008-08-15 16:15 ` Linus Torvalds
2008-08-14 21:50 ` Nicolas Pitre
2008-08-14 23:14 ` Linus Torvalds
2008-08-14 23:39 ` Björn Steinbrink
2008-08-15 0:06 ` Linus Torvalds
2008-08-15 0:25 ` Linus Torvalds
2008-08-16 12:47 ` Björn Steinbrink
2008-08-16 0:34 ` Linus Torvalds
2008-09-07 1:03 ` Junio C Hamano
2008-09-07 1:46 ` Linus Torvalds
2008-09-07 2:33 ` Junio C Hamano
2008-09-07 17:11 ` Nicolas Pitre
2008-09-07 17:41 ` Junio C Hamano
2008-09-07 2:50 ` Jon Smirl
2008-09-07 3:07 ` Linus Torvalds
2008-09-07 3:43 ` Jon Smirl
2008-09-07 4:50 ` Linus Torvalds
2008-09-07 13:58 ` Jon Smirl
2008-09-07 17:08 ` Nicolas Pitre
2008-09-07 20:33 ` Jon Smirl
2008-09-08 14:17 ` Nicolas Pitre
2008-09-08 15:12 ` Jon Smirl
2008-09-08 16:01 ` Jon Smirl
2008-09-07 8:18 ` Andreas Ericsson
2008-09-07 7:45 ` Mike Hommey
2008-08-14 18:38 ` Nicolas Pitre
2008-08-14 18:55 ` Linus Torvalds
2008-08-13 16:01 ` Geert Bosch
2008-08-13 17:13 ` Dana How
2008-08-13 17:26 ` Nicolas Pitre
2008-08-13 12:43 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e1dab3980808130826m4870df3ctf09ecf0062ef6e7c@mail.gmail.com \
--to=david.tweed@gmail.com \
--cc=andi@firstfloor.org \
--cc=bosch@adacore.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
--cc=ken@kenpratt.net \
--cc=nico@cam.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).