From: Linus Torvalds <torvalds@osdl.org>
To: Carl Baldwin <cnb@fc.hp.com>
Cc: Nicolas Pitre <nico@cam.org>, Junio C Hamano <junkio@cox.net>,
git@vger.kernel.org
Subject: Re: [PATCH] diff-delta: produce optimal pack data
Date: Fri, 24 Feb 2006 12:02:51 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0602241152290.22647@g5.osdl.org>
In-Reply-To: <20060224192354.GC387@hpsvcnb.fc.hp.com>
On Fri, 24 Feb 2006, Carl Baldwin wrote:
>
> I meant that there is no benefit in disk space usage. Packing may
> actually increase my disk space usage in this case. Refer to what I
> said about experimentally running gzip and xdelta on the files
> independently of git.
Yes. The deltas tend to compress a lot less well than "normal" files.
> I see what you're saying about this data reuse helping to speed up
> subsequent cloning operations. However, if packing takes this long and
> doesn't give me any disk space savings then I don't want to pay the very
> heavy price of packing these files even the first time nor do I want to
> pay the price incrementally.
I would look at tuning the heuristics in "try_delta()" (pack-objects.c) a
bit. That's the place that decides whether to even bother trying to make a
delta, and how big a delta is acceptable. For example, looking at them, I
already see one bug:

	sizediff = oldsize > size ? oldsize - size : size - oldsize;
	if (sizediff > size / 8)
		return -1;

we really should compare sizediff to the _smaller_ of the two sizes, and
skip the delta if the difference in sizes is bound to be bigger than that.
However, the "size / 8" thing isn't a very strict limit anyway, so this
probably doesn't matter (and I think Nico already removed it as part of
his patches: the heuristic can make us avoid some deltas that would be
ok).
The other thing to look at is "max_size": right now it initializes that to
"size / 2 - 20", which just says that we don't ever want a delta that is
larger than about half the result (plus the 20 byte overhead for pointing
to the thing we delta against). Again, if you feel that normal compression
compresses better than half, you could try changing that to

	max_size = size / 4 - 20;

or something like that instead (but then you need to check that it's still
positive - otherwise the comparisons with unsigned later on are screwed
up. Right now that value is guaranteed to be positive if only because we
already checked

	if (size < 50)
		return -1;

earlier).
NOTE! Every SINGLE one of those heuristics is just totally made up by
yours truly, and have no testing behind them. They're more of the type
"that sounds about right" than "this is how it must be". As mentioned,
Nico has already been playing with the heuristics - but he wanted better
packs, not better CPU usage, so he went the other way from what you would
want to try..
Linus