From: Linus Torvalds <torvalds@osdl.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: Junio C Hamano <junkio@cox.net>, Nicolas Pitre <nico@cam.org>,
git@vger.kernel.org
Subject: Re: 1.3.0 creating bigger packs than 1.2.3
Date: Thu, 20 Apr 2006 15:59:01 -0700 (PDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0604201545190.3701@g5.osdl.org> (raw)
In-Reply-To: <20060420220240.GB32748@spearce.org>
On Thu, 20 Apr 2006, Shawn Pearce wrote:
> Junio C Hamano <junkio@cox.net> wrote:
> >
> > This _might_ improve things:
> >
> > diff --git a/pack-objects.c b/pack-objects.c
> > index 09f4f2c..0c6abe9 100644
> > --- a/pack-objects.c
> > +++ b/pack-objects.c
> > @@ -1037,7 +1039,7 @@ static int try_delta(struct unpacked *cu
> > sizediff = oldsize > size ? oldsize - size : size - oldsize;
> >
> > if (size < 50)
> > - return -1;
> > + return 0;
> > if (old_entry->depth >= max_depth)
> > return 0;
> >
> > @@ -1052,7 +1054,7 @@ static int try_delta(struct unpacked *cu
> > if (cur_entry->delta)
> > max_size = cur_entry->delta_size-1;
> > if (sizediff >= max_size)
> > - return -1;
> > + return 0;
> > delta_buf = diff_delta(old->data, oldsize,
> > cur->data, size, &delta_size, max_size);
> > if (!delta_buf)
>
> Holy cow, it did:
>
> Total 46391, written 46391 (delta 8391), reused 37774 (delta 0)
> 46M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
>
> That's the smallest packing I've seen yet. And it doesn't have a
> negative affect on repacking GIT either.
I think I know what's going on, and why your bisection claimed it was the
re-hashing change that was the problem, even though it really wasn't.
That
if (sizediff >= max_size)
return -1;
check is actually fairly _old_. It's from June 2005, ie it's from pretty
much the first two days of that packing thing existing in the first place.
(The initial repacking was done June 25th, with a lot of tweaking over the
next few days. That sizediff thing was part of the fairly early tweaking).
The thing is, that check made sense back then. Why? Because we sorted
things in decreasing size order back then (I think this was before we even
did any name-based heuristic sorting at all), so that when we tried the
delta algorithm, and the size diff was bigger than the last delta size, we
pretty much _knew_ the new delta would be bigger still, and there was no
point in continuing with try_delta.
HOWEVER. We have since changed the sorting to sort according to name
before it sorts according to size, so that old heuristic that depended on
the size being monotonically increasing simply doesn't make any sense any
more.
So I think at that second "return -1" really _should_ be changed to a
"return 0", and not just because it helps your particular case. It's
literally a bug these days, because the assumptions that caused it to
return -1 simply aren't true any more.
(It wasn't _strictly_ true even originally: even if the sizediff is huge,
the _delta_ may not be huge, since we can delete data with a small delta.
So it's quite likely that we should compare the "old is bigger than new"
and "new is bigger than old" separately and have different heuristics for
them. Again, that was simply not much of an issue back when we sorted just
by size).
So even the "return 0" might not be completely right. We might actually
want to look at how big the delta is, and return only once that fails.
Linus
next prev parent reply other threads:[~2006-04-20 22:59 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-04-20 13:36 1.3.0 creating bigger packs than 1.2.3 Shawn Pearce
2006-04-20 14:47 ` Linus Torvalds
2006-04-20 15:03 ` Shawn Pearce
2006-04-20 16:07 ` Linus Torvalds
2006-04-20 16:43 ` Shawn Pearce
2006-04-20 17:03 ` Linus Torvalds
2006-04-20 17:24 ` Junio C Hamano
2006-04-20 17:31 ` Shawn Pearce
2006-04-20 17:54 ` Nicolas Pitre
2006-04-20 21:31 ` Junio C Hamano
2006-04-20 21:53 ` Shawn Pearce
2006-04-20 21:56 ` Jakub Narebski
2006-04-20 17:41 ` Nicolas Pitre
2006-04-20 17:55 ` Shawn Pearce
2006-04-20 18:24 ` Nicolas Pitre
2006-04-20 18:49 ` Junio C Hamano
2006-04-20 21:02 ` Nicolas Pitre
2006-04-20 21:40 ` Junio C Hamano
2006-04-20 22:02 ` Shawn Pearce
2006-04-20 22:35 ` Junio C Hamano
2006-04-21 1:01 ` Shawn Pearce
2006-04-20 22:59 ` Linus Torvalds [this message]
2006-04-21 0:52 ` Nicolas Pitre
2006-04-21 1:20 ` Shawn Pearce
2006-04-21 2:28 ` Nicolas Pitre
2006-04-21 2:40 ` Shawn Pearce
2006-04-21 3:07 ` Nicolas Pitre
2006-04-21 2:32 ` Shawn Pearce
2006-04-20 23:02 ` Junio C Hamano
2006-04-20 16:09 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0604201545190.3701@g5.osdl.org \
--to=torvalds@osdl.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=nico@cam.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).