git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: Junio C Hamano <junkio@cox.net>, Nicolas Pitre <nico@cam.org>,
	git@vger.kernel.org
Subject: Re: 1.3.0 creating bigger packs than 1.2.3
Date: Thu, 20 Apr 2006 15:59:01 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0604201545190.3701@g5.osdl.org> (raw)
In-Reply-To: <20060420220240.GB32748@spearce.org>



On Thu, 20 Apr 2006, Shawn Pearce wrote:

> Junio C Hamano <junkio@cox.net> wrote:
> > 
> > This _might_ improve things:
> > 
> > diff --git a/pack-objects.c b/pack-objects.c
> > index 09f4f2c..0c6abe9 100644
> > --- a/pack-objects.c
> > +++ b/pack-objects.c
> > @@ -1037,7 +1039,7 @@ static int try_delta(struct unpacked *cu
> >  	sizediff = oldsize > size ? oldsize - size : size - oldsize;
> >  
> >  	if (size < 50)
> > -		return -1;
> > +		return 0;
> >  	if (old_entry->depth >= max_depth)
> >  		return 0;
> >  
> > @@ -1052,7 +1054,7 @@ static int try_delta(struct unpacked *cu
> >  	if (cur_entry->delta)
> >  		max_size = cur_entry->delta_size-1;
> >  	if (sizediff >= max_size)
> > -		return -1;
> > +		return 0;
> >  	delta_buf = diff_delta(old->data, oldsize,
> >  			       cur->data, size, &delta_size, max_size);
> >  	if (!delta_buf)
> 
> Holy cow, it did:
> 
>   Total 46391, written 46391 (delta 8391), reused 37774 (delta 0)
>    46M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
> 
> That's the smallest packing I've seen yet.  And it doesn't have a
> negative affect on repacking GIT either.

I think I know what's going on, and why your bisection claimed it was the 
re-hashing change that was the problem, even though it really wasn't.

That

	if (sizediff >= max_size)
		return -1;

check is actually fairly _old_. It's from June 2005, ie it's from pretty 
much the first two days of that packing thing existing in the first place.

(The initial repacking was done June 25th, with a lot of tweaking over the 
next few days. That sizediff thing was part of the fairly early tweaking).

The thing is, that check made sense back then. Why? Because we sorted 
things in decreasing size order back then (I think this was before we even 
did any name-based heuristic sorting at all), so that when we tried the 
delta algorithm, and the size diff was bigger than the last delta size, we 
pretty much _knew_ the new delta would be bigger still, and there was no 
point in continuing with try_delta.

HOWEVER. We have since changed the sorting to sort according to name 
before it sorts according to size, so that old heuristic that depended on 
the size being monotonically increasing simply doesn't make any sense any 
more.

So I think at that second "return -1" really _should_ be changed to a 
"return 0", and not just because it helps your particular case. It's 
literally a bug these days, because the assumptions that caused it to 
return -1 simply aren't true any more.

(It wasn't _strictly_ true even originally: even if the sizediff is huge, 
the _delta_ may not be huge, since we can delete data with a small delta. 
So it's quite likely that we should compare the "old is bigger than new" 
and "new is bigger than old" separately and have different heuristics for 
them. Again, that was simply not much of an issue back when we sorted just 
by size).

So even the "return 0" might not be completely right. We might actually 
want to look at how big the delta is, and return only once that fails.

			Linus

  parent reply	other threads:[~2006-04-20 22:59 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-20 13:36 1.3.0 creating bigger packs than 1.2.3 Shawn Pearce
2006-04-20 14:47 ` Linus Torvalds
2006-04-20 15:03   ` Shawn Pearce
2006-04-20 16:07     ` Linus Torvalds
2006-04-20 16:43       ` Shawn Pearce
2006-04-20 17:03         ` Linus Torvalds
2006-04-20 17:24           ` Junio C Hamano
2006-04-20 17:31           ` Shawn Pearce
2006-04-20 17:54             ` Nicolas Pitre
2006-04-20 21:31             ` Junio C Hamano
2006-04-20 21:53               ` Shawn Pearce
2006-04-20 21:56               ` Jakub Narebski
2006-04-20 17:41           ` Nicolas Pitre
2006-04-20 17:55           ` Shawn Pearce
2006-04-20 18:24             ` Nicolas Pitre
2006-04-20 18:49               ` Junio C Hamano
2006-04-20 21:02                 ` Nicolas Pitre
2006-04-20 21:40                   ` Junio C Hamano
2006-04-20 22:02                     ` Shawn Pearce
2006-04-20 22:35                       ` Junio C Hamano
2006-04-21  1:01                         ` Shawn Pearce
2006-04-20 22:59                       ` Linus Torvalds [this message]
2006-04-21  0:52                     ` Nicolas Pitre
2006-04-21  1:20                     ` Shawn Pearce
2006-04-21  2:28                       ` Nicolas Pitre
2006-04-21  2:40                         ` Shawn Pearce
2006-04-21  3:07                           ` Nicolas Pitre
2006-04-21  2:32                       ` Shawn Pearce
2006-04-20 23:02                   ` Junio C Hamano
2006-04-20 16:09 ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0604201545190.3701@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).