From: bdowning@lavos.net (Brian Downing)
To: Junio C Hamano <gitster@pobox.com>
Cc: Nicolas Pitre <nico@cam.org>, git@vger.kernel.org
Subject: Re: Preferring shallower deltas on repack
Date: Mon, 9 Jul 2007 13:53:53 -0500
Message-ID: <20070709185353.GL4087@lavos.net>
In-Reply-To: <7vir8tv8dr.fsf@assigned-by-dhcp.cox.net>

On Mon, Jul 09, 2007 at 09:39:44AM -0700, Junio C Hamano wrote:
> > OK here it is.  And results on the GIT repo and another pathological test 
> > repo I keep around are actually really nice!  Not only the pack itself 
> > is a bit smaller, but the delta depth distribution as shown by 
> > git-verify-pack -v is much nicer with the bulk of deltas in the low 
> > depth end of the spectrum and no more peak at the max depth level.
> 
> Looks obviously correct.  Brian, it would be very interesting to
> see what Nico's patch does to your dataset.

Nico's patch makes the overall statistics look better, but the
version.lisp-expr file still goes 593 levels deep, as opposed to about
65 with my patch.  (That's better than 980 with stock Git, though.)

Pack statistics from my shoddy analysis tool (I'll post it a bit later):

"sizes" are the sizes of all objects as stored in the pack.  For deltas
this is just the delta size.  "path sizes" are the sizes of the /path/
to each object in the file: the size of the base plus the size of each
delta in the chain leading to the object.  This is approximately how
much data you have to read to get to an object.  "depths" are delta
chain lengths, with 0 for a non-delta object.
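
The three metrics can be sketched in a few lines of Python.  This is
only an illustration of the definitions above, not the analysis tool
mentioned in this message; the toy `pack` dictionary, the object names
in it, and the helper names `depth_and_path_size` and `summarize` are
all hypothetical.

```python
from statistics import mean, median

# Hypothetical toy pack: name -> (stored_size, base_name or None).
# A non-delta object has base None; for a delta, stored_size is the delta size.
pack = {
    "base": (1000, None),
    "d1": (40, "base"),
    "d2": (25, "d1"),
    "d3": (30, "d1"),
}

def depth_and_path_size(name, objects):
    """Walk the delta chain from `name` back to its non-delta base.

    Returns (depth, path_size): the number of deltas on the chain, and the
    sum of the stored sizes of the base and every delta on the chain.
    """
    depth = 0
    path_size = 0
    while name is not None:
        size, base = objects[name]
        path_size += size
        if base is not None:
            depth += 1
        name = base
    return depth, path_size

def summarize(values):
    """A subset of the per-metric summary fields shown in the statistics."""
    return {
        "count": len(values),
        "total": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

sizes = [size for size, _ in pack.values()]
results = {name: depth_and_path_size(name, pack) for name in pack}
depths = [d for d, _ in results.values()]
path_sizes = [p for _, p in results.values()]

for label, values in (("all sizes", sizes),
                      ("all path sizes", path_sizes),
                      ("depths", depths)):
    print(f"{label:>15}: {summarize(values)}")
```

A real tool would read object sizes and delta bases from the pack
itself; the sketch assumes that information has already been extracted.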

SBCL, stock git:
      all sizes: count 46829 total 30256118 min 0 max 1012295 mean 646.10 median 45 std_dev 9555.48
 all path sizes: count 46829 total 1551200401 min 0 max 1012295 mean 33124.78 median 11661 std_dev 55310.88
         depths: count 46829 total 4693372 min 0 max 980 mean 100.22 median 12 std_dev 188.21

SBCL, my patch:
      all sizes: count 46829 total 30251762 min 0 max 1012295 mean 646.00 median 45 std_dev 9555.48
 all path sizes: count 46829 total 1529629918 min 0 max 1012295 mean 32664.16 median 11213 std_dev 54930.06
         depths: count 46829 total 2883121 min 0 max 787 mean 61.57 median 11 std_dev 127.64

SBCL, Nico's patch:
      all sizes: count 46829 total 30253345 min 0 max 1012295 mean 646.04 median 45 std_dev 9555.49
 all path sizes: count 46829 total 1518730701 min 0 max 1012295 mean 32431.41 median 10819 std_dev 54751.35
         depths: count 46829 total 3694511 min 0 max 699 mean 78.89 median 12 std_dev 141.53
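
As a quick cross-check of the "depths" lines, the mean is just total
divided by count, so the three runs can be compared directly.  The
totals and count below are copied from the statistics above; the
function name `mean_depths` and the percentage comparison are only an
illustrative sketch.

```python
# Figures copied from the three "depths" lines above.
COUNT = 46829
depth_totals = {
    "stock git": 4693372,
    "my patch": 2883121,
    "Nico's patch": 3694511,
}

def mean_depths(totals, count):
    """Mean delta depth per run: total depth divided by object count."""
    return {label: total / count for label, total in totals.items()}

means = mean_depths(depth_totals, COUNT)
stock = means["stock git"]
for label, m in means.items():
    change = 100 * (m - stock) / stock
    print(f"{label:>13}: mean depth {m:.2f} ({change:+.1f}% vs stock)")
```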

I'm vaguely working on an alternate weighting mechanism based on path
sizes, but so far all I've been able to do is generate some really
strange packs.  :)

-bcd
