git.vger.kernel.org archive mirror
From: bdowning@lavos.net (Brian Downing)
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Preferring shallower deltas on repack
Date: Mon, 9 Jul 2007 01:52:35 -0500
Message-ID: <20070709065235.GJ4087@lavos.net>
In-Reply-To: <7v1wfixhvk.fsf@assigned-by-dhcp.cox.net>

On Sun, Jul 08, 2007 at 10:31:43PM -0700, Junio C Hamano wrote:
> Putting aside a potential argument that the way the file in
> question, version.lisp-expr, is kept track of might be insane,
> this is an interesting topic.

Yeah, that version numbering system worked quite well for CVS, given its
lack of any other kind of useful whole-tree versioning, and the fact
that there wasn't much branching and merging, due to it being a pain in
the ass.  If and when we move to something like Git, something else will
have to be done, as that file will /always/ be in conflict.

> In addition to the above stats, it may be interesting to know:
> 
>  - pack generation time and memory footprint (/usr/bin/time);
> 
>    I suspect you would have to try_delta more candidates, so
>    this may degrade a bit, but that is done for getting a better
>    deltification, so we would need to see if the extra cost is
>    within reason and worth spending.

It was already try_delta'ing everything in the window.  The only
difference now is that create_delta may generate one more byte of delta
before giving up.  That doesn't seem to have affected things at all
outside of sampling noise:

(These timings are for the Git pack on Linux/amd64, --window and --depth
both 100.  Since /usr/bin/time doesn't seem to report any useful memory
statistics on Linux, I also have a "ps aux" line from when the memory
size looked stable.  This was different from run to run but it shows the
two are in the same order of magnitude.)

Unpatched:
54.99user 0.18system 0:56.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+32417minor)pagefaults 0swaps
bdowning  5290 98.7  4.5 106788 92900 pts/1    R+   01:26   0:49 git pack-obj

Patched:
55.37user 0.19system 0:56.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32249minor)pagefaults 0swaps
bdowning  6086  100  4.5 106880 92996 pts/1    R+   01:29   0:49 git pack-obj
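
As a rough illustration of the change being timed here, the selection
rule can be sketched as below.  This is a minimal Python sketch, not the
pack-objects code: Candidate, size_limit and pick_base are hypothetical
names, and the assumption is that a candidate delta is accepted when it
is strictly smaller than the best so far, or equal in size but
shallower.  The "one more byte" shows up as the size limit being
best.delta_size instead of best.delta_size - 1 for a shallower candidate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    delta_size: int  # size in bytes of the delta against this base
    depth: int       # depth of the delta chain ending at this base


def size_limit(best: Optional[Candidate], cand: Candidate,
               prefer_shallow: bool) -> Optional[int]:
    """Largest delta size still worth accepting; None means no limit yet."""
    if best is None:
        return None
    if prefer_shallow and cand.depth < best.depth:
        return best.delta_size      # a tie is acceptable for a shallower base
    return best.delta_size - 1      # otherwise it must be strictly smaller


def pick_base(window: Sequence[Candidate],
              prefer_shallow: bool) -> Optional[Candidate]:
    """Walk the window once, keeping the best base seen so far.

    Delta sizes are precomputed here; a real delta generator would
    instead stop early once its output exceeded the limit.
    """
    best = None
    for cand in window:
        limit = size_limit(best, cand, prefer_shallow)
        if limit is None or cand.delta_size <= limit:
            best = cand
    return best


window = [Candidate("a", 40, 7), Candidate("b", 40, 2), Candidate("c", 41, 0)]
print(pick_base(window, prefer_shallow=False).name)  # a
print(pick_base(window, prefer_shallow=True).name)   # b
```

Under these assumptions the larger delta "c" is never chosen, however
shallow it is; only exact ties in size are broken by depth.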

>  - resulting pack size (ls -l pack-*.pack)
> 
>    I do not expect your change would degrade in this area, as
>    you are currently not trading size with shallower delta
>    depth.

The patched version is actually smaller in both SBCL's and Git's case
(again, --window 100 and --depth 100):

SBCL: 61696 bytes smaller (13294225-13232529)
Git:  16010 bytes smaller (12690424-12674414)

I believe the reason for this is that more deltas can get in under the
depth limit.  If I repack the Git pack with --depth=999999999, the patched
version generates a pack that is 1793 bytes smaller.  (12334183-12332390)
(Hmm, I was expecting that to be the same, I'm not sure why it's not.
Padding?)
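
One way to see how a preference for shallower deltas could let more
objects deltify under a depth limit is a toy simulation.  The model
below is an assumption-laden Python sketch, not a description of
pack-objects: every object is assumed to delta equally well against
anything in the window, a base is usable only if its depth is below the
limit, and simulate and its parameters are hypothetical names.  It does
not explain the remaining 1793-byte difference at effectively unlimited
depth.

```python
def simulate(n_objects, window, max_depth, prefer_shallow):
    """Return how many objects end up stored whole (not as deltas)."""
    depths = []   # chain depth of each object already packed, oldest first
    whole = 0
    for _ in range(n_objects):
        # Bases in the window that can still take one more level of depth.
        usable = [d for d in depths[-window:] if d < max_depth]
        if not usable:
            depths.append(0)                 # nothing usable: store whole
            whole += 1
        elif prefer_shallow:
            depths.append(min(usable) + 1)   # tie broken toward shallow base
        else:
            depths.append(usable[-1] + 1)    # most recent usable base
    return whole


print(simulate(20, window=3, max_depth=2, prefer_shallow=False))  # 4
print(simulate(20, window=3, max_depth=2, prefer_shallow=True))   # 3
```

In this toy model the shallow preference stores fewer objects whole
because chains reach the depth limit more slowly; whether that is what
happens in the real packs is only a guess.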

> Regarding your patch, I think it does not look too bad, as you
> never pick delta that is larger than the best-so-far in favor of
> shallower depth.
> 
> It would become worrisome (*BUT* infinitely more interesting)
> once you start talking about a tradeoff between slightly larger
> delta and much shorter delta.  Such a tradeoff, if done right,
> would make a lot of sense, but I do not offhand think of a way
> to strike a proper balance between them efficiently.

Yeah, I was thinking about that too, and came to the same conclusion.
I suspect you'd have to save a /lot/ of delta depth to want to pay any
more I/O, though.
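
To make the tradeoff concrete, here is a purely hypothetical scoring
rule in Python.  Nothing in this thread proposes this formula:
pick_with_tradeoff and bytes_per_level are invented for illustration,
with bytes_per_level standing for how many extra delta bytes one level
of depth is assumed to be worth.  The sketch also ignores the
efficiency concern, since it assumes every candidate delta has already
been computed in full.

```python
def pick_with_tradeoff(candidates, bytes_per_level):
    """candidates: iterable of (name, delta_size, depth) tuples.

    Picks the candidate with the lowest combined cost, where each level
    of depth is charged bytes_per_level; cost ties go to the smaller delta.
    """
    return min(candidates,
               key=lambda c: (c[1] + bytes_per_level * c[2], c[1]))


candidates = [("deep", 100, 40), ("shallow", 104, 1)]
print(pick_with_tradeoff(candidates, 0.0)[0])   # deep
print(pick_with_tradeoff(candidates, 0.05)[0])  # deep
print(pick_with_tradeoff(candidates, 0.5)[0])   # shallow
```

With a small weight, 39 levels of saved depth are not enough to justify
4 extra bytes, which matches the intuition that a lot of depth has to be
saved before paying any more I/O.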

Another thing that might be iffy (and complicated) is that if you keep
making a good low-depth delta off of a particular object, it might be
good to promote it so it stays in the window for longer.
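
The promotion idea could look something like the following Python
sketch.  The Window class and its promote method are hypothetical and
only model the bookkeeping: a base that keeps producing good low-depth
deltas is moved to the newest end of a fixed-size window so that it is
evicted later.

```python
from collections import deque


class Window:
    """Fixed-size window of candidate bases, oldest on the left."""

    def __init__(self, size):
        self.size = size
        self.items = deque()

    def add(self, obj):
        """Add a new object, evicting the oldest one if the window is full."""
        self.items.append(obj)
        if len(self.items) > self.size:
            self.items.popleft()

    def promote(self, obj):
        """Treat obj as if it had just been added, delaying its eviction."""
        self.items.remove(obj)   # raises ValueError if obj is not present
        self.items.append(obj)


w = Window(3)
for name in ("a", "b", "c"):
    w.add(name)
w.promote("a")      # "a" just served as a good shallow base
w.add("d")          # evicts "b" rather than "a"
print(list(w.items))  # ['c', 'a', 'd']
```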

-bcd

