From: bdowning@lavos.net (Brian Downing)
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Preferring shallower deltas on repack
Date: Mon, 9 Jul 2007 01:52:35 -0500
Message-ID: <20070709065235.GJ4087@lavos.net>
In-Reply-To: <7v1wfixhvk.fsf@assigned-by-dhcp.cox.net>
On Sun, Jul 08, 2007 at 10:31:43PM -0700, Junio C Hamano wrote:
> Putting aside a potential argument that the way the file in
> question, version.lisp-expr, is kept track of might be insane,
> this is an interesting topic.
Yeah, that version numbering system worked quite well for CVS, given its
lack of any other kind of useful whole-tree versioning, and the fact
that there wasn't much branching and merging, due to it being a pain in
the ass. If and when we move to something like Git, something else will
have to be done, as that file will /always/ be in conflict.
> In addition to the above stats, it may be interesting to know:
>
> - pack generation time and memory footprint (/usr/bin/time);
>
> I suspect you would have to try_delta more candidates, so
> this may degrade a bit, but that is done for getting a better
> deltification, so we would need to see if the extra cost is
> within reason and worth spending.
It was already try_delta'ing everything in the window. The only
difference now is that create_delta may generate one more byte of delta
before giving up. That doesn't seem to have affected things at all
outside of sampling noise:
(These timings are for the Git pack on Linux/amd64, --window and --depth
both 100. Since /usr/bin/time doesn't seem to report any useful memory
statistics on Linux, I also have a "ps aux" line from when the memory
size looked stable. This was different from run to run but it shows the
two are in the same order of magnitude.)
Unpatched:
54.99user 0.18system 0:56.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+32417minor)pagefaults 0swaps
bdowning 5290 98.7 4.5 106788 92900 pts/1 R+ 01:26 0:49 git pack-obj
Patched:
55.37user 0.19system 0:56.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32249minor)pagefaults 0swaps
bdowning 6086 100 4.5 106880 92996 pts/1 R+ 01:29 0:49 git pack-obj
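To make the one-byte difference concrete, here is a minimal sketch in C of the tie-break the patch subject describes ("prefer shallower deltas if the size is equal"). struct best_delta, delta_size_limit() and should_replace() are hypothetical names, not pack-objects' actual code, and initial_limit stands in for whatever cap applies before any delta has been found for the target.

```c
#include <stddef.h>

/* Hypothetical best-so-far record for one target object. */
struct best_delta {
    int has_delta;   /* 0 until some base has been accepted */
    size_t size;     /* size of the accepted delta */
    unsigned depth;  /* chain depth the target gets with that delta */
};

/* Largest delta worth generating against the next candidate base.
 * Without the tie-break only a strictly smaller delta can win, so the
 * limit is best->size - 1; with it, an equal-sized delta may still win
 * on depth, so generation is allowed one more byte before giving up. */
static size_t delta_size_limit(const struct best_delta *best,
                               size_t initial_limit, int prefer_shallow)
{
    if (!best->has_delta)
        return initial_limit;
    if (prefer_shallow)
        return best->size;
    return best->size ? best->size - 1 : 0;
}

/* Decide whether a freshly generated delta, made against a base of
 * depth src_depth, replaces the best one found so far. A larger delta
 * never wins, so size is never traded for depth. */
static int should_replace(const struct best_delta *best,
                          size_t size, unsigned src_depth)
{
    if (!best->has_delta)
        return 1;
    if (size < best->size)
        return 1;
    /* Same size: switch only if the resulting chain is strictly shallower. */
    return size == best->size && src_depth + 1 < best->depth;
}
```

Because a larger delta can never replace a smaller one, the extra work is bounded to that one additional byte per attempt, which fits the timings above being within noise.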
> - resulting pack size (ls -l pack-*.pack)
>
> I do not expect your change would degrade in this area, as
> you are currently not trading size with shallower delta
> depth.
The patched version is actually smaller in both SBCL's and Git's case
(again, --window 100 and --depth 100):
SBCL: 61696 bytes smaller (13294225-13232529)
Git: 16010 bytes smaller (12690424-12674414)
I believe the reason for this is that more deltas can get in under the
depth limit. If I repack the Git pack with --depth=999999999, the patched
version generates a pack that is 1793 bytes smaller. (12334183-12332390)
(Hmm, I was expecting that to be the same; I'm not sure why it's not.
Padding?)
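Why more deltas fit under the depth limit can be seen in a toy model, assumed here purely for illustration: a file like version.lisp-expr where every earlier version in the window yields an equal-sized delta, so candidate bases differ only in the chain depth they lead to. The constants and count_whole() are hypothetical; real pack-objects orders and windows objects differently.

```c
/* Toy parameters (hypothetical), loosely mirroring --window and --depth. */
#define N 1000
#define WINDOW 10
#define MAX_DEPTH 50

/* Every base in the window gives an equal-sized delta, so only depth
 * matters. depth[i] == 0 means object i is stored whole. Returns how
 * many objects had to be stored whole. */
static int count_whole(int prefer_shallow)
{
    static int depth[N];
    int whole = 0;

    for (int i = 0; i < N; i++) {
        int best = -1;
        /* Scan the window from the nearest earlier object backwards. */
        for (int j = i - 1; j >= 0 && j >= i - WINDOW; j--) {
            if (depth[j] + 1 > MAX_DEPTH)
                continue; /* this base would exceed the depth limit */
            if (best < 0)
                best = j; /* first usable base */
            else if (prefer_shallow && depth[j] < depth[best])
                best = j; /* equal size but shallower: switch */
        }
        if (best < 0) {
            depth[i] = 0; /* nothing usable: store whole */
            whole++;
        } else {
            depth[i] = depth[best] + 1;
        }
    }
    return whole;
}
```

Taking the first usable base builds a chain straight up to MAX_DEPTH, so in this toy an object is stored whole every 60 versions (17 of 1000). Preferring the shallowest equal-sized base raises depth by only one level per window's worth of objects, so just 2 of 1000 are stored whole.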
> Regarding your patch, I think it does not look too bad, as you
> never pick delta that is larger than the best-so-far in favor of
> shallower depth.
>
> It would become worrisome (*BUT* infinitely more interesting)
> once you start talking about a tradeoff between slightly larger
> delta and much shorter delta. Such a tradeoff, if done right,
> would make a lot of sense, but I do not offhand think of a way
> to strike a proper balance between them efficiently.
Yeah, I was thinking about that too, and came to the same conclusion.
I suspect you'd have to save a /lot/ of delta depth to want to pay any
more I/O, though.
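The size/depth tradeoff can at least be written down as a rule, even without a good way to tune it. This sketch assumes a hypothetical exchange rate, bytes_per_level, for how many extra delta bytes one level of saved depth is worth; worth_trading() is an illustrative name, and choosing that rate well is exactly the open question.

```c
#include <stddef.h>

/* Accept a candidate delta over the current best if it is smaller, or
 * equal-sized and shallower, or larger but saving enough depth to pay
 * for its extra bytes at the (hypothetical) rate bytes_per_level. */
static int worth_trading(size_t best_size, unsigned best_depth,
                         size_t cand_size, unsigned cand_depth,
                         size_t bytes_per_level)
{
    if (cand_size <= best_size)
        return cand_size < best_size || cand_depth < best_depth;
    if (cand_depth >= best_depth)
        return 0; /* bigger and no shallower: never worth it */
    return cand_size - best_size <=
           (size_t)(best_depth - cand_depth) * bytes_per_level;
}
```

With a small bytes_per_level, only a large depth saving justifies any extra bytes, which matches the intuition that a lot of depth must be saved before paying more I/O.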
Another idea, possibly iffy (and complicated): if you keep making good
low-depth deltas off of a particular object, it might be worth promoting
that object so it stays in the window longer.
-bcd