From: bdowning@lavos.net (Brian Downing)
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Preferring shallower deltas on repack
Date: Mon, 9 Jul 2007 01:52:35 -0500
Message-ID: <20070709065235.GJ4087@lavos.net>
In-Reply-To: <7v1wfixhvk.fsf@assigned-by-dhcp.cox.net>
On Sun, Jul 08, 2007 at 10:31:43PM -0700, Junio C Hamano wrote:
> Putting aside a potential argument that the way the file in
> question, version.lisp-expr, is kept track of might be insane,
> this is an interesting topic.
Yeah, that version numbering system worked quite well for CVS, given its
lack of any other kind of useful whole-tree versioning, and the fact
that there wasn't much branching and merging, due to it being a pain in
the ass. If and when we move to something like Git, something else will
have to be done, as that file will /always/ be in conflict.
> In addition to the above stats, it may be interesting to know:
>
> - pack generation time and memory footprint (/usr/bin/time);
>
> I suspect you would have to try_delta more candidates, so
> this may degrade a bit, but that is done for getting a better
> deltification, so we would need to see if the extra cost is
> within reason and worth spending.
It was already try_delta'ing everything in the window. The only
difference now is that create_delta may generate one more byte of delta
before giving up. That doesn't seem to have affected things at all
outside of sampling noise:
(These timings are for the Git pack on Linux/amd64, --window and --depth
both 100. Since /usr/bin/time doesn't seem to report any useful memory
statistics on Linux, I also have a "ps aux" line from when the memory
size looked stable. This was different from run to run but it shows the
two are in the same order of magnitude.)
Unpatched:
54.99user 0.18system 0:56.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+32417minor)pagefaults 0swaps
bdowning 5290 98.7 4.5 106788 92900 pts/1 R+ 01:26 0:49 git pack-obj
Patched:
55.37user 0.19system 0:56.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32249minor)pagefaults 0swaps
bdowning 6086 100 4.5 106880 92996 pts/1 R+ 01:29 0:49 git pack-obj
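For readers following along, the tie-break being timed here can be sketched roughly as below. This is an illustrative Python model, not the pack-objects C code: `Candidate`, `pick_base`, and the `prefer_shallow` switch are made-up names, and delta sizes are assumed to be known up front rather than computed by create_delta.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A possible delta base in the window (hypothetical structure)."""
    name: str
    delta_size: int  # bytes needed to encode the target against this base
    depth: int       # length of the delta chain this base already sits on


def pick_base(candidates: Iterable[Candidate],
              prefer_shallow: bool) -> Optional[Candidate]:
    """Pick the base giving the smallest delta.

    With prefer_shallow=True, an equal-sized delta also wins if its base
    is shallower. A larger delta never wins, whatever its depth.
    """
    best = None
    for cand in candidates:
        if best is None or cand.delta_size < best.delta_size:
            best = cand
        elif (prefer_shallow
              and cand.delta_size == best.delta_size
              and cand.depth < best.depth):
            best = cand
    return best


window = [
    Candidate("a", delta_size=40, depth=7),
    Candidate("b", delta_size=40, depth=2),
    Candidate("c", delta_size=41, depth=0),
]
unpatched = pick_base(window, prefer_shallow=False)
patched = pick_base(window, prefer_shallow=True)
print(unpatched.name, patched.name)
```

In this sketch both policies visit every candidate; the only difference is that an equal-sized delta is no longer rejected outright, which matches the "one more byte" remark above.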
> - resulting pack size (ls -l pack-*.pack)
>
> I do not expect your change would degrade in this area, as
> you are currently not trading size with shallower delta
> depth.
The patched version is actually smaller in both SBCL's and Git's case
(again, --window 100 and --depth 100):
SBCL: 61696 bytes smaller (13294225-13232529)
Git: 16010 bytes smaller (12690424-12674414)
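As a quick check of the arithmetic, the snippet below recomputes the differences from the pack sizes quoted above and expresses them relative to the unpatched pack. The `savings` helper is a name introduced here for illustration only.

```python
# (unpatched, patched) pack sizes in bytes, as quoted in the message.
sizes = {
    "SBCL": (13294225, 13232529),
    "Git": (12690424, 12674414),
}


def savings(unpatched: int, patched: int) -> tuple:
    """Return (bytes saved, percent of the unpatched pack size)."""
    saved = unpatched - patched
    return saved, 100.0 * saved / unpatched


for name, (unpatched_size, patched_size) in sizes.items():
    saved, pct = savings(unpatched_size, patched_size)
    print(f"{name}: {saved} bytes smaller ({pct:.2f}% of the unpatched pack)")
```

Both savings come out well under one percent of the pack size.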
I believe the reason for this is that more deltas can get in under the
depth limit. If I repack the Git pack with --depth=999999999, the patched
version generates a pack that is 1793 bytes smaller. (12334183-12332390)
(Hmm, I was expecting that to be the same, I'm not sure why it's not.
Padding?)
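The depth-limit explanation can be illustrated with a toy model. This is only a sketch under strong assumptions, not a simulation of pack-objects: every object is assumed to delta equally well against any of the previous `window` objects, a base already at `max_depth` cannot be used, and an object with no usable base is stored whole. The "unpatched" policy is modelled as taking the most recent usable base. The `simulate` function and its parameters are hypothetical.

```python
def simulate(n_objects: int, window: int, max_depth: int,
             prefer_shallow: bool) -> int:
    """Return how many objects end up stored whole in the toy model."""
    if window < 1:
        raise ValueError("window must be at least 1")
    depths = []  # chain depth of each object processed so far
    whole = 0
    for _ in range(n_objects):
        recent = depths[-window:]
        # A base that is already at max_depth cannot take another delta.
        usable = [d for d in recent if d < max_depth]
        if not usable:
            depths.append(0)  # no usable base: store the object whole
            whole += 1
        elif prefer_shallow:
            depths.append(min(usable) + 1)  # shallowest usable base
        else:
            depths.append(usable[-1] + 1)   # most recent usable base
    return whole


recent_first = simulate(100, window=3, max_depth=5, prefer_shallow=False)
shallow_first = simulate(100, window=3, max_depth=5, prefer_shallow=True)
print(recent_first, shallow_first)
```

With these made-up parameters, chains grow more slowly under the shallow-first policy, so fewer objects run out of usable bases. Real packs will behave differently, since delta sizes are rarely tied across the whole window.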
> Regarding your patch, I think it does not look too bad, as you
> never pick delta that is larger than the best-so-far in favor of
> shallower depth.
>
> It would become worrysome (*BUT* infinitely more interesting)
> once you start talking about a tradeoff between slightly larger
> delta and much shorter delta. Such a tradeoff, if done right,
> would make a lot of sense, but I do not offhand think of a way
> to strike a proper balance between them efficiently.
Yeah, I was thinking about that too, and came to the same conclusion.
I suspect you'd have to save a /lot/ of delta depth to want to pay any
more I/O, though.
Another thing that might be iffy (and complicated) is that if you keep
making a good low-depth delta off of a particular object, it might be
good to promote it so it stays in the window for longer.
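One way to picture the promotion idea is a sliding window in which a base that just produced a good delta is moved back to the newest slot. The sketch below is purely illustrative and makes no claim about how pack-objects manages its window; the `Window` class and its methods are invented names.

```python
from collections import deque


class Window:
    """Fixed-size window of candidate bases (hypothetical model)."""

    def __init__(self, size: int):
        self.items = deque(maxlen=size)  # oldest entry on the left

    def push(self, name: str) -> None:
        """Add a new object; the oldest entry is evicted when full."""
        self.items.append(name)

    def promote(self, name: str) -> None:
        """Move an existing entry to the newest slot so it is evicted last."""
        self.items.remove(name)
        self.items.append(name)


w = Window(3)
for obj in ("a", "b", "c"):
    w.push(obj)
w.promote("a")  # "a" just served as a good low-depth base
w.push("d")     # evicts "b"; without the promotion it would evict "a"
print(list(w.items))
```

The complication hinted at above shows up even here: every promotion shortens the stay of some other entry, so a policy would need to decide when a base is good enough to deserve it.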
-bcd