From: Nicolas Pitre <nico@cam.org>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Stephan Hennig <mailing_list@arcor.de>,
Andreas Ericsson <ae@op5.se>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Jakub Narebski <jnareb@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
Date: Thu, 17 Jul 2008 19:45:03 -0400 (EDT)
Message-ID: <alpine.LFD.1.10.0807171914270.3213@xanadu.home>
In-Reply-To: <20080717220251.GA3072@spearce.org>
On Thu, 17 Jul 2008, Shawn O. Pearce wrote:
> It's rare that we should exceed deltaBaseCacheLimit while resolving
> delta compressed objects. By default this limit is 16M, and most
> chains are under 50 objects in length. This affords about 327K per
> object in the chain, which is quite large by source code standards.
>
> If we have to recreate a prior delta base because we evicted it to
> stay within the deltaBaseCacheLimit we can warn the user that their
> configured limit is perhaps too low for this repository data set.
> If the user keeps seeing the warning they can research it in the
> documentation, and consider setting it higher on this repository,
> or just globally on their system.
As I said earlier, I don't think this is a good idea, but I'll elaborate
a bit more.
First, this is a really bad clue for setting deltaBaseCacheLimit. The
likelihood of this warning actually showing up during an initial clone
is relatively high, yet this doesn't mean that deltaBaseCacheLimit has
to be changed at all. For one, the runtime usage of
deltaBaseCacheLimit is to cap a cache of objects for multiple delta
chains with random access, and not only one chain traversed linearly
like in the index-pack case, and that cache is likely to always be full
and in active eviction mode -- that's the point of a cap after all. In
index-pack this is only used to avoid excessive memory usage for
intermediate delta results and not really as a cache. In other words,
we have two rather different usages for the same setting. Now don't get
me wrong: I think that reusing this setting is sensible, but its value
should not be determined by what index-pack may happen to do with it,
especially on a first clone. And issuing warnings on the first clone is
not the way to give new users confidence either.
Secondly, on subsequent fetches, the warning is likely to never appear
again due to the fact that the delta chains will typically be much
shorter. And that would be true even if in reality the runtime access
to the repository would benefit a lot from deltaBaseCacheLimit being
raised. And it is the runtime access which is important here, not the
occasional fetch. Even at runtime, full delta chains are not likely to
be walked in their entirety very often.
Thirdly, if such indication is considered useful, then it should really
be part of some statistic/analysis tool, such as verify-pack for
example. Such a tool could compute the exact memory requirements for a
given repository usage and possibly provide suggestions as to what the
optimal deltaBaseCacheLimit value could be. Yet that cache has a
hardcoded number of entries at the moment and its hash function might
not be optimal either, which makes the connection with index-pack even
more tenuous.
And finally, I think that index-pack would benefit a lot from a really
simple optimization which is to free the resulting intermediate delta
base object right away when there is only one delta child to resolve,
before that child is itself used as a base for further delta
grand-children. That is likely to cover most cases of big delta chains
already, making that warning an even worse indicator.
> Suggested-by: Stephan Hennig <mailing_list@arcor.de>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Unrecommended-by: Nicolas Pitre <nico@cam.org>
Nicolas