git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Stephan Hennig <mailing_list@arcor.de>,
	Andreas Ericsson <ae@op5.se>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Jakub Narebski <jnareb@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
Date: Thu, 17 Jul 2008 19:45:03 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.1.10.0807171914270.3213@xanadu.home> (raw)
In-Reply-To: <20080717220251.GA3072@spearce.org>

On Thu, 17 Jul 2008, Shawn O. Pearce wrote:

> Its rare that we should exceed deltaBaseCacheLimit while resolving
> delta compressed objects.  By default this limit is 16M, and most
> chains are under 50 objects in length.  This affords about 327K per
> object in the chain, which is quite large by source code standards.
> 
> If we have to recreate a prior delta base because we evicted it to
> stay within the deltaBaseCacheLimit we can warn the user that their
> configured limit is perhaps too low for this repository data set.
> If the user keeps seeing the warning they can research it in the
> documentation, and consider setting it higher on this repository,
> or just globally on their system.

As I said earlier, I don't think this is a good idea, but I'll elaborate 
a bit more.

First, this is a really bad clue for setting deltaBaseCacheLimit.  The 
likelyhood of this warning to actually show up during an initial clone 
is relatively high, yet this doesn't mean that deltaBaseCacheLimit has 
to be changed at all.  For one, the real time usage of 
deltaBaseCacheLimit is to cap a cache of objects for multiple delta 
chains with random access, and not only one chain traversed linearly 
like in the index-pack case, 
and that cache is 
likely to always be full and in active eviction mode -- that's the point 
of a cap after all.  In the index-pack this is only used to avoid 
excessive memory usage for intermediate delta results and not really a 
cache.  In other words, we have two rather different usages for the same 
settings.  Now don't read me wrong: I think that reusing this setting is 
sensible, but its value should not be determined by what index-pack may 
happen to do with it, especially on a first clone.  And issuing warnings 
on the first clone is not the way to give new users confidence either.

Secondly, on subsequent fetches, the warning is likely to never appear 
again due to the fact that the delta chains will typically be much 
shorter.  And that would be true even if in reality the runtime access 
to the repository would benefit a lot from deltaBaseCacheLimit being 
raised.  And it is the runtime access which is important here, not the 
occasional fetch.  Yet the full delta chains are not likely to be walked 
in their entirety very often anyway either.

Thirdly, if such indication is considered useful, then it should really 
be part of some statistic/analysis tool, such as verify-pack for 
example.  Such a tool could compute the exact memory requirements for a 
given repository usage and possibly provide suggestions as to what the 
optimal deltaBaseCacheLimit value could be.  But yet that cache has a 
hardcoded number of entries at the moment and its hash function might 
not be optimal either, making the connection with index-pack even more 
apart.

And finally, I think that index-pack would benefit a lot from a really 
simple optimization which is to free the resulting intermediate delta 
base object right away when there is only one delta child to resolve, 
before that child is itself used as a base for further delta 
grand-children.  That is likely to cover most cases of big delta chains 
already, making that warning an even worse indicator.

> Suggested-by: Stephan Hennig <mailing_list@arcor.de>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Unrecommended-by: Nicolas Pitre <nico@cam.org>


Nicolas

  reply	other threads:[~2008-07-17 23:46 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-10 14:40 git pull is slow Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
2008-07-10 15:28 ` Petr Baudis
2008-07-10 15:30 ` Johannes Sixt
2008-07-10 15:45   ` Stephan Hennig
2008-07-10 15:50     ` Petr Baudis
2008-07-10 17:44       ` Stephan Hennig
2008-07-11 12:25 ` Stephan Hennig
2008-07-11 13:34   ` Andreas Ericsson
2008-07-11 14:04     ` Johannes Schindelin
2008-07-12 12:32       ` Stephan Hennig
2008-07-12 17:05         ` Johannes Schindelin
2008-07-13  1:15           ` Shawn O. Pearce
2008-07-13 13:59             ` Johannes Schindelin
2008-07-13 22:11               ` Shawn O. Pearce
2008-07-14  2:07             ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14  2:27               ` Nicolas Pitre
2008-07-14  3:12                 ` Shawn O. Pearce
2008-07-14 11:44                   ` Johannes Schindelin
2008-07-14 11:54                     ` Jakub Narebski
2008-07-14 12:10                       ` Johannes Schindelin
2008-07-14 12:16                       ` Andreas Ericsson
2008-07-14 12:25                         ` Johannes Schindelin
2008-07-14 12:51                           ` Andreas Ericsson
2008-07-14 12:58                             ` Johannes Schindelin
2008-07-15  2:21                             ` Nicolas Pitre
2008-07-15  2:47                               ` Shawn O. Pearce
2008-07-15  3:06                                 ` Nicolas Pitre
2008-07-17 16:06                                 ` Stephan Hennig
2008-07-17 16:25                                   ` Nicolas Pitre
2008-07-17 21:35                                     ` Shawn O. Pearce
2008-07-17 22:02                                       ` [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small Shawn O. Pearce
2008-07-17 23:45                                         ` Nicolas Pitre [this message]
2008-07-15  4:19                       ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14  2:07             ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
2008-07-15  2:40               ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
2008-07-15  2:48               ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
2008-07-14 10:15               ` Johannes Schindelin
2008-07-15  2:50               ` Nicolas Pitre
2008-07-15  3:20                 ` Shawn O. Pearce
2008-07-15  3:42                   ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
2008-07-15  3:05               ` Nicolas Pitre
2008-07-15  3:18                 ` Shawn O. Pearce
2008-07-15  4:45                   ` [PATCH v2] " Shawn O. Pearce
2008-07-15  5:05                     ` Nicolas Pitre
2008-07-15 18:48                     ` Junio C Hamano
2008-07-13  9:01           ` git pull is slow Stephan Hennig
2008-07-11 12:55 ` Stephan Hennig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.10.0807171914270.3213@xanadu.home \
    --to=nico@cam.org \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mailing_list@arcor.de \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).