git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Andreas Ericsson <ae@op5.se>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Jakub Narebski <jnareb@gmail.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Stephan Hennig <mailing_list@arcor.de>
Subject: Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
Date: Mon, 14 Jul 2008 22:21:41 -0400 (EDT)	[thread overview]
Message-ID: <alpine.LFD.1.10.0807142203530.12484@xanadu.home> (raw)
In-Reply-To: <487B4BD8.5030208@op5.se>

On Mon, 14 Jul 2008, Andreas Ericsson wrote:

> Johannes Schindelin wrote:
> > Hi,
> > 
> > On Mon, 14 Jul 2008, Andreas Ericsson wrote:
> > 
> > > Sorry for being clueless here, but why does the older versions need
> > > to be kept in-memory anyway? Aren't we applying the delta each time
> > > we find one, repeatedly creating a new base-object in-memory for
> > > each delta? If we aren't doing that, why do we need more than just
> > > a small amount of memory just for keeping the delta?
> > 
> > Think of a delta chain of 49 delta objects, 1 base object.  Now reconstruct
> > all of the objects.
> > 
> > If you do it one by one, not storing the intermediate results, you end up
> > applying 1 delta for the first, 2 for the second (first the first, then the
> > second), and so on, in total 1 + 2 + 3 + ... + 49 = 49 * 50 / 2 = 1225
> > times.
> > 
> > Compare that to the 49 times when reusing the intermediate results.
> > 
> 
> That's only true if you discard the result of applying D1 to DB though.
> What I'm wondering is; Why isn't it done like this (pseudo-code):
> 
> while (delta = find_earlier_delta(sha1)) {
> 	if (is_base_object(delta)) {
> 		base_object = delta;
> 		break;
> 	}
> 	push_delta(delta, patch_stack);
> }
> 
> while (pop_delta(patch_stack))
> 	apply_delta(base_object, delta);
> 
> 
> 
> where "apply_delta" replaces the base_object->blob with the delta
> applied, releasing the previously used memory?
> 
> That way, you'll never use more memory than the largest ever size
> of the object + 1 delta at a time and still not apply the delta
> more than delta_chain_length-1 times.

Those delta chains aren't simple chained lists.  They are trees of 
deltas where one object might be the base for an unlimited number of 
deltas of depth 1, and in turn each of those deltas might constitute the 
base for an unlimited number of deltas of depth 2, and so on.

So what the code does is to find out which objects are not deltas but 
are the base for a delta.  Then, for each of them, all deltas having 
given object for base are found and they are recursively resolved so 
each resolved delta is then considered a possible base for more deltas, 
etc.  In other words, those deltas are resolved by walking the delta 
tree in a "depth first" fashion.

If we discard previous delta bases, we will have to recreate them each 
time a delta sibling is processed.  And if those delta bases are 
themselves deltas then you have an explosion of delta results to 
re-compute.


Nicolas

  parent reply	other threads:[~2008-07-15  2:22 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-10 14:40 git pull is slow Stephan Hennig
2008-07-10 15:13 ` Martin Langhoff
2008-07-10 15:28 ` Petr Baudis
2008-07-10 15:30 ` Johannes Sixt
2008-07-10 15:45   ` Stephan Hennig
2008-07-10 15:50     ` Petr Baudis
2008-07-10 17:44       ` Stephan Hennig
2008-07-11 12:25 ` Stephan Hennig
2008-07-11 13:34   ` Andreas Ericsson
2008-07-11 14:04     ` Johannes Schindelin
2008-07-12 12:32       ` Stephan Hennig
2008-07-12 17:05         ` Johannes Schindelin
2008-07-13  1:15           ` Shawn O. Pearce
2008-07-13 13:59             ` Johannes Schindelin
2008-07-13 22:11               ` Shawn O. Pearce
2008-07-14  2:07             ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14  2:27               ` Nicolas Pitre
2008-07-14  3:12                 ` Shawn O. Pearce
2008-07-14 11:44                   ` Johannes Schindelin
2008-07-14 11:54                     ` Jakub Narebski
2008-07-14 12:10                       ` Johannes Schindelin
2008-07-14 12:16                       ` Andreas Ericsson
2008-07-14 12:25                         ` Johannes Schindelin
2008-07-14 12:51                           ` Andreas Ericsson
2008-07-14 12:58                             ` Johannes Schindelin
2008-07-15  2:21                             ` Nicolas Pitre [this message]
2008-07-15  2:47                               ` Shawn O. Pearce
2008-07-15  3:06                                 ` Nicolas Pitre
2008-07-17 16:06                                 ` Stephan Hennig
2008-07-17 16:25                                   ` Nicolas Pitre
2008-07-17 21:35                                     ` Shawn O. Pearce
2008-07-17 22:02                                       ` [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small Shawn O. Pearce
2008-07-17 23:45                                         ` Nicolas Pitre
2008-07-15  4:19                       ` [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack Shawn O. Pearce
2008-07-14  2:07             ` [PATCH 1/4] index-pack: Refactor base arguments of resolve_delta into a struct Shawn O. Pearce
2008-07-15  2:40               ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 2/4] index-pack: Chain the struct base_data on the stack for traversal Shawn O. Pearce
2008-07-15  2:48               ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 3/4] index-pack: Track the object_entry that creates each base_data Shawn O. Pearce
2008-07-14 10:15               ` Johannes Schindelin
2008-07-15  2:50               ` Nicolas Pitre
2008-07-15  3:20                 ` Shawn O. Pearce
2008-07-15  3:42                   ` Nicolas Pitre
2008-07-14  2:07             ` [PATCH 4/4] index-pack: Honor core.deltaBaseCacheLimit when resolving deltas Shawn O. Pearce
2008-07-15  3:05               ` Nicolas Pitre
2008-07-15  3:18                 ` Shawn O. Pearce
2008-07-15  4:45                   ` [PATCH v2] " Shawn O. Pearce
2008-07-15  5:05                     ` Nicolas Pitre
2008-07-15 18:48                     ` Junio C Hamano
2008-07-13  9:01           ` git pull is slow Stephan Hennig
2008-07-11 12:55 ` Stephan Hennig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.1.10.0807142203530.12484@xanadu.home \
    --to=nico@cam.org \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=ae@op5.se \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mailing_list@arcor.de \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).