git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@fluxnic.net>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] pack-objects: never deltify objects bigger than window_memory_limit.
Date: Wed, 22 Sep 2010 08:00:20 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1009220749440.13233@xanadu.home>
In-Reply-To: <1285151105-32454-1-git-send-email-apenwarr@gmail.com>

On Wed, 22 Sep 2010, Avery Pennarun wrote:

> With very large objects, just loading them into the delta window wastes a
> huge amount of memory.  In one repo, I have some objects around 1GB in size,
> and git-pack-objects seems to require about 8x that in order to deltify it,
> even when the window memory limit is small (eg. --window-memory=100M).  With
> this patch, the maximum memory usage is about halved.
> 
> Perhaps more importantly, however, disabling deltification for large objects
> seems to reduce memory thrashing when you can't fit multiple large objects
> into physical RAM at once.  It seems to be the difference between "never
> finishes" and "finishes eventually" for me.
> 
> Test:
> 
> I created a test repo with 10 sequential commits containing a bunch of
> nearly-identical 110MB files (just appending a line each time).
> 
> Without this patch:
> 
>     $ /usr/bin/time git repack -a --window-memory=100M
> 
>     Counting objects: 43, done.
>     warning: suboptimal pack - out of memory
>     Compressing objects: 100% (43/43), done.
>     Writing objects: 100% (43/43), done.
>     Total 43 (delta 14), reused 0 (delta 0)
>     42.79user 1.07system 0:44.53elapsed 98%CPU (0avgtext+0avgdata
>       866736maxresident)k
>       0inputs+2752outputs (0major+718471minor)pagefaults 0swaps
> 
> With this patch:
> 
>     $ /usr/bin/time -a git repack -a --window-memory=100M
> 
>     Counting objects: 43, done.
>     Compressing objects: 100% (30/30), done.
>     Writing objects: 100% (43/43), done.
>     Total 43 (delta 14), reused 0 (delta 0)
>     35.86user 0.65system 0:36.30elapsed 100%CPU (0avgtext+0avgdata
>       437568maxresident)k
>       0inputs+2768outputs (0major+366137minor)pagefaults 0swaps
> 
> It apparently still uses about 4x the memory of the largest object, which is
> about twice as good as before, though still kind of awful.  (Ideally, we
> wouldn't even load the entire large object into memory even once.)

To not load big objects into memory, we'd have to add support for the 
core.bigFileThreshold config option in more places.

>  builtin/pack-objects.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0e81673..9f1a289 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1791,6 +1791,9 @@ static void prepare_pack(int window, int depth)
>  		if (entry->size < 50)
>  			continue;
>  
> +		if (window_memory_limit && entry->size > window_memory_limit)
> +                	continue;
> +

I think you should even use entry->size/2 here, or even entry->size/4.
The reason is that 1) you need at least two such similar objects in
memory to find a possible delta, and 2) the reference object to delta
against has to be block-indexed, and that index table is almost the same
size as the object itself, especially on 64-bit machines.
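
A minimal sketch of such a tighter check, assuming the intent is to skip
any object whose size exceeds a fraction (1/2 or 1/4) of the window
memory limit. The helper name skip_delta_for_size() and its headroom
parameter are hypothetical and not part of git; only the zero-means-no-limit
convention is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a git function): return true when an object
 * should be left out of delta search because of its size.
 *
 * headroom is the assumed number of object-sized buffers needed per
 * delta attempt: 2 for "target plus reference", 4 if the reference's
 * block index is also budgeted at roughly one object size.
 */
static bool skip_delta_for_size(uint64_t size,
				uint64_t window_memory_limit,
				unsigned headroom)
{
	assert(headroom > 0);

	/* A limit of 0 means "no limit", as in the patch's condition. */
	if (!window_memory_limit)
		return false;

	/* Dividing the limit avoids overflow from multiplying the size. */
	return size > window_memory_limit / headroom;
}
```

With a 100MB limit, a 60MB object passes the check at headroom 1 but is
skipped at headroom 2, since two such objects would not fit in the
window together.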


Nicolas
