From: Avery Pennarun <apenwarr@gmail.com>
To: git@vger.kernel.org, gitster@pobox.com, nico@fluxnic.net
Cc: Avery Pennarun <apenwarr@gmail.com>
Subject: [PATCH] pack-objects: never deltify objects bigger than window_memory_limit.
Date: Wed, 22 Sep 2010 03:25:05 -0700
Message-ID: <1285151105-32454-1-git-send-email-apenwarr@gmail.com>

With very large objects, just loading them into the delta window wastes a
huge amount of memory.  In one repo, I have some objects around 1GB in size,
and git-pack-objects seems to require about 8x that in order to deltify
them, even when the window memory limit is small (e.g.
--window-memory=100M).  With this patch, the maximum memory usage is roughly
halved.

Perhaps more importantly, however, disabling deltification for large objects
seems to reduce memory thrashing when you can't fit multiple large objects
into physical RAM at once.  It seems to be the difference between "never
finishes" and "finishes eventually" for me.

Test:

I created a test repo with 10 sequential commits containing a bunch of
nearly-identical 110MB files (just appending a line each time).

Without this patch:

    $ /usr/bin/time git repack -a --window-memory=100M

    Counting objects: 43, done.
    warning: suboptimal pack - out of memory
    Compressing objects: 100% (43/43), done.
    Writing objects: 100% (43/43), done.
    Total 43 (delta 14), reused 0 (delta 0)
    42.79user 1.07system 0:44.53elapsed 98%CPU (0avgtext+0avgdata
      866736maxresident)k
      0inputs+2752outputs (0major+718471minor)pagefaults 0swaps

With this patch:

    $ /usr/bin/time git repack -a --window-memory=100M

    Counting objects: 43, done.
    Compressing objects: 100% (30/30), done.
    Writing objects: 100% (43/43), done.
    Total 43 (delta 14), reused 0 (delta 0)
    35.86user 0.65system 0:36.30elapsed 100%CPU (0avgtext+0avgdata
      437568maxresident)k
      0inputs+2768outputs (0major+366137minor)pagefaults 0swaps

It apparently still uses about 4x the memory of the largest object, which is
about twice as good as before, though still kind of awful.  (Ideally, we
wouldn't even load the entire large object into memory even once.)

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
---
 builtin/pack-objects.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0e81673..9f1a289 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1791,6 +1791,9 @@ static void prepare_pack(int window, int depth)
 		if (entry->size < 50)
 			continue;
 
+		if (window_memory_limit && entry->size > window_memory_limit)
+			continue;
+
 		if (entry->no_try_delta)
 			continue;
 
-- 
1.7.3.1.gca9d1

Thread overview: 3+ messages
2010-09-22 10:25 Avery Pennarun [this message]
2010-09-22 12:00 ` [PATCH] pack-objects: never deltify objects bigger than window_memory_limit Nicolas Pitre
2010-09-23  5:01   ` Avery Pennarun
