From: David Kastrup <dak@gnu.org>
To: Jeff King <peff@peff.net>
Cc: Duy Nguyen <pclouds@gmail.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 4/4] gc --aggressive: three phase repacking
Date: Tue, 18 Mar 2014 07:16:21 +0100
Message-ID: <87pplkvxoq.fsf@fencepost.gnu.org>
In-Reply-To: <20140318051342.GA17200@sigill.intra.peff.net> (Jeff King's message of "Tue, 18 Mar 2014 01:13:43 -0400")

Jeff King <peff@peff.net> writes:

> On Tue, Mar 18, 2014 at 12:00:48PM +0700, Duy Nguyen wrote:
>
>> On Tue, Mar 18, 2014 at 11:50 AM, Jeff King <peff@peff.net> wrote:
>> > On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> >
>> >> As explained in the previous commit, current aggressive settings
>> >> --depth=250 --window=250 could slow down repository access
>> >> significantly. Notice that people usually work on recent history only,
>> >> we could keep recent history more loosely packed, so that repo access
>> >> is fast most of the time while the pack file remains small.
>> >
>> > One thing I have not seen is real-world timings showing the slowdown
>> > based on --depth. Did I miss them, or are we just making assumptions
>> > based on one old case from 2009 (that, AFAIK does not have real numbers,
>> > just speculation)? Has anyone measured the effect of bumping the delta
>> > cache size (and its hash implementation)?
>> 
>> David tested it with git-blame [1]. I should probably run some tests
>> too (I don't remember if I tested some operations last time).
>> 
>> http://thread.gmane.org/gmane.comp.version-control.git/242277/focus=242435
>
> Ah, thanks. I do remember that thread now.
>
> It looks like David's last word is that he gets a significant
> performance from bumping the delta base cache size (and number of
> buckets).

Increasing the number of buckets had comparatively minor effects
(that was the suggestion I started with), and actually started
_degrading_ performance rather soon.  The effect of the delta base
cache size was much more noticeable.  I have prepared a patch seriously
increasing it.  The reason I have not submitted it yet is that I have
not found a compelling real-world test case _apart_ from the fast
git-blame, which still lacks an implementation of the -M and -C options.

There should be other commands digging through large amounts of old
history, but I did not really find anything that benchmarks convincingly.
Either most of them are inefficient anyway, or their access order is
better-behaved, causing fewer unwanted cache flushes.

Access order in the optimized git-blame case is basically determined by
a reverse commit-time based priority queue, leading to a breadth-first
strategy.  It still beats unsorted access solidly in its timing.  I don't
think I compared depth-first results (inverting the priority queue's
sorting condition) with regard to cache behavior, but depth-first is bad
for interactive use as it tends to leave some recent history unblamed for
a long time while digging up stuff in the remote past.
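
As a rough illustration of that access order (a toy sketch, not the
actual git-blame code): commits are popped from a priority queue keyed
on commit time, newest first, so side branches are interleaved by date
rather than followed to the root one at a time.  The `history` graph,
its timestamps, and the function name are made up for the example.

```python
import heapq

def walk_newest_first(commits, start):
    """Visit commits reachable from `start`, newest commit time first.

    `commits` maps a commit id to (commit_time, [parent ids]).
    """
    # heapq is a min-heap, so the time is negated to pop the newest first.
    heap = [(-commits[start][0], start)]
    seen = {start}
    order = []
    while heap:
        _, cid = heapq.heappop(heap)
        order.append(cid)
        for parent in commits[cid][1]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-commits[parent][0], parent))
    return order

# Hypothetical history: E merges C and D, which both descend from B.
history = {
    "E": (50, ["C", "D"]),
    "D": (40, ["B"]),
    "C": (30, ["B"]),
    "B": (20, ["A"]),
    "A": (10, []),
}
print(walk_newest_first(history, "E"))  # ['E', 'D', 'C', 'B', 'A']
```

A depth-first walk along first parents would instead visit E, C, B, A
before ever looking at D, which is the "recent history left unblamed"
effect described above.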

Moderate cache size increases seem like a better strategy, and the
default size of 16M does not make a lot of sense with modern computers,
particularly since history digging is rarely competing with other
memory-intensive operations at the same time.
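
To make the effect of the size limit concrete, here is a toy model (an
assumption-laden sketch, not git's delta base cache): a byte-limited
LRU cache whose hit rate collapses once the working set no longer fits.
The class, the 4-byte entries, and the cyclic access pattern are all
invented for the illustration.

```python
from collections import OrderedDict

class ByteLimitedCache:
    """Least-recently-used cache bounded by total payload bytes."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = load(key)
        self.entries[key] = data
        self.used += len(data)
        # Evict the oldest entries until the cache fits its limit again.
        while self.used > self.limit and len(self.entries) > 1:
            _, old = self.entries.popitem(last=False)
            self.used -= len(old)
        return data

def hit_rate(limit_bytes, accesses):
    cache = ByteLimitedCache(limit_bytes)
    for key in accesses:
        cache.get(key, lambda k: b"x" * 4)  # every object is 4 bytes
    return cache.hits / len(accesses)

pattern = list(range(8)) * 4   # cycle through 8 objects, 4 times
print(hit_rate(16, pattern))   # 0.0: only 4 objects fit, every access misses
print(hit_rate(32, pattern))   # 0.75: all 8 fit after the first pass
```

In git itself the corresponding knob is the core.deltaBaseCacheLimit
configuration variable; the numbers above say nothing about what value
is appropriate for a real repository.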

> And that matches the timings I just did. I suspect there are still
> pathological cases that could behave worse, but it really sounds like
> we should be looking into improving that cache as a first step.

I can put up a patch.  My git-blame experiments used 128M, and the patch
proposes a more conservative 64M.  I have not actually run experiments
with the 64M setting, though.  The current default is 16M.

-- 
David Kastrup
