From: Martin Fick <mfick@codeaurora.org>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git List <git@vger.kernel.org>, Ramkumar Ramachandra <artagnon@gmail.com>
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Tue, 6 Aug 2013 18:10:46 -0600
Message-ID: <201308061810.46562.mfick@codeaurora.org>
In-Reply-To: <CACsJy8CGWJ07Uk8EBjfejdyshKB1NKk=_7VUoeyZWZgJFqCSkg@mail.gmail.com>
On Tuesday, August 06, 2013 06:24:50 am Duy Nguyen wrote:
> On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> > + Garbage collect using a pseudo logarithmic packfile maintenance
> > + approach. This approach attempts to minimize packfile churn
> > + by keeping several generations of varying sized packfiles around
> > + and only consolidating packfiles (or loose objects) which are
> > + either new packfiles, or packfiles close to the same size as
> > + another packfile.
>
> I wonder if a simpler approach may be nearly as efficient as
> this one: keep the largest pack out, repack the rest at
> fetch/push time so there are at most 2 packs at a time.
> Or we could do the repack at 'gc --auto' time, but
> with a lower pack threshold (about 10 or so). When the
> second pack is as big as, say, half the size of the
> first, merge them into one at "gc --auto" time. This can
> be easily implemented in git-repack.sh.
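The two-pack policy quoted above can be modeled roughly as follows. This is a hypothetical Python sketch, not git-repack.sh: pack sizes are plain numbers in megabytes, the function name repack_two_packs and the merge_ratio parameter are illustrative, and merging is assumed to simply add sizes (no extra compression).

```python
# Hypothetical model of the two-pack repack policy; not git-repack.sh.
# Sizes are in megabytes, and merging is assumed to just add sizes.

def repack_two_packs(pack_sizes, merge_ratio=0.5):
    """Return the pack sizes left after one repack.

    The largest pack is kept out and everything else is repacked into a
    single second pack. Once that second pack would reach `merge_ratio`
    times the largest (half, by default), both are merged into one pack.
    """
    if len(pack_sizes) <= 1:
        return list(pack_sizes)
    largest = max(pack_sizes)
    rest = sum(pack_sizes) - largest  # every other pack, rewritten together
    if rest >= merge_ratio * largest:
        return [largest + rest]
    return [largest, rest]


print(repack_two_packs([1500, 100, 8]))  # [1500, 108]
print(repack_two_packs([1500, 745, 8]))  # [2253]
```

In the first call the 100M pack is rewritten in full just to absorb 8M of new data, which is the cost discussed in the reply.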

It would definitely be better than the current gc approach.
However, I suspect it is still at least one to two orders of
magnitude off from where it should be. To give you a real-world
example: on our server today, when git exproll ran on our
kernel/msm repo, it consolidated 317 pack files into one
packfile of almost 8M (it compresses/dedupes shockingly well;
one of those new packs alone was 33M). Our largest packfile in
that repo is 1.5G!

So let's now imagine that the second largest packfile is
only 100M. It would keep getting consolidated with 8M worth
of data every day (assuming the same conditions and no extra
compression). Since it only stops being rewritten once it
reaches half the size of the largest pack (750M), that would
take (750M-100M)/8M ~ 81 days before it is finally large enough
that the new packs are no longer consolidated with it daily.
During those 80+ days, it will on average be writing 325M too
much per day (when it should be writing just 8M).
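The day count above can be checked with a small calculation. This is a hypothetical sketch under the same assumptions as the estimate: sizes in megabytes, 1.5G taken as 1500M, a fixed 8M of new data consolidated per day with no extra compression, and a merge once the second pack reaches half the largest. The function names are illustrative; both a closed form and a day-by-day loop are shown.

```python
import math


def days_until_merge(largest, second, daily, merge_ratio=0.5):
    """Closed form: days of daily consolidation before the second pack
    reaches merge_ratio * largest and gets merged into the largest one."""
    gap = merge_ratio * largest - second
    return max(0, math.ceil(gap / daily))


def simulate(largest, second, daily, merge_ratio=0.5):
    """Same question, answered by growing the second pack one day at a time."""
    days = 0
    while second < merge_ratio * largest:
        second += daily
        days += 1
    return days


print(days_until_merge(1500, 100, 8))  # 82
print(simulate(1500, 100, 8))          # 82
```

Since 650/8 = 81.25, the threshold is first crossed on day 82, in line with the ~81-day estimate.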

So I can see the appeal of a simple solution; unfortunately,
I think one layer would still "suck". And if you are going to
add even just one extra layer, I suspect that you might as well
go the full distance, since you would probably already need to
implement most of that logic anyway.
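For comparison, one way to go "the full distance" is a size-tiered selection that keeps several generations of packs and only rewrites the small, recent ones. The sketch below is an assumption, not the selection logic from the exproll patch: it walks packs from smallest to largest and keeps accumulating while the next pack is at most `factor` times the running total. The function name and the `factor` parameter are hypothetical; sizes are in megabytes.

```python
# Hypothetical size-tiered pack selection; not the exproll patch's logic.

def select_packs_to_consolidate(pack_sizes, factor=2.0):
    """Return the pack sizes to combine into one new pack.

    Packs are visited from smallest to largest. A pack joins the group
    while it is no more than `factor` times the total gathered so far;
    the first pack that is much larger, and every pack above it, is left
    alone as an older generation.
    """
    chosen = []
    total = 0
    for size in sorted(pack_sizes):
        if chosen and size > factor * total:
            break
        chosen.append(size)
        total += size
    # Rewriting a single pack on its own would gain nothing.
    return chosen if len(chosen) >= 2 else []


# Small new packs are combined; the 100M and 1500M packs are not touched.
print(select_packs_to_consolidate([1500, 100, 3, 2, 2, 1]))  # [1, 2, 2, 3]
```

With these example sizes only about 8M is written, whereas the two-pack policy would also rewrite the 100M pack on every run.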
-Martin
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation