git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>, git@vger.kernel.org
Subject: Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos
Date: Tue, 29 Jan 2013 16:19:33 -0500	[thread overview]
Message-ID: <20130129211932.GA17377@sigill.intra.peff.net> (raw)
In-Reply-To: <7vwquwrng6.fsf@alter.siamese.dyndns.org>

On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:

> The point is not about space.  Disk is cheap, and it is not making
> it any worse than what happens to your target audience, that is a
> fetch-only repository with only "gc --auto" in it, where nobody
> passes "-f" to "repack" to cause recomputation of delta.
> 
> What I was trying to seek was a way to reduce the runtime penalty we
> pay every time we run git in such a repository.
> 
>  - Object look-up cost will become log2(50*n) from 50*log2(n), which
>    is about 50/log2(50) improvement;

Yes and no. Our heuristic is to look at the last-used pack for an
object. So assuming we have locality of requests, we should quite often
get "lucky" and find the object in the first log2 search. Even if we
don't assume locality, a situation with one large pack and a few small
packs will have the large one as "last used" more often than the others,
and it will also have the looked-for object more often than the others

So I can see how it is something we could potentially optimize, but I
could also see it being surprisingly not a big deal. I'd be very
interested to see real measurements, even of something as simple as a
"master index" which can reference multiple packfiles.

>  - System resource cost we incur by having to keep 50 file
>    descriptors open and maintaining 50 mmap windows will reduce by
>    50 fold.

I wonder how measurable that is (and if it matters on Linux versus less
efficient platforms).

> > I would be interested to see the timing on how quick it is compared to a
> > real repack,...
> 
> Yes, that is what I meant by "wonder if we would be helped by" ;-)

There is only one way to find out... :)

Maybe I am blessed with nice machines, but I have mostly found the
repack process not to be that big a deal these days (especially with
threaded delta compression).

> > But how do these somewhat mediocre concatenated packs get turned into
> > real packs?
> 
> How do they get processed in a fetch-only repositories that
> sometimes run "gc --auto" today?  By runninng "repack -a -d -f"
> occasionally, perhaps?

Do we run "repack -adf" regularly? The usual "git gc" procedure will not
use "-f", and without that, we will not even consider making deltas
between objects that were formerly in different packs (but now are in
the same pack).

So you are avoiding doing medium-effort packs ("repack -ad") in favor of
doing potentially quick packs, but occasionally doing a big-effort pack
("repack -adf"). It may be reasonable advice to "repack -adf"
occasionally, but I suspect most people are not doing it regularly (if
only because "git gc" does not do it by default).

-Peff

  reply	other threads:[~2013-01-29 21:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-26 22:40 [PATCH 0/2] optimizing pack access on "read only" fetch repos Jeff King
2013-01-26 22:40 ` [PATCH 1/2] fetch: run gc --auto after fetching Jeff King
2013-01-27  1:51   ` Jonathan Nieder
     [not found]   ` <87bmopzbqx.fsf@gmail.com>
2017-07-12 20:00     ` git gc --auto aquires *.lock files that make a subsequent git-fetch error out Jeff King
2017-07-12 20:30       ` Ævar Arnfjörð Bjarmason
2017-07-12 20:43         ` Jeff King
2013-01-26 22:40 ` [PATCH 2/2] fetch-pack: avoid repeatedly re-scanning pack directory Jeff King
2013-01-27 10:27   ` Jonathan Nieder
2013-01-27 20:09     ` Junio C Hamano
2013-01-27 23:20       ` Jonathan Nieder
2013-01-27  6:32 ` [PATCH 0/2] optimizing pack access on "read only" fetch repos Junio C Hamano
2013-01-29  8:06   ` Shawn Pearce
2013-01-29  8:29   ` Jeff King
2013-01-29 15:25     ` Martin Fick
2013-01-29 15:58     ` Junio C Hamano
2013-01-29 21:19       ` Jeff King [this message]
2013-01-29 22:26         ` Junio C Hamano
2013-01-31 16:47         ` Shawn Pearce
2013-02-01  9:14           ` Jeff King
2013-02-02 10:07             ` Shawn Pearce
2013-01-29 11:01   ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130129211932.GA17377@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).