From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Colin Stolley <cstolley@runbox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] packfile.c: speed up loading lots of packfiles.
Date: Mon, 2 Dec 2019 18:40:35 +0100
Message-ID: <20191202174035.GJ23183@szeder.dev>
In-Reply-To: <20191127222453.GA3765@owl.colinstolley.com>

On Wed, Nov 27, 2019 at 04:24:53PM -0600, Colin Stolley wrote:
> When loading packfiles on start-up, we traverse the internal packfile
> list once per file to avoid reloading packfiles that have already
> been loaded. This check runs in quadratic time, so for poorly
> maintained repos with a large number of packfiles, it can be pretty
> slow.
> 
> Add a hashmap containing the packfile names as we load them so that
> the average runtime cost of checking for already-loaded packs becomes
> constant.
> 
> Add a perf test to p5303 to show speed-up.
> 
> The existing p5303 test runtimes are dominated by other factors and do
> not show an appreciable speed-up. The new test in p5303 clearly exposes
> a speed-up in bad cases. In this test we create 10,000 packfiles and
> measure the start-up time of git rev-parse, which does little else
> besides load in the packs.
> 
> Here are the numbers for the new p5303 test:
> 
> Test                         HEAD^             HEAD
> ---------------------------------------------------------------------
> 5303.12: load 10,000 packs   1.03(0.92+0.10)   0.12(0.02+0.09) -88.3%
> 
> Thanks-to: Jeff King <peff@peff.net>
> Signed-off-by: Colin Stolley <cstolley@runbox.com>
> ---
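
For readers skimming the thread, the idea in the commit message (remember
each name in a keyed lookup instead of rescanning the whole list) can be
illustrated with a small shell sketch. This is not the patch's C code:
'load_once' and the pack names are made up for illustration, and it assumes
awk's associative array is backed by a hash table, as it typically is.

```shell
# Hypothetical helper: print each name from stdin the first time it
# appears and skip names that were already "loaded".
load_once () {
	# seen[] is an awk associative array, so the membership test is a
	# keyed lookup rather than a rescan of the names printed so far.
	awk '!($0 in seen) { seen[$0] = 1; print }'
}

# The second pack-1.pack is skipped; the other names pass through once.
printf '%s\n' pack-1.pack pack-2.pack pack-1.pack pack-3.pack | load_once
```

The same keyed-lookup structure is what turns the "already loaded?" check
from a scan per pack into a single probe per pack.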

This patch breaks test 'gc --keep-largest-pack' in 't6500-gc.sh' when
run with GIT_TEST_MULTI_PACK_INDEX=1, because '.git/objects/info/packs'
ends up with a duplicate entry (three lines where the test expects two):

  expecting success of 6500.7 'gc --keep-largest-pack':
          test_create_repo keep-pack &&
          (
                  cd keep-pack &&
                  test_commit one &&
                  test_commit two &&
                  test_commit three &&
                  git gc &&
                  ( cd .git/objects/pack && ls *.pack ) >pack-list &&
                  test_line_count = 1 pack-list &&
                  BASE_PACK=.git/objects/pack/pack-*.pack &&
                  test_commit four &&
                  git repack -d &&
                  test_commit five &&
                  git repack -d &&
                  ( cd .git/objects/pack && ls *.pack ) >pack-list &&
                  test_line_count = 3 pack-list &&
                  git gc --keep-largest-pack &&
                  ( cd .git/objects/pack && ls *.pack ) >pack-list &&
                  test_line_count = 2 pack-list &&
                  awk "/^P /{print \$2}" <.git/objects/info/packs >pack-info &&
                  test_line_count = 2 pack-info &&
                  test_path_is_file $BASE_PACK &&
                  git fsck
          )
  
  + test_create_repo keep-pack
  Initialized empty Git repository in /home/szeder/src/git/t/trash directory.t6500-gc/keep-pack/.git/
  + cd keep-pack
  + test_commit one
  [master (root-commit) d79ce16] one
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 one.t
  + test_commit two
  [master 139b20d] two
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 two.t
  + test_commit three
  [master 7c7cd71] three
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 three.t
  + git gc
  Computing commit graph generation numbers:  33% (1/3)^MComputing commit graph generation numbers:  66% (2/3)^MComputing commit graph generation numbers: 100% (3/3)^MComputing commit graph generation numbers: 100% (3/3), done.
  + cd .git/objects/pack
  + ls pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
  + test_line_count = 1 pack-list
  + BASE_PACK=.git/objects/pack/pack-*.pack
  + test_commit four
  [master fd8d77e] four
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 four.t
  + git repack -d
  + test_commit five
  [master a383792] five
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 five.t
  + git repack -d
  + cd .git/objects/pack
  + ls pack-057d7f493a7c26d58090f4777ff66d4c226c4408.pack pack-54feec766fc7d2d204b03879d96f4595d7e48c37.pack pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
  + test_line_count = 3 pack-list
  + git gc --keep-largest-pack
  Computing commit graph generation numbers:  20% (1/5)^MComputing commit graph generation numbers:  40% (2/5)^MComputing commit graph generation numbers:  60% (3/5)^MComputing commit graph generation numbers:  80% (4/5)^MComputing commit graph generation numbers: 100% (5/5)^MComputing commit graph generation numbers: 100% (5/5), done.
  + cd .git/objects/pack
  + ls pack-390dbbb8e27c014b080c08dfc482d4982d4c6644.pack pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
  + test_line_count = 2 pack-list
  + awk /^P /{print $2}
  + test_line_count = 2 pack-info
  test_line_count: line count for pack-info != 2
  pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
  pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
  pack-390dbbb8e27c014b080c08dfc482d4982d4c6644.pack
  error: last command exited with $?=1
  not ok 7 - gc --keep-largest-pack
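
To spot the duplicate without reading the whole trace, something along
these lines should work. It is only a sketch: 'list_duplicate_packs' is a
name made up here (it is not part of the test suite), and the sample file
is modelled on the 'P <name>' lines that the test's awk command extracts.

```shell
# Hypothetical helper (not part of git's test suite): print every pack
# name that is listed more than once in an objects/info/packs-style file.
list_duplicate_packs () {
	# "P <name>" lines carry the pack names; sort them so that uniq -d
	# can report each repeated name once.
	awk '/^P /{print $2}' "$1" | sort | uniq -d
}

# Sample input modelled on the failing run above.
packs_file=$(mktemp)
cat >"$packs_file" <<\EOF
P pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
P pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
P pack-390dbbb8e27c014b080c08dfc482d4982d4c6644.pack

EOF

# Prints pack-a4b37b9b5458e8116b1c1840185b39fb5e6b8726.pack
list_duplicate_packs "$packs_file"
rm -f "$packs_file"
```

In the failing trash directory the same function could be pointed at
'keep-pack/.git/objects/info/packs' instead of the sample file.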


