From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
"Colin Stolley" <cstolley@runbox.com>,
git@vger.kernel.org
Subject: Re: [PATCH] packfile.c: speed up loading lots of packfiles.
Date: Mon, 2 Dec 2019 22:17:44 -0800
Message-ID: <20191203061744.GA74594@syl.local>
In-Reply-To: <20191202194231.GA10707@sigill.intra.peff.net>
On Mon, Dec 02, 2019 at 02:42:31PM -0500, Jeff King wrote:
> On Mon, Dec 02, 2019 at 06:40:35PM +0100, SZEDER Gábor wrote:
>
> > > When loading packfiles on start-up, we traverse the internal packfile
> > > list once per file to avoid reloading packfiles that have already
> > > been loaded. This check runs in quadratic time, so for poorly
> > > maintained repos with a large number of packfiles, it can be pretty
> > > slow.
> > >
> > > Add a hashmap containing the packfile names as we load them so that
> > > the average runtime cost of checking for already-loaded packs becomes
> > > constant.
> > [...]
> > This patch breaks test 'gc --keep-largest-pack' in 't6500-gc.sh' when
> > run with GIT_TEST_MULTI_PACK_INDEX=1, because there is a duplicate
> > entry in '.git/objects/info/packs':
>
> Good catch. The issue is that we only add entries to the hashmap in
> prepare_packed_git(), but they may be added to the pack list by other
> callers of install_packed_git(). It probably makes sense to just push
> the hashmap maintenance down into that function, like below. That
> requires an extra strhash() when inserting a new pack, but I don't think
> that's a big deal.

Ah, great catch, and thanks for pointing it out. We have been running
this patch in production at GitHub for a few weeks now, but didn't
notice the failure because we never run the tests with
'GIT_TEST_MULTI_PACK_INDEX=1' in the environment.

Perhaps that will change in the future, but for now it explains why the
failure wasn't noticed earlier.

> diff --git a/packfile.c b/packfile.c
> index 253559fa87..f0dc63e92f 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -757,6 +757,9 @@ void install_packed_git(struct repository *r, struct packed_git *pack)
>
>  	pack->next = r->objects->packed_git;
>  	r->objects->packed_git = pack;
> +
> +	hashmap_entry_init(&pack->packmap_ent, strhash(pack->pack_name));
> +	hashmap_add(&r->objects->pack_map, &pack->packmap_ent);
>  }
>
>  void (*report_garbage)(unsigned seen_bits, const char *path);
> @@ -864,11 +867,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
>  		/* Don't reopen a pack we already have. */
>  		if (!hashmap_get(&data->r->objects->pack_map, &hent, pack_name)) {
>  			p = add_packed_git(full_name, full_name_len, data->local);
> -			if (p) {
> -				hashmap_entry_init(&p->packmap_ent, hash);
> -				hashmap_add(&data->r->objects->pack_map, &p->packmap_ent);
> +			if (p)
>  				install_packed_git(data->r, p);
> -			}
>  		}
>  		free(pack_name);
>  	}
>
> -Peff
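
To spell out why pushing the bookkeeping down into the install function
should avoid the duplicate, here is a toy sketch. It is not git's code:
toy_store, toy_pack, toy_install() and toy_prepare() are made-up
stand-ins for the object store, struct packed_git, install_packed_git()
and prepare_pack(), and the fixed-size chained table only stands in for
git's hashmap. The point is just that when the one function that links
a pack into the list also indexes it by name, a pack installed by some
other caller is still seen by the later "already loaded?" check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for illustration; none of these are git's real types. */
struct toy_pack {
	struct toy_pack *next;     /* linked list of installed packs */
	struct toy_pack *map_next; /* bucket chain in the name index */
	char name[64];
};

#define TOY_BUCKETS 64

struct toy_store {
	struct toy_pack *packs;                /* the pack list */
	struct toy_pack *by_name[TOY_BUCKETS]; /* name -> pack index */
};

/* Simple string hash (djb2 variant), reduced to a bucket number. */
static unsigned toy_hash(const char *s)
{
	unsigned h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % TOY_BUCKETS;
}

/* Average constant-time check for an already-installed pack name. */
static struct toy_pack *toy_lookup(struct toy_store *st, const char *name)
{
	struct toy_pack *p;
	for (p = st->by_name[toy_hash(name)]; p; p = p->map_next)
		if (!strcmp(p->name, name))
			return p;
	return NULL;
}

/*
 * The only place that links a pack in. Because the index is updated
 * here, every caller keeps the list and the index in sync.
 */
static void toy_install(struct toy_store *st, struct toy_pack *p)
{
	unsigned b;

	assert(st && p);
	p->next = st->packs;
	st->packs = p;

	b = toy_hash(p->name);
	p->map_next = st->by_name[b];
	st->by_name[b] = p;
}

/*
 * Directory-scan path: skip names that are already installed.
 * Returns 1 if a pack was added, 0 if it was a duplicate, -1 on
 * allocation failure.
 */
static int toy_prepare(struct toy_store *st, const char *name)
{
	struct toy_pack *p;

	if (toy_lookup(st, name))
		return 0;
	p = calloc(1, sizeof(*p));
	if (!p)
		return -1;
	snprintf(p->name, sizeof(p->name), "%s", name);
	toy_install(st, p);
	return 1;
}

/* Walk the list and count installed packs. */
static int toy_count(const struct toy_store *st)
{
	const struct toy_pack *p;
	int n = 0;
	for (p = st->packs; p; p = p->next)
		n++;
	return n;
}
```

If a pack is first installed directly with toy_install() (as a
non-scanning caller would do) and the same name later goes through
toy_prepare(), the lookup finds it and the list does not grow. With the
index updated only in toy_prepare(), that second call would have added
the pack again.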
Thanks,
Taylor