From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH 0/6] address packed-refs speed regressions
Date: Sun, 05 Apr 2015 15:41:39 +0200 [thread overview]
Message-ID: <55213B93.9050207@web.de> (raw)
In-Reply-To: <20150405010611.GA15901@peff.net>
Am 05.04.2015 um 03:06 schrieb Jeff King:
> As I've mentioned before, I have some repositories with rather large
> numbers of refs. The worst one has ~13 million refs, for a 1.6GB
> packed-refs file. So I was saddened by this:
>
> $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
> real 0m6.840s
> user 0m6.404s
> sys 0m0.440s
>
> $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
> real 0m19.432s
> user 0m18.996s
> sys 0m0.456s
>
> The command isn't important; what I'm really measuring is loading the
> packed-refs file. And yes, of course this repository is absolutely
> ridiculous. But the slowdowns here are linear with the number of refs.
> So _every_ git command got a little bit slower, even in less crazy
> repositories. We just didn't notice it as much.
>
> Here are the numbers after this series:
>
> real 0m8.539s
> user 0m8.052s
> sys 0m0.496s
>
> Much better, but I'm frustrated that they are still 20% slower than the
> original.
>
> The main culprits seem to be d0f810f (which introduced some extra
> expensive code for each ref) and my 10c497a, which switched from fgets()
> to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> slow.
10c497a changed read_packed_refs(), which reads *all* packed refs.
Each is checked for validity. That sounds expensive if the goal is
just to look up a single (non-existing) ref.
Would it help to defer any checks until a ref is actually accessed?
Can a binary search be used instead of reading the whole file?
I wonder if pluggable reference backends could help here. Storing refs
in a database table indexed by refname should simplify things.
Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
the packed refs file? What numbers do you get with the following patch?
---
refs.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/refs.c b/refs.c
index 47e4e53..144255f 100644
--- a/refs.c
+++ b/refs.c
@@ -1153,16 +1153,35 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
* compatibility with older clients, but we do not require it
* (i.e., "peeled" is a no-op if "fully-peeled" is set).
*/
-static void read_packed_refs(FILE *f, struct ref_dir *dir)
+static void read_packed_refs(int fd, struct ref_dir *dir)
{
struct ref_entry *last = NULL;
struct strbuf line = STRBUF_INIT;
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+ struct stat st;
+ void *map;
+ size_t mapsz, len;
+ const char *p;
+
+ fstat(fd, &st);
+ mapsz = xsize_t(st.st_size);
+ if (!mapsz)
+ return;
+ map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
- while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+ for (p = map, len = mapsz; len; ) {
unsigned char sha1[20];
const char *refname;
const char *traits;
+ const char *nl;
+ size_t linelen;
+
+ nl = memchr(p, '\n', len);
+ linelen = nl ? nl - p + 1 : len;
+ strbuf_reset(&line);
+ strbuf_add(&line, p, linelen);
+ p += linelen;
+ len -= linelen;
if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
if (strstr(traits, " fully-peeled "))
@@ -1204,6 +1223,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
}
strbuf_release(&line);
+ munmap(map, mapsz);
}
/*
@@ -1224,16 +1244,16 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs)
clear_packed_ref_cache(refs);
if (!refs->packed) {
- FILE *f;
+ int fd;
refs->packed = xcalloc(1, sizeof(*refs->packed));
acquire_packed_ref_cache(refs->packed);
refs->packed->root = create_dir_entry(refs, "", 0, 0);
- f = fopen(packed_refs_file, "r");
- if (f) {
- stat_validity_update(&refs->packed->validity, fileno(f));
- read_packed_refs(f, get_ref_dir(refs->packed->root));
- fclose(f);
+ fd = open(packed_refs_file, O_RDONLY);
+ if (fd >= 0) {
+ stat_validity_update(&refs->packed->validity, fd);
+ read_packed_refs(fd, get_ref_dir(refs->packed->root));
+ close(fd);
}
}
return refs->packed;
--
2.3.5
next prev parent reply other threads:[~2015-04-05 13:42 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
2015-04-05 1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-05 4:56 ` Jeff King
2015-04-05 5:27 ` Jeff King
2015-04-05 5:35 ` Jeff King
2015-04-05 20:49 ` Junio C Hamano
2015-04-05 14:36 ` Duy Nguyen
2015-04-05 18:24 ` Jeff King
2015-04-05 20:09 ` Junio C Hamano
2015-04-07 13:48 ` Rasmus Villemoes
2015-04-07 19:04 ` Jeff King
2015-04-07 22:43 ` Rasmus Villemoes
2015-04-08 0:17 ` Jeff King
2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
2015-04-06 2:13 ` Eric Sunshine
2015-04-06 5:05 ` Jeff King
2015-04-05 1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
2015-04-05 1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King
2015-04-05 13:41 ` René Scharfe [this message]
2015-04-05 18:52 ` [PATCH 0/6] address packed-refs speed regressions Jeff King
2015-04-05 18:59 ` Jeff King
2015-04-05 23:04 ` René Scharfe
2015-04-05 22:39 ` René Scharfe
2015-04-06 4:49 ` Jeff King
2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King
2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
2015-04-16 8:48 ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-16 8:49 ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-16 8:51 ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
2015-04-16 8:53 ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
2015-04-16 8:58 ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
2015-04-17 10:16 ` Eric Sunshine
2015-04-21 23:09 ` Jeff King
2015-05-08 23:56 ` Eric Sunshine
2015-05-09 1:09 ` Jeff King
2015-06-02 18:22 ` Eric Sunshine
2015-04-22 18:00 ` Johannes Schindelin
2015-04-22 18:06 ` Jeff King
2015-04-16 9:03 ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
2015-04-16 9:04 ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
2015-04-16 9:25 ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55213B93.9050207@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=mhagger@alum.mit.edu \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.