From: Jeff King <peff@peff.net>
To: Stefan Saasen <ssaasen@atlassian.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Date: Fri, 17 Apr 2015 10:03:15 -0400 [thread overview]
Message-ID: <20150417140315.GA13506@peff.net> (raw)
In-Reply-To: <CADoxLGPYOkgzb4bkdHq5tK0aJS2M=nWGzO=YYXPDcy-gh45q-g@mail.gmail.com>
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:
> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.
Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.
> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.
Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:
diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+ if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
}
I think that should clear up the access() calls, but leave the utime()
ones.
> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
>
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?
I think so. Here's a tentative patch:
diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
pack_keep:1,
+ freshened:1,
do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
static int freshen_packed_object(const unsigned char *sha1)
{
struct pack_entry e;
- return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+ if (!find_pack_entry(sha1, &e))
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ return freshen_file(e.p->pack_name);
}
int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.
You may also be interested in:
http://thread.gmane.org/gmane.comp.version-control.git/266370
which addresses another performance problem related to the
freshen/recent code in v2.2.
-Peff
next prev parent reply other threads:[~2015-04-17 14:03 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-17 7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
2015-04-17 14:03 ` Jeff King [this message]
2015-04-17 15:50 ` Junio C Hamano
2015-04-18 3:35 ` Stefan Saasen
2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-21 0:46 ` Stefan Saasen
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-21 0:45 ` Stefan Saasen
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2015-04-20 20:09 ` Jeff King
2015-04-20 20:12 ` Junio C Hamano
2015-04-20 20:28 ` Jeff King
2015-04-21 1:49 ` Stefan Saasen
2015-04-21 17:05 ` Junio C Hamano
2015-04-21 21:45 ` Junio C Hamano
2015-04-22 1:46 ` Stefan Saasen
2015-04-22 22:00 ` Junio C Hamano
2015-04-22 0:04 ` Stefan Saasen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150417140315.GA13506@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=ssaasen@atlassian.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.