From: Jeff King <peff@peff.net>
To: Stefan Saasen <ssaasen@atlassian.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Date: Fri, 17 Apr 2015 10:03:15 -0400 [thread overview]
Message-ID: <20150417140315.GA13506@peff.net> (raw)
In-Reply-To: <CADoxLGPYOkgzb4bkdHq5tK0aJS2M=nWGzO=YYXPDcy-gh45q-g@mail.gmail.com>
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:
> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.
Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.
> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.
Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:
diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+ if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
}
I think that should clear up the access() calls, but leave the utime()
ones.
> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
>
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?
I think so. Here's a tentative patch:
diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
pack_keep:1,
+ freshened:1,
do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
static int freshen_packed_object(const unsigned char *sha1)
{
struct pack_entry e;
- return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+ if (!find_pack_entry(sha1, &e))
+ return 0;
+ if (e.p->freshened)
+ return 1;
+ return freshen_file(e.p->pack_name);
}
int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)
If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.
You may also be interested in:
http://thread.gmane.org/gmane.comp.version-control.git/266370
which addresses another performance problem related to the
freshen/recent code in v2.2.
-Peff
next prev parent reply other threads:[~2015-04-17 14:03 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-17 7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
2015-04-17 14:03 ` Jeff King [this message]
2015-04-17 15:50 ` Junio C Hamano
2015-04-18 3:35 ` Stefan Saasen
2015-04-20 19:53 ` Jeff King
2015-04-20 19:54 ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-21 0:46 ` Stefan Saasen
2015-04-20 19:55 ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-21 0:45 ` Stefan Saasen
2015-04-20 20:04 ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2015-04-20 20:09 ` Jeff King
2015-04-20 20:12 ` Junio C Hamano
2015-04-20 20:28 ` Jeff King
2015-04-21 1:49 ` Stefan Saasen
2015-04-21 17:05 ` Junio C Hamano
2015-04-21 21:45 ` Junio C Hamano
2015-04-22 1:46 ` Stefan Saasen
2015-04-22 22:00 ` Junio C Hamano
2015-04-22 0:04 ` Stefan Saasen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150417140315.GA13506@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=ssaasen@atlassian.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).