git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Stefan Saasen <ssaasen@atlassian.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Date: Fri, 17 Apr 2015 10:03:15 -0400	[thread overview]
Message-ID: <20150417140315.GA13506@peff.net> (raw)
In-Reply-To: <CADoxLGPYOkgzb4bkdHq5tK0aJS2M=nWGzO=YYXPDcy-gh45q-g@mail.gmail.com>

On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:

> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.

Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.

> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.

Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
 	if (returnsha1)
 		hashcpy(returnsha1, sha1);
-	if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }

I think that should clear up the access() calls, but leave the utime()
ones.

> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
> 
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?

I think so. Here's a tentative patch:

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
 	int pack_fd;
 	unsigned pack_local:1,
 		 pack_keep:1,
+		 freshened:1,
 		 do_not_close:1;
 	unsigned char sha1[20];
 	/* something like ".git/objects/pack/xxxxx.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
 	struct pack_entry e;
-	return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+	if (!find_pack_entry(sha1, &e))
+		return 0;
+	if (e.p->freshened)
+		return 1;
+	return freshen_file(e.p->pack_name);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1)


If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.

You may also be interested in:

  http://thread.gmane.org/gmane.comp.version-control.git/266370

which addresses another performance problem related to the
freshen/recent code in v2.2.

-Peff

  reply	other threads:[~2015-04-17 14:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-17  7:30 [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Stefan Saasen
2015-04-17 14:03 ` Jeff King [this message]
2015-04-17 15:50   ` Junio C Hamano
2015-04-18  3:35   ` Stefan Saasen
2015-04-20 19:53     ` Jeff King
2015-04-20 19:54       ` [PATCH 1/2] sha1_file: freshen pack objects before loose Jeff King
2015-04-21  0:46         ` Stefan Saasen
2015-04-20 19:55       ` [PATCH 2/2] sha1_file: only freshen packs once per run Jeff King
2015-04-21  0:45         ` Stefan Saasen
2015-04-20 20:04       ` [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects Junio C Hamano
2015-04-20 20:09         ` Jeff King
2015-04-20 20:12           ` Junio C Hamano
2015-04-20 20:28             ` Jeff King
2015-04-21  1:49           ` Stefan Saasen
2015-04-21 17:05             ` Junio C Hamano
2015-04-21 21:45               ` Junio C Hamano
2015-04-22  1:46                 ` Stefan Saasen
2015-04-22 22:00                   ` Junio C Hamano
2015-04-22  0:04               ` Stefan Saasen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150417140315.GA13506@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=ssaasen@atlassian.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).