From: Jeff King <peff@peff.net>
To: steve.norman@thomsonreuters.com
Cc: pclouds@gmail.com, git@vger.kernel.org,
Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs
Date: Fri, 22 May 2015 19:54:05 -0400 [thread overview]
Message-ID: <20150522235404.GA4973@peff.net> (raw)
In-Reply-To: <20150522235116.GA4300@peff.net>
When we try to read an object and fail, we call
reprepare_packed_git() to re-scan the list of packfiles.
This catches any "racy" cases in which somebody is repacking
or making new objects. The performance implications of
re-scanning the pack directory are not usually a big deal,
because failing to find an object is the unlikely case;
we typically die if the re-scan does not help.
Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we do the same thing for
has_sha1_file. Some calls to has_sha1_file are in the same
boat: they expect most calls to return true. But some sites,
such as the collision test in index-pack.c, may make a large
number of calls, most of which they expect to be false. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.
Since the common case is that the pack directory has _not_
changed, we can improve the performance by using stat()
before doing the opendir()/readdir()/closedir() to re-scan
the directory. It's valid to check stat() for just the
single "objects/pack" directory because:
1. We know that packfiles themselves are not changed in
place; only new files are written, which would update
the mtime of the directory.
2. There are no subdirectories inside the pack directory.
Checking the single top-level directory can tell us
whether anything changed.
Signed-off-by: Jeff King <peff@peff.net>
---
I'm not sure what we want to do for the case where we don't have DIRFD.
We have to separately open the directory, but I'm not sure if regular
open() is even valid on a regular directory on all systems. We could
just skip the stat_validity entirely in that case.
cache.h | 1 +
git-compat-util.h | 4 ++++
sha1_file.c | 27 ++++++++++++++++++++++++---
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/cache.h b/cache.h
index cdd279a..9c34a17 100644
--- a/cache.h
+++ b/cache.h
@@ -1216,6 +1216,7 @@ void stat_validity_update(struct stat_validity *sv, int fd);
extern struct alternate_object_database {
struct alternate_object_database *next;
char *name;
+ struct stat_validity pack_validity;
char base[FLEX_ARRAY]; /* more */
} *alt_odb_list;
extern void prepare_alt_odb(void);
diff --git a/git-compat-util.h b/git-compat-util.h
index b7a97fb..8631e75 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -941,4 +941,8 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
#define getc_unlocked(fh) getc(fh)
#endif
+#if _POSIX_VERSION >= 200809L
+#define HAVE_DIRFD
+#endif
+
#endif
diff --git a/sha1_file.c b/sha1_file.c
index ccc6dac..0e77838 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1225,7 +1225,8 @@ static void report_pack_garbage(struct string_list *list)
report_helper(list, seen_bits, first, list->nr);
}
-static void prepare_packed_git_one(char *objdir, int local)
+static void prepare_packed_git_one(char *objdir, int local,
+ struct stat_validity *validity)
{
struct strbuf path = STRBUF_INIT;
size_t dirnamelen;
@@ -1235,6 +1236,12 @@ static void prepare_packed_git_one(char *objdir, int local)
strbuf_addstr(&path, objdir);
strbuf_addstr(&path, "/pack");
+
+ if (stat_validity_check(validity, path.buf)) {
+ strbuf_release(&path);
+ return;
+ }
+
dir = opendir(path.buf);
if (!dir) {
if (errno != ENOENT)
@@ -1243,6 +1250,19 @@ static void prepare_packed_git_one(char *objdir, int local)
strbuf_release(&path);
return;
}
+
+#ifdef HAVE_DIRFD
+ stat_validity_update(validity, dirfd(dir));
+#else
+ {
+ int fd = open(path.buf, O_RDONLY);
+ if (fd >= 0) {
+ stat_validity_update(validity, fd);
+ close(fd);
+ }
+ }
+#endif
+
strbuf_addch(&path, '/');
dirnamelen = path.len;
while ((de = readdir(dir)) != NULL) {
@@ -1348,15 +1368,16 @@ static void rearrange_packed_git(void)
static int prepare_packed_git_run_once = 0;
void prepare_packed_git(void)
{
+ static struct stat_validity validity;
struct alternate_object_database *alt;
if (prepare_packed_git_run_once)
return;
- prepare_packed_git_one(get_object_directory(), 1);
+ prepare_packed_git_one(get_object_directory(), 1, &validity);
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
alt->name[-1] = 0;
- prepare_packed_git_one(alt->base, 0);
+ prepare_packed_git_one(alt->base, 0, &alt->pack_validity);
alt->name[-1] = '/';
}
rearrange_packed_git();
--
2.4.1.538.g69ac333
next prev parent reply other threads:[~2015-05-22 23:54 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38 ` Duy Nguyen
2015-05-21 15:53 ` steve.norman
2015-05-22 0:16 ` Duy Nguyen
2015-05-22 7:12 ` Jeff King
2015-05-22 8:35 ` steve.norman
2015-05-22 10:05 ` Duy Nguyen
2015-05-22 14:37 ` Junio C Hamano
2015-05-22 15:02 ` steve.norman
2015-05-22 23:51 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51 ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00 ` Michael Haggerty
2015-05-24 8:29 ` Jeff King
2015-05-22 23:52 ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54 ` Jeff King [this message]
2015-05-23 1:19 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23 1:21 ` Duy Nguyen
2015-05-24 8:20 ` Jeff King
2015-05-24 9:00 ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01 ` steve.norman
2015-06-05 12:18 ` Jeff King
2015-06-05 12:29 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24 ` Jeff King
2015-06-09 17:41 ` Jeff King
2015-06-10 3:46 ` Shawn Pearce
2015-06-10 14:00 ` Jeff King
2015-06-10 14:36 ` Duy Nguyen
2015-06-10 21:34 ` Shawn Pearce
2015-06-05 14:20 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150522235404.GA4973@peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=steve.norman@thomsonreuters.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).