From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Thomas Rast" <trast@inf.ethz.ch>,
"Joshua Redstone" <joshua.redstone@fb.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 6/6] Automatically switch to crc32 checksum for index when it's too large
Date: Mon, 6 Feb 2012 12:48:39 +0700
Message-ID: <1328507319-24687-6-git-send-email-pclouds@gmail.com>
In-Reply-To: <1328507319-24687-1-git-send-email-pclouds@gmail.com>
An experiment with a -O3 build was run on an Intel D510 @ 1.66GHz. At
around 250k entries, index reading time exceeds 0.5s; switching to crc32
brings it back below 0.2s.
On a 4M-file index, reading takes ~8.4s with SHA-1 and 2.8s with crc32.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
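To illustrate why crc32 is so much cheaper than SHA-1 here, below is a minimal table-driven CRC-32 over the reflected polynomial 0xEDB88320, the same function zlib's crc32() computes. It is only a sketch: the implementation patch 5/6 actually uses is not shown in this message, and the names crc32_update() and crc32_init_table() are hypothetical. Each input byte costs one table lookup plus a few shifts and XORs, versus SHA-1's 80 rounds per 64-byte block.

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table for the reflected CRC-32 polynomial 0xEDB88320. */
static uint32_t crc_table[256];
static int crc_table_ready;

static void crc32_init_table(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;
		for (int k = 0; k < 8; k++)
			c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
		crc_table[n] = c;
	}
	crc_table_ready = 1;
}

/*
 * Continue a CRC-32 over buf[0..len). Start with crc = 0; feeding the
 * data in pieces gives the same result as one call over all of it,
 * which is what a streaming index writer needs.
 */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	if (!crc_table_ready)
		crc32_init_table();
	crc = ~crc;
	while (len--)
		crc = crc_table[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
	return ~crc;
}
```

Over the nine bytes "123456789" this yields the standard CRC-32 check value 0xCBF43926.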
---
I don't know of any real repositories this size, though; gentoo-x86 is
"only" 120k. I haven't checked the libreoffice repo yet.
On a 2M-file index, allocating one big block (i.e. reverting debed2a
(read-cache.c: allocate index entries individually - 2011-10-24))
saves about 0.3s. Maybe we can allocate one big block, then fall back
to malloc'ing entries separately once the block is used up.
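The "one big block, then individual malloc" idea could look roughly like the sketch below. Everything in it is hypothetical (struct entry_pool, pool_init() and the other helpers are not in git), and sizing the block from an estimate of the on-disk index size is an assumption. Entries carved from the block are freed all at once; only the overflow entries are freed one by one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool: one up-front block, then individual malloc()s. */
struct entry_pool {
	char *block;  /* the single big allocation (may be NULL) */
	size_t size;  /* bytes in block */
	size_t used;  /* bytes handed out from block so far */
};

static void pool_init(struct entry_pool *p, size_t estimated_bytes)
{
	p->block = malloc(estimated_bytes);
	p->size = p->block ? estimated_bytes : 0;
	p->used = 0;
}

static void *pool_alloc(struct entry_pool *p, size_t n)
{
	/* Round up so every entry carved from the block stays aligned. */
	size_t a = _Alignof(max_align_t);
	size_t need = (n + a - 1) & ~(a - 1);

	if (p->block && p->size - p->used >= need) {
		void *ret = p->block + p->used;
		p->used += need;
		return ret;
	}
	return malloc(n); /* block used up: fall back to per-entry malloc */
}

static int pool_owns(const struct entry_pool *p, const void *ptr)
{
	uintptr_t start = (uintptr_t)p->block, q = (uintptr_t)ptr;
	return p->block && q >= start && q < start + p->size;
}

static void pool_free(struct entry_pool *p, void *ptr)
{
	if (!pool_owns(p, ptr))
		free(ptr); /* block-backed entries are released all at once */
}

static void pool_release(struct entry_pool *p)
{
	free(p->block);
	p->block = NULL;
	p->size = p->used = 0;
}
```

The trade-off is the same one debed2a faced: a single block is fast to allocate and free, but individual entries inside it cannot be freed early.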
Writing time is still high: "git update-index --crc32" on a 250k-entry
crc32 index takes 0.9s (so writing time is about 0.5s).
A better solution may be narrow clone (or just its narrow checkout
part), where the index only contains entries from checked-out
subdirectories.
Documentation/config.txt | 7 +++++++
builtin/update-index.c | 1 +
cache.h | 1 +
config.c | 5 +++++
environment.c | 1 +
read-cache.c | 8 ++++++++
6 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index abeb82b..55b7596 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -540,6 +540,13 @@ relatively high IO latencies. With this set to 'true', git will do the
index comparison to the filesystem data in parallel, allowing
overlapping IO's.
+core.crc32IndexThreshold::
+ Usually SHA-1 is used to check for index integrity. When the
+ number of entries in the index reaches this threshold, crc32 will
+ be used instead. Zero means SHA-1 is always used. A negative
+ value disables this threshold (i.e. whether crc32 or SHA-1 is
+ used is decided by other means).
+
core.createObject::
You can set this to 'link', in which case a hardlink followed by
a delete of the source are used to make sure that object creation
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6913226..5cb51c7 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -856,6 +856,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
argc = parse_options_end(&ctx);
if (do_crc != -1) {
+ core_crc32_index_threshold = -1;
if (do_crc)
the_index.hdr_flags |= CACHE_F_CRC;
else
diff --git a/cache.h b/cache.h
index 7352402..d05856b 100644
--- a/cache.h
+++ b/cache.h
@@ -610,6 +610,7 @@ extern unsigned long pack_size_limit_cfg;
extern int read_replace_refs;
extern int fsync_object_files;
extern int core_preload_index;
+extern int core_crc32_index_threshold;
extern int core_apply_sparse_checkout;
enum branch_track {
diff --git a/config.c b/config.c
index 40f9c6d..905e071 100644
--- a/config.c
+++ b/config.c
@@ -671,6 +671,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.crc32indexthreshold")) {
+ core_crc32_index_threshold = git_config_int(var, value);
+ return 0;
+ }
+
if (!strcmp(var, "core.createobject")) {
if (!strcmp(value, "rename"))
object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/environment.c b/environment.c
index c93b8f4..9d9dfc2 100644
--- a/environment.c
+++ b/environment.c
@@ -66,6 +66,7 @@ unsigned long pack_size_limit_cfg;
/* Parallel index stat data preload? */
int core_preload_index = 0;
+int core_crc32_index_threshold = 250000;
/* This is set by setup_git_dir_gently() and/or git_default_config() */
char *git_work_tree_cfg;
diff --git a/read-cache.c b/read-cache.c
index a34878e..fd032d8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1582,6 +1582,14 @@ int write_index(struct index_state *istate, int newfd)
}
}
+ if (core_crc32_index_threshold >= 0) {
+ if (core_crc32_index_threshold > 0 &&
+ istate->cache_nr >= core_crc32_index_threshold)
+ istate->hdr_flags |= CACHE_F_CRC;
+ else
+ istate->hdr_flags &= ~CACHE_F_CRC;
+ }
+
hdr.h.hdr_signature = htonl(CACHE_SIGNATURE);
if (istate->hdr_flags) {
hdr.h.hdr_version = htonl(4);
--
1.7.8.36.g69ee2
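The check added to write_index() can be read as a small pure function. The sketch below pulls it out so the three cases documented for core.crc32IndexThreshold can be exercised in isolation. choose_index_flags() is a hypothetical name, and the value 1 for CACHE_F_CRC is an assumption (the real definition comes from patch 5/6, not shown here). Note that the comparison is `>=`, so crc32 is selected once the entry count reaches the threshold.

```c
/* Assumed flag value; the real CACHE_F_CRC is defined in patch 5/6. */
#define CACHE_F_CRC 1u

/*
 * Mirrors the logic in write_index():
 *   threshold <  0: leave hdr_flags alone (choice made elsewhere)
 *   threshold == 0: always clear CACHE_F_CRC, i.e. use SHA-1
 *   threshold >  0: use crc32 once cache_nr reaches threshold
 */
static unsigned int choose_index_flags(unsigned int hdr_flags,
				       unsigned int cache_nr, int threshold)
{
	if (threshold >= 0) {
		if (threshold > 0 && cache_nr >= (unsigned int)threshold)
			hdr_flags |= CACHE_F_CRC;
		else
			hdr_flags &= ~CACHE_F_CRC;
	}
	return hdr_flags;
}
```

With the default of 250000 from environment.c, an index with 250000 entries is written with crc32 and one with 249999 entries with SHA-1. When update-index is given an explicit crc32 choice (do_crc != -1), it sets the threshold to -1, so write_index() leaves that choice alone.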
Thread overview: 20+ messages
2012-02-06 5:48 [PATCH 1/6] read-cache: use sha1file for sha1 calculation Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 2/6] csum-file: make sha1 calculation optional Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 3/6] Stop producing index version 2 Nguyễn Thái Ngọc Duy
2012-02-06 7:10 ` Junio C Hamano
2012-02-07 3:09 ` Shawn Pearce
2012-02-07 4:50 ` Nguyen Thai Ngoc Duy
2012-02-07 8:51 ` Nguyen Thai Ngoc Duy
2012-02-07 5:21 ` Junio C Hamano
2012-02-07 17:25 ` Thomas Rast
2012-02-06 5:48 ` [PATCH 4/6] Introduce index version 4 with global flags Nguyễn Thái Ngọc Duy
2012-02-06 5:48 ` [PATCH 5/6] Allow to use crc32 as a lighter checksum on index Nguyễn Thái Ngọc Duy
2012-02-07 3:17 ` Shawn Pearce
2012-02-07 4:04 ` Dave Zarzycki
2012-02-07 4:29 ` Dave Zarzycki
2012-02-06 5:48 ` Nguyễn Thái Ngọc Duy [this message]
2012-02-06 8:50 ` [PATCH 6/6] Automatically switch to crc32 checksum for index when it's too large Dave Zarzycki
2012-02-06 8:54 ` Nguyen Thai Ngoc Duy
2012-02-06 9:07 ` Dave Zarzycki
2012-02-06 7:34 ` [PATCH 1/6] read-cache: use sha1file for sha1 calculation Junio C Hamano
2012-02-06 8:36 ` Nguyen Thai Ngoc Duy