From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"René Scharfe" <l.s.r@web.de>,
"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>,
"Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList
Date: Mon, 3 Sep 2018 14:49:27 +0000 [thread overview]
Message-ID: <20180903144928.30691-10-avarab@gmail.com> (raw)
In-Reply-To: <2b31e12e-20e9-3d08-58bd-977f8b83e0a7@web.de>
From: René Scharfe <l.s.r@web.de>
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.
This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project, and
it will be re-parsed when checking .gitmodules blobs, see
fb16287719 ("fsck: check skiplist for object in fsck_blob()",
2018-06-27).
Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.
The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:
Test HEAD~2 HEAD~ HEAD
----------------------------------------------------------------------------------------------------------------
1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0%
1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7%
1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4%
1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3%
1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6%
1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0%
1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6%
1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6%
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Documentation/config.txt | 11 ++++++-----
fsck.c | 23 ++---------------------
fsck.h | 8 +++++---
3 files changed, 13 insertions(+), 29 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3287c7ef8a..161ffe259e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1731,11 +1731,12 @@ all three of them they must all set to the same values.
+
Older versions of Git (before 2.20) documented that the object names
list should be sorted. This was never a requirement, the object names
-can appear in any order, but when reading the list we track whether
-the list is sorted for the purposes of an internal binary search
-implementation, which can save itself some work with an already sorted
-list. Unless you have a humongous list there's no reason to go out of
-your way to pre-sort the list.
+could appear in any order, but when reading the list we tracked whether
+the list was sorted for the purposes of an internal binary search
+implementation, which could save itself some work with an already sorted
+list. Unless you had a humongous list there was no reason to go out of
+your way to pre-sort the list. After Git version 2.20 a hash implementation
+is used instead, so there's now no reason to pre-sort the list.
gc.aggressiveDepth::
The depth parameter used in the delta compression
diff --git a/fsck.c b/fsck.c
index 972a26b9ba..4c643f1d40 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,7 +10,6 @@
#include "fsck.h"
#include "refs.h"
#include "utf8.h"
-#include "sha1-array.h"
#include "decorate.h"
#include "oidset.h"
#include "packfile.h"
@@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
static void init_skiplist(struct fsck_options *options, const char *path)
{
- static struct oid_array skiplist = OID_ARRAY_INIT;
- int sorted;
FILE *fp;
struct strbuf sb = STRBUF_INIT;
struct object_id oid;
- if (options->skiplist)
- sorted = options->skiplist->sorted;
- else {
- sorted = 1;
- options->skiplist = &skiplist;
- }
-
fp = fopen(path, "r");
if (!fp)
die("Could not open skip list: %s", path);
@@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
const char *p;
if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
die("Invalid SHA-1: %s", sb.buf);
- oid_array_append(&skiplist, &oid);
- if (sorted && skiplist.nr > 1 &&
- oidcmp(&skiplist.oid[skiplist.nr - 2],
- &oid) > 0)
- sorted = 0;
+ oidset_insert(&options->skiplist, &oid);
}
if (ferror(fp))
die_errno("Could not read '%s'", path);
fclose(fp);
strbuf_release(&sb);
-
- if (sorted)
- skiplist.sorted = 1;
}
static int parse_msg_type(const char *str)
@@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
{
- if (opts && opts->skiplist && obj)
- return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
- return 0;
+ return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
}
__attribute__((format (printf, 4, 5)))
diff --git a/fsck.h b/fsck.h
index 0c7e8c9428..b95595ae5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -1,6 +1,8 @@
#ifndef GIT_FSCK_H
#define GIT_FSCK_H
+#include "oidset.h"
+
#define FSCK_ERROR 1
#define FSCK_WARN 2
#define FSCK_IGNORE 3
@@ -35,12 +37,12 @@ struct fsck_options {
fsck_error error_func;
unsigned strict:1;
int *msg_type;
- struct oid_array *skiplist;
+ struct oidset skiplist;
struct decoration *object_names;
};
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
/* descend in all linked child objects
* the return value is:
--
2.19.0.rc1.350.ge57e33dbd1
next prev parent reply other threads:[~2018-09-03 14:49 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15 ` Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28 9:52 ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59 ` René Scharfe
2018-09-03 14:49 ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49 ` Ævar Arnfjörð Bjarmason [this message]
2018-09-03 14:49 ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29 ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180903144928.30691-10-avarab@gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.