All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support
Date: Tue, 28 Aug 2018 09:52:11 +0000	[thread overview]
Message-ID: <20180828095219.23296-1-avarab@gmail.com> (raw)
In-Reply-To: <87lg8refcr.fsf@evledraar.gmail.com>

On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> Does that just mean that when cloning with --recursive with
> transfer.fsckObjects=true we'll re-read the file for each "clone"
> invocation, both for the main project and everything listed in
> .gitmodules?
>
> If so, I think something like this commit message would be clearer:

Here's a version that does that, I also integrated the perf test René
wrote and used it in the commit messages. The range-diff with v3
follows:

1:  09b52b0eae = 1:  ba9ef73304 fsck tests: setup of bogus commit object
2:  294a4dfd69 = 2:  1d20e37d3f fsck tests: add a test for no skipList input
3:  15a6e16dc1 = 3:  953ad29f6d fsck: document and test sorted skipList input
4:  b93fe502c3 = 4:  c8a0bd1555 fsck: document and test commented & empty line skipList input
-:  ---------- > 5:  cbbb0326d3 fsck: add a performance test for skipList
5:  3cf7d4e9eb ! 6:  ca6aca2994 fsck: use strbuf_getline() to read skiplist file
    @@ -2,14 +2,33 @@
     
         fsck: use strbuf_getline() to read skiplist file
     
    -    buffer is unlikely to contain a NUL character, so printing its contents
    -    using %s in a die() format is unsafe (detected with ASan).
    +    The buffer is unlikely to contain a NUL character, so printing its
    +    contents using %s in a die() format is unsafe (detected with ASan).
     
         Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
         is always NUL-terminated, supports CRLF files as well, accepts files
         without a newline after the last line, supports any hash length
         automatically, and is shorter.
     
    +    This fixes a bug where emitting an error about an invalid line on say
    +    line 1 would continue printing subsequent lines, and usually continue
    +    into uninitialized memory.
    +
    +    The performance impact of this, on a CentOS 7 box with RedHat GCC
    +    4.8.5-28:
    +
    +        $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1450-fsck-skip-list.sh
    +        Test                                             HEAD~             HEAD
    +        ----------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.75(7.39+0.35)   7.68(7.29+0.39) -0.9%
    +        1450.5: fsck with 1 skipped bad commits          7.70(7.30+0.40)   7.80(7.42+0.37) +1.3%
    +        1450.7: fsck with 10 skipped bad commits         7.77(7.37+0.40)   7.87(7.47+0.40) +1.3%
    +        1450.9: fsck with 100 skipped bad commits        7.82(7.41+0.40)   7.88(7.43+0.44) +0.8%
    +        1450.11: fsck with 1000 skipped bad commits      7.88(7.49+0.39)   7.84(7.43+0.40) -0.5%
    +        1450.13: fsck with 10000 skipped bad commits     8.02(7.63+0.39)   8.07(7.67+0.39) +0.6%
    +        1450.15: fsck with 100000 skipped bad commits    8.01(7.60+0.41)   8.08(7.70+0.38) +0.9%
    +        1450.17: fsck with 1000000 skipped bad commits   7.60(7.10+0.50)   7.37(7.18+0.19) -3.0%
    +
         Helped-by: Jeff King <peff@peff.net>
         Signed-off-by: Rene Scharfe <l.s.r@web.de>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
6:  bd51de7c14 ! 7:  58bfd04d96 fsck: use oidset for skiplist
    @@ -1,22 +1,33 @@
     Author: René Scharfe <l.s.r@web.de>
     
    -    fsck: use oidset for skiplist
    +    fsck: use oidset instead of oid_array for skipList
     
    -    Object IDs to skip are stored in a shared static oid_array.  Lookups do
    -    a binary search on the sorted array.  The code checks if the object IDs
    -    are already in the correct order while loading and skips sorting in that
    -    case.  Lookups are done before reporting a (non-fatal) corruption and
    -    before checking .gitmodules files.
    +    Change the implementation of the skipList feature to use oidset
    +    instead of oid_array to store SHA-1s for later lookup.
     
    -    Simplify the code by using an oidset instead.  Memory usage is a bit
    -    higher, but we don't need to worry about any sort order anymore.  Embed
    -    the oidset into struct fsck_options to make its ownership clear (no
    -    hidden sharing) and avoid unnecessary pointer indirection.
    +    This list is parsed once on startup by fsck, fetch-pack or
    +    receive-pack depending on the *.skipList config in use. I.e. only once
    +    per invocation, but note that for "clone --recurse-submodules" each
    +    submodule will re-parse the list, in addition to the main project.
     
    -    Performance on repositories with a low number of reported issues and
    -    .gitmodules files (i.e. the usual case) won't be affected much.  The
    -    oidset should be a bit quicker with higher numbers of bad objects in
    -    the skipList.
    +    Memory usage is a bit higher, but we don't need to keep track of the
    +    sort order anymore. Embed the oidset into struct fsck_options to make
    +    its ownership clear (no hidden sharing) and avoid unnecessary pointer
    +    indirection.
    +
    +    The cumulative impact on performance of this & the preceding change,
    +    using the test setup described in the previous commit:
    +
    +        Test                                             HEAD~2            HEAD~                   HEAD
    +        ----------------------------------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.70(7.31+0.38)   7.72(7.33+0.38) +0.3%   7.70(7.30+0.40) +0.0%
    +        1450.5: fsck with 1 skipped bad commits          7.84(7.47+0.37)   7.69(7.32+0.36) -1.9%   7.71(7.29+0.41) -1.7%
    +        1450.7: fsck with 10 skipped bad commits         7.81(7.40+0.40)   7.94(7.57+0.36) +1.7%   7.92(7.55+0.37) +1.4%
    +        1450.9: fsck with 100 skipped bad commits        7.81(7.42+0.38)   7.95(7.53+0.41) +1.8%   7.83(7.42+0.41) +0.3%
    +        1450.11: fsck with 1000 skipped bad commits      7.99(7.62+0.36)   7.90(7.50+0.40) -1.1%   7.86(7.49+0.37) -1.6%
    +        1450.13: fsck with 10000 skipped bad commits     7.98(7.57+0.40)   7.94(7.53+0.40) -0.5%   7.90(7.45+0.44) -1.0%
    +        1450.15: fsck with 100000 skipped bad commits    7.97(7.57+0.39)   8.03(7.67+0.36) +0.8%   7.84(7.43+0.41) -1.6%
    +        1450.17: fsck with 1000000 skipped bad commits   7.72(7.22+0.50)   7.28(7.07+0.20) -5.7%   7.13(6.87+0.25) -7.6%
     
         Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Signed-off-by: Rene Scharfe <l.s.r@web.de>
7:  17838cf68c ! 8:  0125461f13 fsck: support comments & empty lines in skipList
    @@ -11,6 +11,20 @@
         since this on-disk format can be expected to be used by multiple
         versions of git.
     
    +    There is no notable performance impact from this change, using the
    +    test setup described a couple of commist back:
    +
    +        Test                                             HEAD~             HEAD
    +        ----------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.81(7.42+0.39)   7.72(7.34+0.38) -1.2%
    +        1450.5: fsck with 1 skipped bad commits          7.75(7.36+0.38)   7.66(7.26+0.39) -1.2%
    +        1450.7: fsck with 10 skipped bad commits         7.81(7.43+0.38)   7.70(7.30+0.39) -1.4%
    +        1450.9: fsck with 100 skipped bad commits        7.85(7.42+0.42)   7.73(7.31+0.41) -1.5%
    +        1450.11: fsck with 1000 skipped bad commits      7.81(7.43+0.38)   7.84(7.46+0.38) +0.4%
    +        1450.13: fsck with 10000 skipped bad commits     7.87(7.47+0.40)   7.86(7.46+0.40) -0.1%
    +        1450.15: fsck with 100000 skipped bad commits    7.77(7.39+0.38)   7.83(7.48+0.34) +0.8%
    +        1450.17: fsck with 1000000 skipped bad commits   7.17(6.92+0.24)   7.11(6.85+0.26) -0.8%
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
     diff --git a/Documentation/config.txt b/Documentation/config.txt

René Scharfe (3):
  fsck: add a performance test for skipList
  fsck: use strbuf_getline() to read skiplist file
  fsck: use oidset instead of oid_array for skipList

Ævar Arnfjörð Bjarmason (5):
  fsck tests: setup of bogus commit object
  fsck tests: add a test for no skipList input
  fsck: document and test sorted skipList input
  fsck: document and test commented & empty line skipList input
  fsck: support comments & empty lines in skipList

 Documentation/config.txt        | 22 ++++++++++----
 fsck.c                          | 48 ++++++++++-------------------
 fsck.h                          |  8 +++--
 t/perf/p1450-fsck-skip-list.sh  | 40 +++++++++++++++++++++++++
 t/t5504-fetch-receive-strict.sh | 53 ++++++++++++++++++++++++++++++---
 5 files changed, 126 insertions(+), 45 deletions(-)
 create mode 100755 t/perf/p1450-fsck-skip-list.sh

-- 
2.19.0.rc0.228.g281dcd1b4d0


  reply	other threads:[~2018-08-28  9:52 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason [this message]
2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59       ` René Scharfe
2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29     ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180828095219.23296-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.