From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Geert Jansen" <gerardu@amazon.com>,
"Junio C Hamano" <gitster@pobox.com>,
"git@vger.kernel.org" <git@vger.kernel.org>,
"René Scharfe" <l.s.r@web.de>,
"Takuto Ikuta" <tikuta@chromium.org>
Subject: Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
Date: Mon, 12 Nov 2018 17:01:02 +0100 [thread overview]
Message-ID: <87ftw62sld.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20181112145442.GH7400@sigill.intra.peff.net>
On Mon, Nov 12 2018, Jeff King wrote:
> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)
I'm happy to test this on some of the NFS we have locally, and started
out with a plan to write some for-loop using the low-level API (so it
would look up all 256), fake populate .git/objects/?? with N number of
objects etc, but ran out of time.
Do you have something ready that you think would be representative and I
could just run? If not I'll try to pick this up again...
next prev parent reply other threads:[~2018-11-12 16:01 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-25 18:38 [RFC PATCH] index-pack: improve performance on NFS Jansen, Geert
2018-10-26 0:21 ` Junio C Hamano
2018-10-26 20:38 ` Ævar Arnfjörð Bjarmason
2018-10-27 7:26 ` Junio C Hamano
2018-10-27 9:33 ` Jeff King
2018-10-27 11:22 ` Ævar Arnfjörð Bjarmason
2018-10-28 22:50 ` [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking Ævar Arnfjörð Bjarmason
2018-10-30 2:49 ` Geert Jansen
2018-10-30 9:04 ` Junio C Hamano
2018-10-30 18:43 ` [PATCH v2 0/3] index-pack: test updates Ævar Arnfjörð Bjarmason
2018-11-13 20:19 ` [PATCH v3] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-11-14 7:09 ` Junio C Hamano
2018-11-14 12:40 ` Ævar Arnfjörð Bjarmason
2018-10-30 18:43 ` [PATCH v2 1/3] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-30 18:43 ` [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-30 18:43 ` [PATCH v2 3/3] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50 ` [PATCH 1/4] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-28 22:50 ` [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-28 22:50 ` [PATCH 3/4] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50 ` [PATCH 4/4] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-10-29 15:04 ` [RFC PATCH] index-pack: improve performance on NFS Jeff King
2018-10-29 15:09 ` Jeff King
2018-10-29 19:36 ` Ævar Arnfjörð Bjarmason
2018-10-29 23:27 ` Jeff King
2018-11-07 22:55 ` Geert Jansen
2018-11-08 12:02 ` Jeff King
2018-11-08 20:58 ` Geert Jansen
2018-11-08 21:18 ` Jeff King
2018-11-08 21:55 ` Geert Jansen
2018-11-08 22:20 ` Ævar Arnfjörð Bjarmason
2018-11-09 10:11 ` Ævar Arnfjörð Bjarmason
2018-11-12 14:31 ` Jeff King
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:46 ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
2018-11-12 15:26 ` Derrick Stolee
2018-11-12 14:47 ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
2018-11-12 18:23 ` Stefan Beller
2018-11-12 14:48 ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
2018-11-12 15:30 ` Derrick Stolee
2018-11-12 15:36 ` Jeff King
2018-11-12 19:41 ` Ramsay Jones
2018-11-12 14:48 ` [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending Jeff King
2018-11-12 15:32 ` Derrick Stolee
2018-11-12 14:49 ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
2018-11-12 15:38 ` Derrick Stolee
2018-11-12 15:46 ` Jeff King
2018-11-12 15:50 ` Derrick Stolee
2018-11-12 14:50 ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
2018-11-12 15:48 ` Derrick Stolee
2018-11-12 16:09 ` Jeff King
2018-11-12 19:04 ` Stefan Beller
2018-11-22 17:42 ` Jeff King
2018-11-12 18:48 ` Stefan Beller
2018-11-12 14:50 ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
2018-11-12 19:24 ` René Scharfe
2018-11-12 20:16 ` Jeff King
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
2018-11-12 16:00 ` Derrick Stolee
2018-11-12 16:01 ` Ævar Arnfjörð Bjarmason [this message]
2018-11-12 16:21 ` Jeff King
2018-11-12 22:18 ` Ævar Arnfjörð Bjarmason
2018-11-12 22:30 ` Ævar Arnfjörð Bjarmason
2018-11-13 10:02 ` Ævar Arnfjörð Bjarmason
2018-11-14 18:21 ` René Scharfe
2018-12-02 10:52 ` René Scharfe
2018-12-03 22:04 ` Jeff King
2018-12-04 21:45 ` René Scharfe
2018-12-05 4:46 ` Jeff King
2018-12-05 6:02 ` René Scharfe
2018-12-05 6:51 ` Jeff King
2018-12-05 8:15 ` Jeff King
2018-12-05 18:41 ` René Scharfe
2018-12-05 20:17 ` Jeff King
2018-11-12 22:44 ` Geert Jansen
2018-11-27 20:48 ` René Scharfe
2018-12-01 19:49 ` Jeff King
2018-11-12 14:55 ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 19:25 ` René Scharfe
2018-11-12 19:32 ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07 ` Jeff King
2018-11-12 20:13 ` René Scharfe
2018-11-12 16:02 ` [PATCH 0/9] caching loose objects Derrick Stolee
2018-11-12 19:10 ` Stefan Beller
2018-11-09 13:43 ` [RFC PATCH] index-pack: improve performance on NFS Ævar Arnfjörð Bjarmason
2018-11-09 16:08 ` Duy Nguyen
2018-11-10 14:04 ` Ævar Arnfjörð Bjarmason
2018-11-12 14:34 ` Jeff King
2018-11-12 22:58 ` Geert Jansen
2018-10-27 14:04 ` Duy Nguyen
2018-10-29 15:18 ` Jeff King
2018-10-29 0:48 ` Junio C Hamano
2018-10-29 15:20 ` Jeff King
2018-10-29 18:43 ` Ævar Arnfjörð Bjarmason
2018-10-29 21:34 ` Geert Jansen
2018-10-29 21:50 ` Jeff King
2018-10-29 22:21 ` Geert Jansen
2018-10-29 22:27 ` Jeff King
2018-10-29 22:35 ` Stefan Beller
2018-10-29 23:29 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ftw62sld.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=gerardu@amazon.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
--cc=tikuta@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.