From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Michael Giuffrida" <michaelpg@chromium.org>,
git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [BUG] add_again() off-by-one error in custom format
Date: Thu, 15 Jun 2017 13:33:34 +0200
Message-ID: <ec36f9fa-5f3e-b511-3985-3d0301b4847f@web.de>
In-Reply-To: <20170615055654.efvsouhr3leszz3i@sigill.intra.peff.net>

On 15.06.2017 at 07:56, Jeff King wrote:
> One interesting thing is that the cost of finding short hashes very much
> depends on your loose object setup. I timed:
>
> git log --format=%H >/dev/null
>
> versus
>
> git log --format=%h >/dev/null
>
> on git.git. It went from about 400ms to about 800ms. But then I noticed
> I had a lot of loose object directories, and ran "git gc --prune=now".
> Afterwards, my timings were more like 380ms and 460ms.
>
> The difference is that in the "before" case, we actually opened each
> directory and ran getdents(). But after gc, the directories are gone
> totally and open() fails. We also have to do a linear walk through the
> objects in each directory, since the contents are sorted.

Do you mean "unsorted"?

> So I wonder if it is worth trying to optimize the short-sha1 computation
> in the first place. Double-%h aside, that would make _everything_
> faster, including --oneline.

Right.

> I'm not really sure how, though, short of caching the directory
> contents. That opens up questions of whether and when to invalidate the
> cache. If the cache were _just_ about short hashes, it might be OK to
> just assume that it remains valid through the length of the program (so
> worst case, a simultaneous write might mean that we generate a sha1
> which just became ambiguous, but that's generally going to be racy
> anyway).
>
> The other downside of course is that we'd spend RAM on it. We could
> bound the size of the cache, I suppose.

You mean something like an in-memory pack index for loose objects, in
order to avoid the readdir call in
sha1_name.c::find_short_object_filename()?

We'd only need to keep the hashes of found objects. An oid_array
would be quite compact.
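
To make the idea concrete, here is a minimal sketch of such a cache, for illustration only. It is not git's oid_array; the names loose_cache, loose_cache_append() and loose_cache_count_prefix() are hypothetical, and it stores hex names rather than binary hashes to keep the example short. It assumes the names were collected once (e.g. from a single readdir pass per directory) and answers "how many known objects start with this prefix?" with a binary search.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40 /* length of a SHA-1 in hex */

/* Hypothetical cache of loose object names; not git's oid_array. */
struct loose_cache {
	char (*hex)[HEXSZ + 1];
	size_t nr, alloc;
	int sorted;
};

static int cmp_hex(const void *a, const void *b)
{
	return strcmp((const char *)a, (const char *)b);
}

/* Remember one object name; returns 0 on success, -1 on error. */
static int loose_cache_append(struct loose_cache *c, const char *hex)
{
	if (strlen(hex) != HEXSZ)
		return -1;
	if (c->nr == c->alloc) {
		size_t alloc = c->alloc ? 2 * c->alloc : 64;
		void *p = realloc(c->hex, alloc * sizeof(*c->hex));
		if (!p)
			return -1;
		c->hex = p;
		c->alloc = alloc;
	}
	memcpy(c->hex[c->nr++], hex, HEXSZ + 1);
	c->sorted = 0;
	return 0;
}

/* Count cached names that start with the given hex prefix. */
static size_t loose_cache_count_prefix(struct loose_cache *c, const char *prefix)
{
	size_t len = strlen(prefix), lo = 0, hi = c->nr, n = 0;

	assert(len <= HEXSZ);
	if (!c->sorted) {
		/* Sort lazily, on the first lookup after an append. */
		if (c->nr)
			qsort(c->hex, c->nr, sizeof(*c->hex), cmp_hex);
		c->sorted = 1;
	}
	/* Binary search for the first name that is not below the prefix. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strncmp(c->hex[mid], prefix, len) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Matching names are adjacent in the sorted array. */
	while (lo + n < c->nr && !strncmp(c->hex[lo + n], prefix, len))
		n++;
	return n;
}
```

A count of 1 would mean the prefix is unique among the cached loose objects. Storing 20-byte binary hashes instead of hex strings would roughly halve the memory use.
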

Non-racy writes within the same process (write, then read later)
should not be ignored, though; checkout, for example, does something
like that.

Can we trust object directory time stamps for cache invalidation?
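
A time stamp check could look roughly like the sketch below; it only illustrates the question and is not a claim that the approach is safe. The names dir_stamp and dir_cache_is_stale() are hypothetical. It assumes that adding or removing an entry updates the directory's mtime, and it compares whole seconds only, so a write that lands in the same second as the previous stat would go unnoticed.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-directory stamp kept next to the cached names. */
struct dir_stamp {
	int valid;
	time_t mtime;
};

/*
 * Returns 1 if the cache for path has to be refilled, 0 if the
 * directory's mtime still matches the remembered one.
 */
static int dir_cache_is_stale(struct dir_stamp *s, const char *path)
{
	struct stat st;

	assert(s && path);
	if (stat(path, &st) < 0) {
		s->valid = 0;	/* directory is gone or unreadable */
		return 1;
	}
	if (s->valid && s->mtime == st.st_mtime)
		return 0;
	/* First use or changed mtime: remember the new stamp. */
	s->valid = 1;
	s->mtime = st.st_mtime;
	return 1;
}
```

This trades the opendir/readdir pass for one stat call per lookup, which may or may not be cheap enough.
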

René