git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Michael Giuffrida" <michaelpg@chromium.org>,
	git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [BUG] add_again() off-by-one error in custom format
Date: Thu, 15 Jun 2017 13:33:34 +0200	[thread overview]
Message-ID: <ec36f9fa-5f3e-b511-3985-3d0301b4847f@web.de> (raw)
In-Reply-To: <20170615055654.efvsouhr3leszz3i@sigill.intra.peff.net>

Am 15.06.2017 um 07:56 schrieb Jeff King:
> One interesting thing is that the cost of finding short hashes very much
> depends on your loose object setup. I timed:
> 
>    git log --format=%H >/dev/null
> 
> versus
> 
>    git log --format=%h >/dev/null
> 
> on git.git. It went from about 400ms to about 800ms. But then I noticed
> I had a lot of loose object directories, and ran "git gc --prune=now".
> Afterwards, my timings were more like 380ms and 460ms.
> 
> The difference is that in the "before" case, we actually opened each
> directory and ran getdents(). But after gc, the directories are gone
> totally and open() fails. We also have to do a linear walk through the
> objects in each directory, since the contents are sorted.

Do you mean "unsorted"?

> So I wonder if it is worth trying to optimize the short-sha1 computation
> in the first place. Double-%h aside, that would make _everything_
> faster, including --oneline.

Right.

> I'm not really sure how, though, short of caching the directory
> contents. That opens up questions of whether and when to invalidate the
> cache. If the cache were _just_ about short hashes, it might be OK to
> just assume that it remains valid through the length of the program (so
> worst case, a simultaneous write might mean that we generate a sha1
> which just became ambiguous, but that's generally going to be racy
> anyway).
> 
> The other downside of course is that we'd spend RAM on it. We could
> bound the size of the cache, I suppose.

You mean like an in-memory pack index for loose objects?  In order to
avoid the readdir call in sha1_name.c::find_short_object_filename()?
We'd only need to keep the hashes of found objects.  An oid_array
would be quite compact.

Non-racy writes inside a process should not be ignored (write, read
later) -- e.g. checkout does something like that.

Can we trust object directory time stamps for cache invalidation?

René

  reply	other threads:[~2017-06-15 11:33 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-12  3:13 [BUG] add_again() off-by-one error in custom format Michael Giuffrida
2017-06-12 22:49 ` Junio C Hamano
2017-06-13 18:09   ` René Scharfe
2017-06-13 18:29     ` Junio C Hamano
2017-06-13 20:29       ` René Scharfe
2017-06-13 21:20         ` Junio C Hamano
2017-06-14 18:24           ` René Scharfe
2017-06-15  5:56             ` Jeff King
2017-06-15 11:33               ` René Scharfe [this message]
2017-06-15 13:25                 ` Jeff King
2017-06-18 10:58                   ` René Scharfe
2017-06-18 11:49                     ` Jeff King
2017-06-18 12:59                       ` René Scharfe
2017-06-18 13:56                         ` Jeff King
2017-06-22 18:19                           ` René Scharfe
2017-06-22 23:15                             ` Jeff King
2017-06-18 10:58                   ` René Scharfe
2017-06-18 11:50                     ` Jeff King
2017-06-19  4:46                       ` Junio C Hamano
2017-06-22 18:19                         ` [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename() René Scharfe
2017-06-22 23:10                           ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:14                               ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:20                               ` Jeff King
2017-06-24 14:09                                 ` René Scharfe
2017-06-24 14:12                                   ` Jeff King
2017-06-15 18:37             ` [BUG] add_again() off-by-one error in custom format Junio C Hamano
2017-06-13 22:24         ` SZEDER Gábor
2017-06-14 17:34           ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ec36f9fa-5f3e-b511-3985-3d0301b4847f@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michaelpg@chromium.org \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).