git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	David Turner <dturner@twopensource.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Git Mailing List <git@vger.kernel.org>,
	David Turner <dturner@twitter.com>
Subject: Re: [PATCH] check_refname_component: Optimize
Date: Thu, 29 May 2014 20:07:28 -0400	[thread overview]
Message-ID: <20140530000728.GC28683@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8BgriBBWJ6ZzQS8S7p4SUB=bdZHdnUQsyN03g+vtApbxA@mail.gmail.com>

On Fri, May 30, 2014 at 06:43:18AM +0700, Duy Nguyen wrote:

> >> The first time we read packed_refs, check_refname_format() is called
> >> in read_packed_refs()->create_ref_entry() as usual. If we find no
> >> problem, we store packed_refs stat() info in maybe packed_refs.stat.
> >> Next time we read packed_refs, if packed_refs.stat is there and
> >> indicates that packed_refs has not changed, we can make
> >> create_ref_entry() ignore check_refname_format() completely.
> >
> > I'm confused. Why would we re-open packed-refs at all if the stat
> > information hasn't changed?
> 
> No, not in the same process. In the next run.

Ah, I thought "packed_refs.stat" was a struct member, but I guess you
mean it as a filename.

But then we're just trusting that the "trust me" flag on disk is
correct. Why not just trust that packed-refs is correct in the first
place?

IOW, consider this progression of changes:

  1. Check refname format when we read packed-refs (the current
     behavior).

  2. Keep a separate file "packed-refs.stat" with stat information. If
     the packed-refs file matches that stat information, do not bother
     checking refname formats.

  3. Put a flag in "packed-refs" that says "trust me, I'm valid". Check
     the refnames when it is generated.

  4. Realize that we already check the refnames when we write it out.
     Don't bother writing "trust me, I'm valid"; readers can assume that
     it is.

What is the scenario that option (2) protects against that options (3)
and (4) do not?

I could guess something like "the writer has a different idea of what a
valid refname is than we do". But that applies as well to (2), but just
as "the reader who wrote packed-refs.stat has a different idea than we
do".

As a side note, while it is nice that we might make check_refname_format
faster, I think if you _really_ want to make repos with a lot of refs
faster, it would make more sense to introduce an on-disk format that
does not need linear parsing (e.g., something we could mmap and binary
search, or even something dbm-ish that could be updated without
rewriting the whole file (deletions, for example, must rewrite the
whole file, giving quadratic performance when deleting all refs one by
one).

-Peff

  reply	other threads:[~2014-05-30  0:07 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-28 21:04 [PATCH v2 0/1] check_refname_component: Optimize David Turner
2014-05-28 21:04 ` [PATCH] " David Turner
2014-05-28 21:44   ` Michael Haggerty
2014-05-28 23:49     ` David Turner
2014-05-29 13:41       ` Duy Nguyen
2014-05-29 16:36         ` Junio C Hamano
2014-05-29 23:24           ` Duy Nguyen
2014-05-29 23:41             ` Jeff King
2014-05-29 23:43               ` Duy Nguyen
2014-05-30  0:07                 ` Jeff King [this message]
2014-05-30  2:03                   ` Duy Nguyen
2014-05-30  9:47                   ` Michael Haggerty
2014-05-30 17:29                     ` Jeff King
2014-05-31 10:47                       ` Michael Haggerty
2014-05-31 11:21                   ` Duy Nguyen
  -- strict thread matches above, loose matches on Subject: below --
2014-05-28 19:57 David Turner
2014-05-28 21:24 ` Junio C Hamano
2014-05-29 12:19 ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140530000728.GC28683@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=dturner@twitter.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).