Re: [PATCH] check_refname_component: Optimize

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	David Turner <dturner@twopensource.com>,
	Git Mailing List <git@vger.kernel.org>,
	David Turner <dturner@twitter.com>
Subject: Re: [PATCH] check_refname_component: Optimize
Date: Fri, 30 May 2014 11:47:33 +0200	[thread overview]
Message-ID: <538853B5.1080308@alum.mit.edu> (raw)
In-Reply-To: <20140530000728.GC28683@sigill.intra.peff.net>

On 05/30/2014 02:07 AM, Jeff King wrote:
> On Fri, May 30, 2014 at 06:43:18AM +0700, Duy Nguyen wrote:
> 
>>>> The first time we read packed_refs, check_refname_format() is called
>>>> in read_packed_refs()->create_ref_entry() as usual. If we find no
>>>> problem, we store packed_refs stat() info in maybe packed_refs.stat.
>>>> Next time we read packed_refs, if packed_refs.stat is there and
>>>> indicates that packed_refs has not changed, we can make
>>>> create_ref_entry() ignore check_refname_format() completely.
>>>
>>> I'm confused. Why would we re-open packed-refs at all if the stat
>>> information hasn't changed?
>>
>> No, not in the same process. In the next run.
> 
> Ah, I thought "packed_refs.stat" was a struct member, but I guess you
> mean it as a filename.
> 
> But then we're just trusting that the "trust me" flag on disk is
> correct. Why not just trust that packed-refs is correct in the first
> place?
> 
> IOW, consider this progression of changes:
> 
>   1. Check refname format when we read packed-refs (the current
>      behavior).
> 
>   2. Keep a separate file "packed-refs.stat" with stat information. If
>      the packed-refs file matches that stat information, do not bother
>      checking refname formats.
> 
>   3. Put a flag in "packed-refs" that says "trust me, I'm valid". Check
>      the refnames when it is generated.
> 
>   4. Realize that we already check the refnames when we write it out.
>      Don't bother writing "trust me, I'm valid"; readers can assume that
>      it is.
> 
> What is the scenario that option (2) protects against that options (3)
> and (4) do not?
> 
> I could guess something like "the writer has a different idea of what a
> valid refname is than we do". But that applies as well to (2), but just
> as "the reader who wrote packed-refs.stat has a different idea than we
> do".

If we want to be robust to future changes to refname rules, we could add
a header flag like

    # pack-refs with: peeled fully-peeled check-level=1.0

which promises that the reference names in the file conform to the
current ("version 1.0") check_refname_format() rules.

If we ever make the rules stricter (a "backwards-compatible" change), we
would increment the check-level to 1.1.  That way, an old reader, who
knows about check-level=1.0 but not check-level=1.1, can still trust
that the refnames in the file conform to its check_refname_format()
rules and avoid the verification step.  (Of course if that version
writes the file again, it would have to set the check-level=1.0 tag, and
newer software would be forced to validate on reading to be sure that
the refnames still conform to check-level=1.1.)

If we make the rules looser (a "backwards-incompatible" change), we
would increment the check-level to 2.0.  In that case readers who only
know about check-level 1.x would have to turn their verification code
back on when reading the file to ensure that they can work with the
refnames that it contains.

Format changes should be infrequent enough, and the cost of verification
is low enough, that sometimes ping-ponging back and forth between
software versions shouldn't be a problem in practice.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

next prev parent reply	other threads:[~2014-05-30  9:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-28 21:04 [PATCH v2 0/1] check_refname_component: Optimize David Turner
2014-05-28 21:04 ` [PATCH] " David Turner
2014-05-28 21:44   ` Michael Haggerty
2014-05-28 23:49     ` David Turner
2014-05-29 13:41       ` Duy Nguyen
2014-05-29 16:36         ` Junio C Hamano
2014-05-29 23:24           ` Duy Nguyen
2014-05-29 23:41             ` Jeff King
2014-05-29 23:43               ` Duy Nguyen
2014-05-30  0:07                 ` Jeff King
2014-05-30  2:03                   ` Duy Nguyen
2014-05-30  9:47                   ` Michael Haggerty [this message]
2014-05-30 17:29                     ` Jeff King
2014-05-31 10:47                       ` Michael Haggerty
2014-05-31 11:21                   ` Duy Nguyen
  -- strict thread matches above, loose matches on Subject: below --
2014-05-28 19:57 David Turner
2014-05-28 21:24 ` Junio C Hamano
2014-05-29 12:19 ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=538853B5.1080308@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=dturner@twitter.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).