From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: [PATCH] check_refname_component: Optimize Date: Fri, 30 May 2014 11:47:33 +0200 Message-ID: <538853B5.1080308@alum.mit.edu> References: <1401311055-480-1-git-send-email-dturner@twitter.com> <1401311055-480-2-git-send-email-dturner@twitter.com> <538658C0.8050001@alum.mit.edu> <1401320968.18134.98.camel@stross> <20140529234109.GA28683@sigill.intra.peff.net> <20140530000728.GC28683@sigill.intra.peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Junio C Hamano , David Turner , Git Mailing List , David Turner To: Jeff King , Duy Nguyen X-From: git-owner@vger.kernel.org Fri May 30 11:47:43 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WqJPb-0007o7-08 for gcvg-git-2@plane.gmane.org; Fri, 30 May 2014 11:47:43 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932338AbaE3Jrj (ORCPT ); Fri, 30 May 2014 05:47:39 -0400 Received: from alum-mailsec-scanner-1.mit.edu ([18.7.68.12]:57757 "EHLO alum-mailsec-scanner-1.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754750AbaE3Jrh (ORCPT ); Fri, 30 May 2014 05:47:37 -0400 X-AuditID: 1207440c-f79656d000000c83-a2-538853b9cb74 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id 27.38.03203.9B358835; Fri, 30 May 2014 05:47:37 -0400 (EDT) Received: from [192.168.69.130] (p5DDB3ADC.dip0.t-ipconnect.de [93.219.58.220]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id s4U9lYDw027805 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Fri, 30 May 2014 05:47:35 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.4.0 In-Reply-To: <20140530000728.GC28683@sigill.intra.peff.net> X-Enigmail-Version: 1.6 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrCKsWRmVeSWpSXmKPExsUixO6iqLszuCPYYPofFosPV54zWszfdILR outKN5NFQ+8VZovuKW8ZLX609DA7sHnsnHWX3eNZ7x5Gj4uXlD0udj9h9ljw/D67x+dNcgFs Udw2SYklZcGZ6Xn6dgncGcdPrmEuaJGqOPD0BFMD4z2RLkZODgkBE4mFM9czQdhiEhfurWfr YuTiEBK4zCjR/Gs9K0hCSOA8k8TPjX4gNq+AtsSOGavAGlgEVCU+tO9kA7HZBHQlFvU0A8U5 OEQFgiT+nFWEKBeUODnzCQuILSJgK/F093+w+cwC2xgl7l9bzw6SEBYwk7h8aBMLxOJpLBIf JrUwgiQ4BawlvnQ/YwYZKiEgLtHTGARiMguoS6yfJwRSwSwgL7H97RzmCYyCs5Csm4VQNQtJ 1QJG5lWMcok5pbm6uYmZOcWpybrFyYl5ealFuoZ6uZkleqkppZsYIbHAs4Px2zqZQ4wCHIxK PLwZLu3BQqyJZcWVuYcYJTmYlER5nTw6goX4kvJTKjMSizPii0pzUosPMUpwMCuJ8O71Acrx piRWVqUW5cOkpDlYlMR5VZeo+wkJpCeWpGanphakFsFkZTg4lCR4PwUBNQoWpaanVqRl5pQg pJk4OEGGc0mJFKfmpaQWJZaWZMSDoje+GBi/ICkeoL1rAkH2Fhck5gJFIVpPMepynLpzrI1J iCUvPy9VSpx3H0iRAEhRRmke3ApY4nvFKA70sTDvC5BLeIBJE27SK6AlTEBLnnS2giwpSURI STUwlqbecXr7wWDF0v3WD7h++7dGhlSoKp3RPzDpbO1S/7PR5/OnTGE/wvnUq0q/ Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 05/30/2014 02:07 AM, Jeff King wrote: > On Fri, May 30, 2014 at 06:43:18AM +0700, Duy Nguyen wrote: > >>>> The first time we read packed_refs, check_refname_format() is called >>>> in read_packed_refs()->create_ref_entry() as usual. If we find no >>>> problem, we store packed_refs stat() info in maybe packed_refs.stat. >>>> Next time we read packed_refs, if packed_refs.stat is there and >>>> indicates that packed_refs has not changed, we can make >>>> create_ref_entry() ignore check_refname_format() completely. >>> >>> I'm confused. Why would we re-open packed-refs at all if the stat >>> information hasn't changed? >> >> No, not in the same process. In the next run. > > Ah, I thought "packed_refs.stat" was a struct member, but I guess you > mean it as a filename. > > But then we're just trusting that the "trust me" flag on disk is > correct. Why not just trust that packed-refs is correct in the first > place? > > IOW, consider this progression of changes: > > 1. Check refname format when we read packed-refs (the current > behavior). > > 2. Keep a separate file "packed-refs.stat" with stat information. If > the packed-refs file matches that stat information, do not bother > checking refname formats. > > 3. Put a flag in "packed-refs" that says "trust me, I'm valid". Check > the refnames when it is generated. > > 4. Realize that we already check the refnames when we write it out. > Don't bother writing "trust me, I'm valid"; readers can assume that > it is. > > What is the scenario that option (2) protects against that options (3) > and (4) do not? > > I could guess something like "the writer has a different idea of what a > valid refname is than we do". But that applies as well to (2), but just > as "the reader who wrote packed-refs.stat has a different idea than we > do". If we want to be robust to future changes to refname rules, we could add a header flag like # pack-refs with: peeled fully-peeled check-level=1.0 which promises that the reference names in the file conform to the current ("version 1.0") check_refname_format() rules. If we ever make the rules stricter (a "backwards-compatible" change), we would increment the check-level to 1.1. That way, an old reader, who knows about check-level=1.0 but not check-level=1.1, can still trust that the refnames in the file conform to its check_refname_format() rules and avoid the verification step. (Of course if that version writes the file again, it would have to set the check-level=1.0 tag, and newer software would be forced to validate on reading to be sure that the refnames still conform to check-level=1.1.) If we make the rules looser (a "backwards-incompatible" change), we would increment the check-level to 2.0. In that case readers who only know about check-level 1.x would have to turn their verification code back on when reading the file to ensure that they can work with the refnames that it contains. Format changes should be infrequent enough, and the cost of verification is low enough, that sometimes ping-ponging back and forth between software versions shouldn't be a problem in practice. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/