From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junio C Hamano Subject: Re: [REVISED PATCH 2/6] Introduce commit notes Date: Thu, 19 Jul 2007 02:34:00 -0700 Message-ID: <7vodi83fg7.fsf@assigned-by-dhcp.cox.net> References: <7vejj96igx.fsf@assigned-by-dhcp.cox.net> <7vfy3l3rj0.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Johannes Schindelin , Alberto Bertogli , git@vger.kernel.org, Johan Herland To: Linus Torvalds X-From: git-owner@vger.kernel.org Thu Jul 19 11:34:10 2007 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1IBSOf-0003ij-EQ for gcvg-git@gmane.org; Thu, 19 Jul 2007 11:34:09 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933587AbXGSJeF (ORCPT ); Thu, 19 Jul 2007 05:34:05 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1760526AbXGSJeE (ORCPT ); Thu, 19 Jul 2007 05:34:04 -0400 Received: from fed1rmmtao105.cox.net ([68.230.241.41]:36583 "EHLO fed1rmmtao105.cox.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757535AbXGSJeB (ORCPT ); Thu, 19 Jul 2007 05:34:01 -0400 Received: from fed1rmimpo01.cox.net ([70.169.32.71]) by fed1rmmtao105.cox.net (InterMail vM.7.08.02.01 201-2186-121-102-20070209) with ESMTP id <20070719093401.NULO1399.fed1rmmtao105.cox.net@fed1rmimpo01.cox.net>; Thu, 19 Jul 2007 05:34:01 -0400 Received: from assigned-by-dhcp.cox.net ([68.5.247.80]) by fed1rmimpo01.cox.net with bizsmtp id RMa01X0041kojtg0000000; Thu, 19 Jul 2007 05:34:01 -0400 In-Reply-To: <7vfy3l3rj0.fsf@assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Wed, 18 Jul 2007 22:13:07 -0700") User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: Junio C Hamano writes: > Linus Torvalds writes: > ... >> So if you're really really *really* unlucky, you might end up having to >> fall back on the linear search. But it still works! >> >> Can anybody see anything wrong in my thinking above? > ... > But the real problem of this approach of course is that this is > not reliable and can get a false match. You can find your > beginning NUL in the SHA-1 part of one entry, and terminating > NUL later in the SHA-1 part of next entry, and you will never > notice. In other words, if you are really really *really* unlucky, not only you might end up being fooled by random byte sequences in SHA-1 part of the tree object, you would not even notice that you have to fall back on the linear search. I've long time ago concluded that if we care about reliability (and we do very much), a bisectable tree without breaking backward compatibility is impossible. I was hoping to find a "hole" in tree object format so that I can place an extended section that is invisible to older versions of git, and place a table that records offsets of each tree entries to help bisection and/or perhaps a hash table to help look-up, but I do not think it is possible. In the case of index file, the original file format had a hole after the cache-entry array where we can later squeeze an extension section that is invisible to older versions of git. But the tree object format is designed so tight that I do not see there is any place to put an extension section. Side note: I also think adding "extension section" to tree object is not a good idea to begin with. The data nor length of such a section cannot participate in hash computation to derive the tree's object name so that we can still compare two tree objects (with and without such extension) that have the same contents by only looking at their object names. But having contents that are not counted as parts of the object's name goes against the reliability and safety of git. > ... > I was suggesting to have a specialized parser only to read such > tree objects that are "abused" to represent notes. You can > cheaply validate that these trees are of expected shape. > ... > For an added safety, a "notes" writer could even throw in > signature bytes (say, a symlink whose name is " !" in the > top-level tree, and another symlink " !{37}" in the second-level > tree) to protect the reader. Of course, even with the above trick with relatively cheap validation based on size, entry format, and "signature entries", the way I outlined to speed up "notes" access really relies on the tree objects used in "notes" to be well formed. If somebody throws in a tree that is not really a "note" to refs/notes/, and if I am really really *really* unlucky, not only I might end up being fooled by random byte sequences in SHA-1 part of the tree object, I would not even notice that I am reading garbage and end up giving garbage as "note" to the object back to the user.