From mboxrd@z Thu Jan 1 00:00:00 1970 From: linux@horizon.com Subject: Re: [PATCH 1/3] Lazily open pack index files on demand Date: 28 May 2007 20:09:49 -0400 Message-ID: <20070529000949.28007.qmail@science.horizon.com> To: git@vger.kernel.org, torvalds@linux-foundation.org X-From: git-owner@vger.kernel.org Tue May 29 02:10:18 2007 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1HspI1-0006B9-QE for gcvg-git@gmane.org; Tue, 29 May 2007 02:10:18 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751872AbXE2AJv (ORCPT ); Mon, 28 May 2007 20:09:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752620AbXE2AJv (ORCPT ); Mon, 28 May 2007 20:09:51 -0400 Received: from science.horizon.com ([192.35.100.1]:10175 "HELO science.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751872AbXE2AJv (ORCPT ); Mon, 28 May 2007 20:09:51 -0400 Received: (qmail 28008 invoked by uid 1000); 28 May 2007 20:09:49 -0400 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: > First off, when comparing O(logn) to O(1), with "n" being less than a > billion, they are pretty much exactly the same. Think of it this way: > O(logn) == O(9) == O(1), if you know that n < 10**9. Well, binary searches mean binary logarithms, so O(log n) = O(30). Still, pretty low. > Secondly, the cost of Newton isn't "almost O(1)". I don't know _what_ it > is (the rule of thumb with Newton-Raphson should be that the number of > significant correct digits in the answer doubles with each iteration: I > think that probably means that it should approximate O(loglog(n)), but I > haven't thought deeply about it. Excellent intuition! The algorithm is most commonly known in the computer science literature as "interpolation search" and it does indeed take O(log log n) time for uniformly distributed data, which is a good assumption for SHA-1 hashes. Of course, for n = 10**9, log(n) is 30 and log log n is 5. More to the point, for n = 10**6, log(n) is 20 and log(log(n)) is still 5. Even losing a constant factor of 2, it still seems like it might offer a factor-of-2 speedup for large repositories.