From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: Git is not scalable with too many refs/* Date: Fri, 10 Jun 2011 16:35:45 -0400 Message-ID: <20110610203545.GA2564@sigill.intra.peff.net> References: <4DF0EC32.40001@gmail.com> <4DF1CAC1.7060705@op5.se> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Andreas Ericsson , A Large Angry SCM , Sverre Rabbelier , NAKAMURA Takumi , git To: Shawn Pearce X-From: git-owner@vger.kernel.org Fri Jun 10 22:36:18 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QV8RG-0005Ps-0p for gcvg-git-2@lo.gmane.org; Fri, 10 Jun 2011 22:36:18 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757835Ab1FJUfu (ORCPT ); Fri, 10 Jun 2011 16:35:50 -0400 Received: from 99-108-226-0.lightspeed.iplsin.sbcglobal.net ([99.108.226.0]:46887 "EHLO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754736Ab1FJUfu (ORCPT ); Fri, 10 Jun 2011 16:35:50 -0400 Received: (qmail 28765 invoked by uid 107); 10 Jun 2011 20:35:57 -0000 Received: from m180436d0.tmodns.net (HELO sigill.intra.peff.net) (208.54.4.24) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Fri, 10 Jun 2011 16:35:57 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Fri, 10 Jun 2011 16:35:45 -0400 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, Jun 10, 2011 at 12:41:39PM -0700, Shawn O. Pearce wrote: > Not really. > > The queue isn't sorting by SHA-1. Its sorting by commit timestamp, > descending. Those aren't pre-hashed. The O(N^2) insertion is because > the code is trying to find where this commit belongs in the list of > commits as sorted by commit timestamp. > > There are some priority queue datastructures designed for this sort of > work, e.g. a calendar queue might help. But its not as simple as a 1 > byte trie. All you really need is a heap-based priority queue, which gives O(lg n) insertion and popping (and O(1) peeking at the top). I even wrote one and posted it recently (I won't dig up the reference, but I posted it elsewhere in this thread, I think). The problem is that many parts of the code assume that commit_list is a linked list and do fast iterations, or even splicing. It's nothing you couldn't get around with some work, but it turns out to involve a lot of code changes. -Peff