git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 3/4] Use binary searching on large buckets in git-describe.
Date: Sun, 14 Jan 2007 18:07:21 -0500	[thread overview]
Message-ID: <20070114230721.GD10888@spearce.org> (raw)
In-Reply-To: <7v3b6dh19y.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > If a project has a really huge number of tags (such as several
> > thousand tags) then we are likely to have nearly a hundred tags in
> > some buckets.  Scanning those buckets as linked lists could take
> > a large amount of time if done repeatedly during history traversal.
> 
> I think this is a good idea -- but would you still need the 256
> buckets if you do this?

Well, yes and no.  The binary search would make it O(log N) rather
than O(N), which is clearly an improvement.  But we're hashing by
the commit SHA1 and since SHA1 does such a good job of distributing
values around we get pretty even distribution among the buckets.
This gets us a 1/256 reduction within each bucket, at the cost of
just one array index operation.

Since the buckets are so small (1 pointer) and the match function
is in the critical path of describe it would seem reasonable to
take the small memory hit in return for better behavior on very
large sets of tags.  It certainly doesn't hurt on smaller tag sets
(like git.git), we wind up with 1 tag per bucket and our lookups
are constant time.

-- 
Shawn.

  reply	other threads:[~2007-01-15 17:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <8e29bab8b4b9f53cbdc85e6e783bf3b5d3787f0f.1168727248.git.spearce@spearce.org>
2007-01-13 22:28 ` [PATCH 2/4] Hash tags by commit SHA1 in git-describe Shawn O. Pearce
2007-01-14 13:01   ` Johannes Schindelin
2007-01-14 18:51     ` Junio C Hamano
2007-01-14 23:19       ` Shawn O. Pearce
2007-01-13 22:29 ` [PATCH 3/4] Use binary searching on large buckets " Shawn O. Pearce
2007-01-14 19:11   ` Junio C Hamano
2007-01-14 23:07     ` Shawn O. Pearce [this message]
2007-01-13 22:30 ` [PATCH 4/4] Improve git-describe performance by reducing revision listing Shawn O. Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070114230721.GD10888@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).