From mboxrd@z Thu Jan 1 00:00:00 1970 From: "patrick.reynolds@github.com" Subject: [PATCH] use a hashmap to make remotes faster Date: Tue, 29 Jul 2014 03:53:42 +0000 Message-ID: <305063.319715799-sendEmail@debian> Mime-Version: 1.0 Content-Type: multipart/related; boundary="----MIME delimiter for sendEmail-397402.219547988" To: "git@vger.kernel.org" X-From: git-owner@vger.kernel.org Tue Jul 29 05:53:54 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XByU6-0006RW-3q for gcvg-git-2@plane.gmane.org; Tue, 29 Jul 2014 05:53:54 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752352AbaG2Dxr (ORCPT ); Mon, 28 Jul 2014 23:53:47 -0400 Received: from mail-qa0-f42.google.com ([209.85.216.42]:56599 "EHLO mail-qa0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752223AbaG2Dxq (ORCPT ); Mon, 28 Jul 2014 23:53:46 -0400 Received: by mail-qa0-f42.google.com with SMTP id j15so8707496qaq.15 for ; Mon, 28 Jul 2014 20:53:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=github.com; s=google; h=message-id:from:to:subject:date:mime-version:content-type; bh=Lk+6BQBfeFwdUuFRtWxz9DQo3CknedHhZVeodjVips4=; b=W6s4QzjDLEidggiNJrPISgAV++TwpFONdftEGtViosAl50eXEHG5Hhundjw1XdslVK VdVgD4WIc9l6HfxV97ymDV9TpxYgJ/sPn3Op3NQBAzvIr3KGv+6eTBnqlm+29fBEMfTt 6eNGmFmcO46WxfA+tRXmPBEQ/CNtSCHMOd4JY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:from:to:subject:date:mime-version :content-type; bh=Lk+6BQBfeFwdUuFRtWxz9DQo3CknedHhZVeodjVips4=; b=hrxYIVKWvPtuCaHWTkoRkCQJVwDXtZH6HyJxf3HQwHldxnDAY2VbBYwCoMK1CA0cdp 5chYD4TMSEcDgBXPdRV/IMoCRHoFfSjqNYjPPMnp4IjXzAktjbPg13MdiJe4ZTOEhNla Gc758ZI6MyGt5latJ2LrAoARf2T1GFn/qx1cTRV0OaNQA9Q17KNCALVqtPr+5z4p23bB n2PzzlGNeOQpJcmy5ci9jaQtNQtF7rcmKhGxvaOpscPNbRjiPMlpYg4K9RGwqw6RDl6w 15Q9uxHigjpX8Nj+shUJu1aKXBd7ZLsJU4Ih7uRe2UOdeZhxicsyhCSAJvAA+tJ3f/Gl ncMg== X-Gm-Message-State: ALoCoQlF/KN6Q2anTRo9lk7X8WzxU6UIU0R7bdijyFpjHvFeZ4Zws4BFpAZhQFPC4ClslgZbv/vc X-Received: by 10.140.90.40 with SMTP id w37mr10634212qgd.52.1406606025405; Mon, 28 Jul 2014 20:53:45 -0700 (PDT) Received: from debian (ip68-1-72-35.pn.at.cox.net. [68.1.72.35]) by mx.google.com with ESMTPSA id u1sm33152678qat.27.2014.07.28.20.53.44 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 28 Jul 2014 20:53:44 -0700 (PDT) X-Mailer: sendEmail-1.56 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: This is a multi-part message in MIME format. To properly display this message you need a MIME-Version 1.0 compliant Email program. ------MIME delimiter for sendEmail-397402.219547988 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Remotes are stored as an array, so looking one up or adding one without duplication is an O(n) operation. Reading an entire config file full of remotes is O(n^2) in the number of remotes. For a repository with tens of thousands of remotes, the running time can hit multiple minutes. Hash tables are way faster. So we add a hashmap from remote name to struct remote and use it for all lookups. The time to add a new remote to a repo that already has 50,000 remotes drops from ~2 minutes to < 1 second. We retain the old array of remotes so iterators proceed in config-file order. --- remote.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----------------- remote.h | 3 +++ 2 files changed, 49 insertions(+), 17 deletions(-) diff --git a/remote.c b/remote.c index a0701f6..88499a4 100644 --- a/remote.c +++ b/remote.c @@ -42,6 +42,7 @@ struct rewrites { static struct remote **remotes; static int remotes_alloc; static int remotes_nr; +static struct hashmap remotes_hash; static struct branch **branches; static int branches_alloc; @@ -136,26 +137,51 @@ static void add_url_alias(struct remote *remote, const char *url) add_pushurl_alias(remote, url); } +struct remotes_hash_key { + const char *str; + int len; +}; + +static int remotes_hash_cmp(const struct remote *a, const struct remote *b, const struct remotes_hash_key *key) +{ + if (key) + return strncmp(a->name, key->str, key->len) || a->name[key->len]; + else + return strcmp(a->name, b->name); +} + +static inline void init_remotes_hash() +{ + if (!remotes_hash.cmpfn) + hashmap_init(&remotes_hash, (hashmap_cmp_fn)remotes_hash_cmp, 0); +} + static struct remote *make_remote(const char *name, int len) { - struct remote *ret; - int i; + struct remote *ret, *replaced; + struct remotes_hash_key lookup; + struct hashmap_entry lookup_entry; - for (i = 0; i < remotes_nr; i++) { - if (len ? (!strncmp(name, remotes[i]->name, len) && - !remotes[i]->name[len]) : - !strcmp(name, remotes[i]->name)) - return remotes[i]; - } + if (!len) + len = strlen(name); + + init_remotes_hash(); + lookup.str = name; + lookup.len = len; + hashmap_entry_init(&lookup_entry, memhash(name, len)); + + if ((ret = hashmap_get(&remotes_hash, &lookup_entry, &lookup)) != NULL) + return ret; ret = xcalloc(1, sizeof(struct remote)); ret->prune = -1; /* unspecified */ ALLOC_GROW(remotes, remotes_nr + 1, remotes_alloc); remotes[remotes_nr++] = ret; - if (len) - ret->name = xstrndup(name, len); - else - ret->name = xstrdup(name); + ret->name = xstrndup(name, len); + + hashmap_entry_init(ret, lookup_entry.hash); + replaced = hashmap_put(&remotes_hash, ret); + assert(replaced == NULL); /* no previous entry overwritten */ return ret; } @@ -722,13 +748,16 @@ struct remote *pushremote_get(const char *name) int remote_is_configured(const char *name) { - int i; + struct remotes_hash_key lookup; + struct hashmap_entry lookup_entry; read_config(); - for (i = 0; i < remotes_nr; i++) - if (!strcmp(name, remotes[i]->name)) - return 1; - return 0; + init_remotes_hash(); + lookup.str = name; + lookup.len = strlen(name); + hashmap_entry_init(&lookup_entry, memhash(name, lookup.len)); + + return hashmap_get(&remotes_hash, &lookup_entry, &lookup) != NULL; } int for_each_remote(each_remote_fn fn, void *priv) diff --git a/remote.h b/remote.h index 917d383..81cb5ff 100644 --- a/remote.h +++ b/remote.h @@ -1,6 +1,7 @@ #ifndef REMOTE_H #define REMOTE_H +#include "hashmap.h" #include "parse-options.h" enum { @@ -10,6 +11,8 @@ enum { }; struct remote { + struct hashmap_entry ent; /* must be first */ + const char *name; int origin; -- 2.0.0.rc4 ------MIME delimiter for sendEmail-397402.219547988--