From: Ingo Molnar <mingo@elte.hu>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCH] lookup_object(): Speed up 'git gc' by 12%, by reducing hash chain length
Date: Fri, 29 Apr 2011 09:26:04 +0200 [thread overview]
Message-ID: <20110429072604.GA16371@elte.hu> (raw)
In-Reply-To: <7vtydh1r3q.fsf@alter.siamese.dyndns.org>
* Junio C Hamano <gitster@pobox.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
> > diff --git a/object.c b/object.c
> > index 7e1f2bb..b3fe485 100644
> > --- a/object.c
> > +++ b/object.c
> > @@ -91,7 +91,7 @@ struct object *lookup_object(const unsigned char *sha1)
> > static void grow_object_hash(void)
> > {
> > int i;
> > - int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
> > + int new_hash_size = obj_hash_size < 32 ? 32 : 16 * obj_hash_size;
> > struct object **new_hash;
> >
> > new_hash = xcalloc(new_hash_size, sizeof(struct object *));
> > @@ -116,7 +116,7 @@ void *create_object(const unsigned char *sha1, int type, void *o)
> > obj->flags = 0;
> > hashcpy(obj->sha1, sha1);
> >
> > - if (obj_hash_size - 1 <= nr_objs * 2)
> > + if (obj_hash_size - 1 <= nr_objs * 16)
> > grow_object_hash();
> >
> > insert_obj_hash(obj, obj_hash, obj_hash_size);
>
> Shawn was telling me about this exact topic a few months ago, and I do
> agree that object hash grows too slowly when you need to slurp in many
> objects.
I think the main effect might not be the rate of growth and reduced overhead of
reallocating and reconstructing the hash 4-6 times, but the *spread* of objects
within the hash table - i.e. the maximum (i.e. optimal) size of the hash.
In a git gc run the hash grows to the max very quickly, then 99% of execution
time is spent with that optimally sized hash - so growth rate per se does not
matter much. (it might matter in other usecases)
Find below a debug patch i use to run with a configurable spread.
Note, i just ran the patch on a different system and there the effect was much
less pronounced. So i'd prefer independent confirmation as well that it speeds
up things for others as well.
I'll run more numbers - maybe we are just very sensitive to the exact layout of
the object hash and a 16x spread created a different, more optimal layout.
Thanks,
Ingo
---
git.c | 6 ++++++
object.c | 5 +++--
object.h | 1 +
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/git.c b/git.c
index 4b7dbfa..4c59316 100644
--- a/git.c
+++ b/git.c
@@ -4,6 +4,7 @@
#include "help.h"
#include "quote.h"
#include "run-command.h"
+#include "object.h"
const char git_usage_string[] =
"git [--version] [--exec-path[=<path>]] [--html-path]\n"
@@ -97,6 +98,11 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
exit(0);
} else if (!strcmp(cmd, "-p") || !strcmp(cmd, "--paginate")) {
use_pager = 1;
+ } else if (!strcmp(cmd, "--object-hash-spread")) {
+ object_hash_spread = atol((*argv)[1]);
+ printf("object hash spread: %d\n", object_hash_spread);
+ (*argv)++;
+ (*argc)--;
} else if (!strcmp(cmd, "--no-pager")) {
use_pager = 0;
if (envchanged)
diff --git a/object.c b/object.c
index 7e1f2bb..3d16a8a 100644
--- a/object.c
+++ b/object.c
@@ -7,6 +7,7 @@
static struct object **obj_hash;
static int nr_objs, obj_hash_size;
+int object_hash_spread = 2;
unsigned int get_max_object_index(void)
{
@@ -91,7 +92,7 @@ struct object *lookup_object(const unsigned char *sha1)
static void grow_object_hash(void)
{
int i;
- int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+ int new_hash_size = obj_hash_size < 32 ? 32 : object_hash_spread * obj_hash_size;
struct object **new_hash;
new_hash = xcalloc(new_hash_size, sizeof(struct object *));
@@ -116,7 +117,7 @@ void *create_object(const unsigned char *sha1, int type, void *o)
obj->flags = 0;
hashcpy(obj->sha1, sha1);
- if (obj_hash_size - 1 <= nr_objs * 2)
+ if (obj_hash_size - 1 <= nr_objs * object_hash_spread)
grow_object_hash();
insert_obj_hash(obj, obj_hash, obj_hash_size);
diff --git a/object.h b/object.h
index b6618d9..180a6c1 100644
--- a/object.h
+++ b/object.h
@@ -75,5 +75,6 @@ int object_list_contains(struct object_list *list, struct object *obj);
void add_object_array(struct object *obj, const char *name, struct object_array *array);
void add_object_array_with_mode(struct object *obj, const char *name, struct object_array *array, unsigned mode);
void object_array_remove_duplicates(struct object_array *);
+extern int object_hash_spread;
#endif /* OBJECT_H */
next prev parent reply other threads:[~2011-04-29 7:26 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-27 21:35 [PATCH] lookup_object(): Speed up 'git gc' by 12%, by reducing hash chain length Ingo Molnar
2011-04-29 6:22 ` Ingo Molnar
2011-04-29 6:58 ` Junio C Hamano
2011-04-29 7:26 ` Ingo Molnar [this message]
2011-04-29 7:38 ` Ingo Molnar
2011-04-29 7:46 ` Sverre Rabbelier
2011-04-29 9:50 ` Ingo Molnar
2011-04-29 16:57 ` Shawn Pearce
2011-05-01 13:21 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110429072604.GA16371@elte.hu \
--to=mingo@elte.hu \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).