git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Alexandre Julliard <julliard@winehq.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Optimization for git-rev-list --objects
Date: Sat, 11 Feb 2006 18:19:09 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602111803350.3691@g5.osdl.org> (raw)
In-Reply-To: <87slqpg11q.fsf@wine.dyndns.org>



On Sat, 11 Feb 2006, Alexandre Julliard wrote:
> 
> When building a large list of objects, most of the time is spent in
> created_object() inserting the objects in the sorted list. This patch
> splits the list in 256 sublists to reduce the impact of the O(n^2)
> list insertion.
> 
> On the WineHQ repository (220,000 objects), the patch reduces the time
> needed for a 'git-rev-list --objects HEAD' from 2 minutes to 20 seconds.

Goodie. However, I wonder if we care about the sorting at _all_.

The sorting really only helps "find_object()", and the thing is, we could 
certainly do that other ways too. Keeping a sorted list is kind of 
senseless, when the data is so amenable to be kept in a tree.

We could add left/right pointers to each object, and just keep them in a 
tree. We probably don't even need any fancy balancing, since hashing means 
that we could just keep the first object we ever allocate as the root, and 
things should normally be pretty damn balanced.

Then we could do:

	static struct object *root = NULL;

	static struct object **lookup_object_position(unsigned char *sha1)
	{
		struct object **p = &root;

		for (;;) {
			struct object *object = *p;
			int sign;

			if (!object)
				break;
			sign = memcmp(sha1, object->sha1, 20)
			if (!sign)
				break;
			p = &object->left;
			if (sign < 0)
				continue;
			p = &object->right;
		}
		return p;
	}

and now we _trivially_ have:

	struct object *lookup_object(unsigned char *sha1)
	{
		return *lookup_object_position(unsigned char *sha1);
	}

	void created_object(struct object **pos, unsigned char *sha1, struct object *n)
	{
		memcpy(n->sha1, sha1, 20);
		*pos = n;
	}

and the "lookup_or_create()" functions would become

	struct tag *lookup_tag(unsigned char *sha1)
	{
		struct object **p = lookup_object_position(sha1);
		struct object *object;

		object = *p;
		if (!object) {
			struct tag *ret = xmalloc(sizeof(struct tag));
			memset(ret, 0, sizeof(struct tag));
			created_object(p, sha1, &ret->object);
			ret->object.type = tag_type;
			return ret;
		}
		if (!object->type)
			obj->type = tag_type;

		if (object->type != tag_type) {
			error("Object %s is a %s, not a tag",
				sha1_to_hex(sha1), obj->type);
			return NULL;
		}
		return (struct tag *)obj;
	}

etc etc.

Hmm? That would be a lot faster than what we have now, I suspect, and none 
of the ops are expensive.

Anybody want to try this approach?

		Linus

  parent reply	other threads:[~2006-02-12  2:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-11 19:14 [PATCH] Optimization for git-rev-list --objects Alexandre Julliard
2006-02-12  1:57 ` [PATCH] Use a hashtable for objects instead of a sorted list Johannes Schindelin
2006-02-12  2:46   ` Junio C Hamano
2006-02-12  8:52     ` Alexandre Julliard
2006-02-12 12:11       ` Junio C Hamano
2006-02-12 14:31         ` Johannes Schindelin
2006-02-12 11:19     ` Florian Weimer
2006-02-12 12:08       ` ***DONTUSE*** " Junio C Hamano
2006-02-12 13:46         ` Junio C Hamano
2006-02-12 17:26           ` Johannes Schindelin
2006-02-12 18:34             ` Junio C Hamano
2006-02-12  2:19 ` Linus Torvalds [this message]
2006-02-12  2:39   ` [PATCH] Optimization for git-rev-list --objects Junio C Hamano
2006-02-12  4:11     ` [PATCH] binary-tree-based objects Junio C Hamano
2006-02-12  5:06       ` Linus Torvalds
2006-02-12  5:22         ` Linus Torvalds
2006-02-12  5:23           ` Linus Torvalds
2006-02-12  5:48             ` Junio C Hamano
2006-02-12  6:07               ` Junio C Hamano
2006-02-12  6:53                 ` Linus Torvalds
2006-02-12  7:05           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602111803350.3691@g5.osdl.org \
    --to=torvalds@osdl.org \
    --cc=git@vger.kernel.org \
    --cc=julliard@winehq.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).