From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: diffcore-rename performance mode
Date: Tue, 18 Sep 2007 01:49:50 -0700 [thread overview]
Message-ID: <7vsl5cwe6p.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <20070918082321.GA9883@coredump.intra.peff.net> (Jeff King's message of "Tue, 18 Sep 2007 04:23:22 -0400")
Jeff King <peff@peff.net> writes:
> There seems to be a serious performance problem in diffcore-rename.
> There is infrastructure to cache the "cnt_data" member of each filespec,
> but it never gets used because we immediately free the filespec data
> after use. Oops.
>
> With this patch:
> ...
> My 20-minute diff becomes a 2-minute diff. The downside is that the
> memory usage is much increased (for obvious reasons, it should increase
> by the dataset size, since we are keeping pointers to the data around --
> in my case, around 1G extra).
Yes, these early freeing of filespec_data() were introducd later
specifically to address the memory usage issue.
> However, keeping around _just_ the
> cnt_data caused only about 100M of extra memory consumption (and gave
> the same performance boost).
That would be an interesting and relatively low-hanging optimization.
> The spanhash data structure is a bit confusing. At first, it looked like
> we were doing a linear search for a matching hash, but it's not quite,
> since we seem to start at some magic spot based on the hashval we're
> looking up.
I think it was just a hash table with linear overflow (if your
spot is occupied by somebody else, you look for the next
available vacant spot -- works only if you do not ever delete
items from the table) but sorry, I do not recall the rationale
for picking that data structure. I vaguely recall I did some
measurement between that and the usual "an array that is indexed
with a hash value that holds heads of linked lists" and pointer
chasing appeared quite cache-unfriendly to the point that it
actually degraded performance, but did not try very hard to
optimize it.
next prev parent reply other threads:[~2007-09-18 8:50 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-09-18 8:23 diffcore-rename performance mode Jeff King
2007-09-18 8:49 ` Junio C Hamano [this message]
2007-09-18 8:54 ` Jeff King
2007-09-18 8:58 ` Junio C Hamano
2007-09-18 9:01 ` Jeff King
2007-09-18 9:17 ` Junio C Hamano
2007-09-18 11:20 ` Johannes Schindelin
2007-09-25 16:38 ` Jeff King
2007-09-25 19:06 ` Jeff King
2007-09-25 19:10 ` Andreas Ericsson
2007-09-25 19:32 ` David Kastrup
2007-09-25 19:52 ` Jeff King
2007-09-18 22:12 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vsl5cwe6p.fsf@gitster.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.