git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: diffcore-rename performance mode
Date: Tue, 18 Sep 2007 01:49:50 -0700	[thread overview]
Message-ID: <7vsl5cwe6p.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <20070918082321.GA9883@coredump.intra.peff.net> (Jeff King's message of "Tue, 18 Sep 2007 04:23:22 -0400")

Jeff King <peff@peff.net> writes:

> There seems to be a serious performance problem in diffcore-rename.
> There is infrastructure to cache the "cnt_data" member of each filespec,
> but it never gets used because we immediately free the filespec data
> after use. Oops.
>
> With this patch:
> ...
> My 20-minute diff becomes a 2-minute diff. The downside is that the
> memory usage is much increased (for obvious reasons, it should increase
> by the dataset size, since we are keeping pointers to the data around --
> in my case, around 1G extra).

Yes, these early freeing of filespec_data() were introducd later
specifically to address the memory usage issue.

> However, keeping around _just_ the
> cnt_data caused only about 100M of extra memory consumption (and gave
> the same performance boost).

That would be an interesting and relatively low-hanging optimization.

> The spanhash data structure is a bit confusing. At first, it looked like
> we were doing a linear search for a matching hash, but it's not quite,
> since we seem to start at some magic spot based on the hashval we're
> looking up.

I think it was just a hash table with linear overflow (if your
spot is occupied by somebody else, you look for the next
available vacant spot -- works only if you do not ever delete
items from the table) but sorry, I do not recall the rationale
for picking that data structure.  I vaguely recall I did some
measurement between that and the usual "an array that is indexed
with a hash value that holds heads of linked lists" and pointer
chasing appeared quite cache-unfriendly to the point that it
actually degraded performance, but did not try very hard to
optimize it.

  reply	other threads:[~2007-09-18  8:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-18  8:23 diffcore-rename performance mode Jeff King
2007-09-18  8:49 ` Junio C Hamano [this message]
2007-09-18  8:54   ` Jeff King
2007-09-18  8:58     ` Junio C Hamano
2007-09-18  9:01       ` Jeff King
2007-09-18  9:17         ` Junio C Hamano
2007-09-18 11:20     ` Johannes Schindelin
2007-09-25 16:38     ` Jeff King
2007-09-25 19:06       ` Jeff King
2007-09-25 19:10         ` Andreas Ericsson
2007-09-25 19:32         ` David Kastrup
2007-09-25 19:52           ` Jeff King
2007-09-18 22:12 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vsl5cwe6p.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).