From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Thomas Rast <tr@thomasrast.ch>
Subject: XDL_FAST_HASH can be very slow
Date: Sun, 21 Dec 2014 23:19:45 -0500
Message-ID: <20141222041944.GA441@peff.net>
I ran across an interesting case that diffs very slowly with modern git.
And it's even public. You can clone:

  git://github.com/outpunk/evil-icons

and try:

  git show fc4efe426d5b4e6aa8d5a4dc14babeada7c5f899

(which is also the tip of master as of this writing).
The interesting file there is a 10MB Illustrator file, "assets/ei.ai".
Git treats it as text, as the early part doesn't have any NULs, but it
is mostly non-human-readable. It has a large number of lines, and some
of the lines themselves are quite large.
On my machine, "git show" takes ~77 seconds using v2.2.1. But if I build
the same version with "make XDL_FAST_HASH=", it completes in about 0.4s.
Both produce the same output.
I'm not really sure what's going on. A few points of interest:

  - You can replicate this with the very first commit that added
    XDL_FAST_HASH, 6942efc (xdiff: load full words in the inner loop of
    xdl_hash_record, 2012-04-06). So it was always bad on this case, and
    it's not caused by any more recent change.

  - We actually _don't_ spend most of our time in xdl_hash_record, the
    function modified by 6942efc. Instead, it all goes to
    xdl_classify_record, which loops over the set of hash records.
    It's not clear to me whether producing more or different hash
    records is part of the design of XDL_FAST_HASH, or whether this is
    actually a bug.
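
To illustrate why a loop over hash records could dominate the runtime,
here is a toy model of a chained hash table. It is NOT xdiff's code:
struct rec, struct table, classify(), count_compares() and both hash
functions are hypothetical names made up for this sketch. It only shows
that when many distinct lines land in one bucket, each lookup walks a
long chain and the total work grows quadratically. Whether XDL_FAST_HASH
really produces such collisions on this file is not established here.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these are not xdiff's data structures. */
struct rec {
	const char *line;
	size_t len;
	unsigned long hash;
	struct rec *next;	/* next record in the same bucket */
};

struct table {
	struct rec **buckets;
	size_t nbuckets;	/* must be a power of two */
	unsigned long compares;	/* chain entries visited so far */
};

typedef unsigned long (*hash_fn)(const char *s, size_t len);

/* A reasonable byte-wise hash (djb2 style), used as the "good" case. */
static unsigned long hash_djb2(const char *s, size_t len)
{
	unsigned long h = 5381;
	while (len--)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Worst case: every line gets the same hash value. */
static unsigned long hash_degenerate(const char *s, size_t len)
{
	(void)s;
	(void)len;
	return 0;
}

/* Find the line in its bucket, or add it; count each chain step. */
static int classify(struct table *t, const char *line, size_t len,
		    unsigned long hash)
{
	struct rec **head = &t->buckets[hash & (t->nbuckets - 1)];
	struct rec *r;

	for (r = *head; r; r = r->next) {
		t->compares++;
		if (r->hash == hash && r->len == len &&
		    !memcmp(r->line, line, len))
			return 0;	/* already seen */
	}
	r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->line = line;
	r->len = len;
	r->hash = hash;
	r->next = *head;
	*head = r;
	return 1;
}

/* Classify n distinct lines with the given hash; return chain steps. */
static unsigned long count_compares(size_t n, hash_fn fn)
{
	struct table t;
	char (*lines)[32];
	unsigned long result;
	size_t i;

	t.nbuckets = 1024;
	t.compares = 0;
	t.buckets = calloc(t.nbuckets, sizeof(*t.buckets));
	lines = malloc(n * sizeof(*lines));
	if (!t.buckets || !lines) {
		free(t.buckets);
		free(lines);
		return 0;
	}
	for (i = 0; i < n; i++) {
		int len = snprintf(lines[i], sizeof(lines[i]), "line %zu", i);
		classify(&t, lines[i], (size_t)len, fn(lines[i], (size_t)len));
	}
	result = t.compares;
	for (i = 0; i < t.nbuckets; i++) {
		struct rec *r = t.buckets[i];
		while (r) {
			struct rec *next = r->next;
			free(r);
			r = next;
		}
	}
	free(t.buckets);
	free(lines);
	return result;
}
```

In this model, count_compares(n, hash_degenerate) visits n*(n-1)/2
chain entries, while count_compares(n, hash_djb2) spreads the same lines
over many short chains and visits far fewer.
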
I haven't dug much further than that.
-Peff