From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Thomas Rast <tr@thomasrast.ch>
Subject: XDL_FAST_HASH can be very slow
Date: Sun, 21 Dec 2014 23:19:45 -0500
Message-ID: <20141222041944.GA441@peff.net>

I ran across an interesting case that diffs very slowly with modern git.
And it's even public. You can clone:

  git://github.com/outpunk/evil-icons

and try:

  git show fc4efe426d5b4e6aa8d5a4dc14babeada7c5f899

(which is also the tip of master as of this writing).

The interesting file there is a 10MB Illustrator file, "assets/ei.ai".
Git treats it as text, as the early part doesn't have any NULs, but it
is mostly non-human-readable. It has a large number of lines, and some
of the lines themselves are quite large.

On my machine, "git show" takes ~77 seconds using v2.2.1. But if I build
the same version with "make XDL_FAST_HASH=", it completes in about 0.4s.
Both produce the same output.

I'm not really sure what's going on.  A few points of interest:

 - You can replicate this with the very first commit that added
   XDL_FAST_HASH, 6942efc (xdiff: load full words in the inner loop of
   xdl_hash_record, 2012-04-06). So it was always bad on this case, and
   it's not part of any more recent changes.

 - We actually _don't_ spend most of our time in xdl_hash_record, the
   function modified by 6942efc. Instead, it all goes to
   xdl_classify_record, which is looping over the set of hash records.
   It's not clear to me whether producing more (or different) hash
   records is part of the design of XDL_FAST_HASH, or whether this is
   actually a bug.
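
To illustrate why a loop over hash records can dominate the runtime, here
is a minimal sketch of a chained hash table that classifies lines and
counts how many chain entries it visits. This is not xdiff's code: every
name in it (classify, count_steps, weak_hash, good_hash, NBUCKETS) is
hypothetical, and weak_hash is deliberately bad. Whether XDL_FAST_HASH
actually distributes the lines of this file poorly is an assumption I have
not confirmed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024	/* hypothetical table size; must be a power of two */

struct rec_class {
	struct rec_class *next;
	unsigned long ha;	/* full hash of the line */
	const char *line;
};

struct classifier {
	struct rec_class *buckets[NBUCKETS];
	unsigned long steps;	/* chain entries visited so far */
};

typedef unsigned long (*hash_fn)(const char *);

/* djb2-style hash that mixes in every byte of the line */
unsigned long good_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* deliberately weak hash: lines of equal length always collide */
unsigned long weak_hash(const char *s)
{
	return (unsigned long)strlen(s);
}

/* Returns 1 if a new class was created, 0 if the line was already known. */
int classify(struct classifier *cf, const char *line, hash_fn hash)
{
	unsigned long ha = hash(line);
	struct rec_class **slot = &cf->buckets[ha & (NBUCKETS - 1)];
	struct rec_class *rc;

	/* walk the bucket's chain; each visited entry costs one step */
	for (rc = *slot; rc; rc = rc->next) {
		cf->steps++;
		if (rc->ha == ha && !strcmp(rc->line, line))
			return 0;
	}

	rc = malloc(sizeof(*rc));
	if (!rc)
		return -1;
	rc->ha = ha;
	rc->line = line;
	rc->next = *slot;
	*slot = rc;
	return 1;
}

/* Classify n distinct 8-character lines and report the chain steps taken. */
unsigned long count_steps(int n, hash_fn hash)
{
	struct classifier cf = { 0 };
	char (*lines)[16];
	unsigned long steps;
	int i;

	assert(n > 0);
	lines = malloc((size_t)n * sizeof(*lines));
	if (!lines)
		return 0;

	for (i = 0; i < n; i++) {
		snprintf(lines[i], sizeof(lines[i]), "%08d", i);
		classify(&cf, lines[i], hash);
	}
	steps = cf.steps;

	for (i = 0; i < NBUCKETS; i++) {
		struct rec_class *rc = cf.buckets[i];

		while (rc) {
			struct rec_class *next = rc->next;

			free(rc);
			rc = next;
		}
	}
	free(lines);
	return steps;
}
```

Calling count_steps(n, weak_hash) puts every line in a single chain, so it
visits n*(n-1)/2 entries, which is quadratic in the number of lines.
count_steps(n, good_hash) spreads the same lines over many buckets and
visits far fewer. A similar blow-up would be one possible explanation for
the time spent in xdl_classify_record, but I have not checked that.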

I haven't dug much further than that.

-Peff