git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Felipe Contreras <felipe.contreras@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Antoine Pelisse <apelisse@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Date: Mon, 6 May 2013 08:31:11 -0400	[thread overview]
Message-ID: <20130506123111.GB3809@sigill.intra.peff.net> (raw)
In-Reply-To: <1367793534-8401-3-git-send-email-felipe.contreras@gmail.com>

On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:

> We don't care about blobs, or any object other than commits, but in
> order to find the type of object, we are parsing the whole thing, which
> is slow, specially in big repositories with lots of big files.

I did a double-take on reading this subject line and first paragraph,
thinking "surely fast-export needs to actually output blobs?".

Reading the patch, I see that this is only about not bothering to load
blob marks from --import-marks. It might be nice to mention that in the
commit message, which is otherwise quite confusing.

I'm also not sure why your claim "we don't care about blobs" is true,
because naively we would want future runs of fast-export to avoid having
to write out the whole blob content when mentioning the blob again. I
think one argument could be "if we write a mark for blob X, we will also
have written a mark for commit Y which contains it; on subsequent runs,
we will just show the mark for Y in the first place, and not even care
about showing X (as a part of Y) either way. We would only refer to the
mark for X if it appears as part of a different commit, but that is a
rare case not worth worrying about."

Does that match your reasoning?

> Before this, loading the objects of a fresh emacs import, with 260598
> blobs took 14 minutes, after this patch, it takes 3 seconds.

Presumably most of that speed improvement comes from not parsing the
blob objects. I wonder if you could get similar speedups by applying the
"do not bother parsing" rule from your patch 3. You would still incur
some cost to create a "struct blob", but it may or may not be
measurable.  That would mean we get the "case not worth worrying about"
from above for free. I doubt it would make that big a difference,
though, given the rarity of it. So I am OK with it either way.

-Peff

  reply	other threads:[~2013-05-06 12:31 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-05 22:38 [PATCH v2 0/3] fast-export: speed improvements Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 1/3] fast-{import,export}: use get_sha1_hex() directly Felipe Contreras
2013-05-07 14:38   ` Junio C Hamano
2013-05-07 22:13     ` Felipe Contreras
2013-05-07 23:19       ` Junio C Hamano
2013-05-05 22:38 ` [PATCH v2 2/3] fast-export: improve speed by skipping blobs Felipe Contreras
2013-05-06 12:31   ` Jeff King [this message]
2013-05-06 15:08     ` Junio C Hamano
2013-05-06 16:17       ` Junio C Hamano
2013-05-06 16:20       ` Jeff King
2013-05-06 16:32         ` Junio C Hamano
2013-05-06 16:40           ` Jeff King
2013-05-06 17:17             ` Junio C Hamano
2013-05-06 17:19               ` Jeff King
2013-05-06 17:41                 ` Jeff King
2013-05-06 19:12         ` Felipe Contreras
2013-05-06 19:09       ` Felipe Contreras
2013-05-06 20:58         ` Junio C Hamano
2013-05-06 21:30           ` Felipe Contreras
2013-05-07  1:59             ` Junio C Hamano
2013-05-07  3:49               ` Felipe Contreras
2013-05-06 19:02     ` Felipe Contreras
2013-05-06 19:11       ` Jeff King
2013-05-06 19:15         ` Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 3/3] fast-export: don't parse all the commits Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130506123111.GB3809@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=apelisse@gmail.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).