git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Felipe Contreras <felipe.contreras@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Antoine Pelisse <apelisse@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Date: Mon, 6 May 2013 14:02:13 -0500	[thread overview]
Message-ID: <CAMP44s2rvu2PaazHEx-OXaZbuBAX7sZb04Oh38ifyAEG6kcxkw@mail.gmail.com> (raw)
In-Reply-To: <20130506123111.GB3809@sigill.intra.peff.net>

On Mon, May 6, 2013 at 7:31 AM, Jeff King <peff@peff.net> wrote:
> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>
>> We don't care about blobs, or any object other than commits, but in
>> order to find the type of object, we are parsing the whole thing, which
>> is slow, specially in big repositories with lots of big files.
>
> I did a double-take on reading this subject line and first paragraph,
> thinking "surely fast-export needs to actually output blobs?".

If you think that, then you are not familiar with the code.

--export-marks=<file>::
	Dumps the internal marks table to <file> when complete.
	Marks are written one per line as `:markid SHA-1`. Only marks
	for revisions are dumped; marks for blobs are ignored.

		if (deco->base && deco->base->type == 1) {
			mark = ptr_to_mark(deco->decoration);
			if (fprintf(f, ":%"PRIu32" %s\n", mark,
				sha1_to_hex(deco->base->sha1)) < 0) {
			    e = 1;
			    break;
			}
		}

> Reading the patch, I see that this is only about not bothering to load
> blob marks from --import-marks. It might be nice to mention that in the
> commit message, which is otherwise quite confusing.

The commit message says it exactly like it is: we don't care about blobs.

If an object is not a commit, we *already* skip it. But as the commit
message already says, we do so by parsing the whole thing.

> I'm also not sure why your claim "we don't care about blobs" is true,
> because naively we would want future runs of fast-export to avoid having
> to write out the whole blob content when mentioning the blob again.

Because it's pointless to have hundreds and thousands of blob marks
that are *never* going to be used, only for an extremely tiny minority
that would.

> Does that match your reasoning?

It doesn't matter, it has been that way since --export-marks was introduced.

>> Before this, loading the objects of a fresh emacs import, with 260598
>> blobs took 14 minutes, after this patch, it takes 3 seconds.
>
> Presumably most of that speed improvement comes from not parsing the
> blob objects. I wonder if you could get similar speedups by applying the
> "do not bother parsing" rule from your patch 3. You would still incur
> some cost to create a "struct blob", but it may or may not be
> measurable.  That would mean we get the "case not worth worrying about"
> from above for free. I doubt it would make that big a difference,
> though, given the rarity of it. So I am OK with it either way.

How would I know if it's a blob or a commit, if not by the code this
patch introduces?

-- 
Felipe Contreras

  parent reply	other threads:[~2013-05-06 19:02 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-05 22:38 [PATCH v2 0/3] fast-export: speed improvements Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 1/3] fast-{import,export}: use get_sha1_hex() directly Felipe Contreras
2013-05-07 14:38   ` Junio C Hamano
2013-05-07 22:13     ` Felipe Contreras
2013-05-07 23:19       ` Junio C Hamano
2013-05-05 22:38 ` [PATCH v2 2/3] fast-export: improve speed by skipping blobs Felipe Contreras
2013-05-06 12:31   ` Jeff King
2013-05-06 15:08     ` Junio C Hamano
2013-05-06 16:17       ` Junio C Hamano
2013-05-06 16:20       ` Jeff King
2013-05-06 16:32         ` Junio C Hamano
2013-05-06 16:40           ` Jeff King
2013-05-06 17:17             ` Junio C Hamano
2013-05-06 17:19               ` Jeff King
2013-05-06 17:41                 ` Jeff King
2013-05-06 19:12         ` Felipe Contreras
2013-05-06 19:09       ` Felipe Contreras
2013-05-06 20:58         ` Junio C Hamano
2013-05-06 21:30           ` Felipe Contreras
2013-05-07  1:59             ` Junio C Hamano
2013-05-07  3:49               ` Felipe Contreras
2013-05-06 19:02     ` Felipe Contreras [this message]
2013-05-06 19:11       ` Jeff King
2013-05-06 19:15         ` Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 3/3] fast-export: don't parse all the commits Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMP44s2rvu2PaazHEx-OXaZbuBAX7sZb04Oh38ifyAEG6kcxkw@mail.gmail.com \
    --to=felipe.contreras@gmail.com \
    --cc=apelisse@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).