From: Felipe Contreras <felipe.contreras@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>,
git@vger.kernel.org, Antoine Pelisse <apelisse@gmail.com>,
Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Date: Mon, 6 May 2013 14:09:56 -0500
Message-ID: <CAMP44s2rdkND40QDQA9T7MNGoKPtnr50nV98aExUe4bCOXZGyA@mail.gmail.com>
In-Reply-To: <7v7gjctabm.fsf@alter.siamese.dyndns.org>
On Mon, May 6, 2013 at 10:08 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>>
>>> We don't care about blobs, or any object other than commits, but in
>>> order to find the type of object, we are parsing the whole thing, which
>>> is slow, especially in big repositories with lots of big files.
>>
>> I did a double-take on reading this subject line and first paragraph,
>> thinking "surely fast-export needs to actually output blobs?".
>>
>> Reading the patch, I see that this is only about not bothering to load
>> blob marks from --import-marks. It might be nice to mention that in the
>> commit message, which is otherwise quite confusing.
>
> I had the same reaction first, but not writing the blob _objects_
> out to the output stream would not make any sense, so it was fairly
> easy to guess what the author wanted to say ;-).
That's how fast-export has worked since --export-marks was introduced.
>> I'm also not sure why your claim "we don't care about blobs" is true,
>> because naively we would want future runs of fast-export to avoid having
>> to write out the whole blob content when mentioning the blob again.
>
> The existing documentation is fairly clear that marks for objects
> other than commits are not exported, and the import-marks codepath
> discards anything but commits, so there is no mechanism for the
> existing fast-export users to leave blob marks in the marks file for
> later runs of fast-export to take advantage of. The second
> invocation cannot refer to such a blob in the first place.
>
> The story is different on the fast-import side, where we do say we
> dump the full table and a later run can depend on these marks.
Yes, and gaining nothing but increased disk space.
> By discarding marks on blobs, we may be robbing some optimization
> possibilities, and by discarding marks on tags, we may be robbing
> some features, from users of fast-export; we might want to add an
> option "--use-object-marks={blob,commit,tag}" or something to both
> fast-export and fast-import, so that the former can optionally write
> marks for non-commits out, and the latter can omit non-commit marks
> if the user does not need them. But that is a separate issue.
How? The only way we might rob optimizations is if there is an obscene
number of files; otherwise, the number of blob marks that we are
*actually* going to use ever again is extremely tiny.
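For illustration, here is a rough sketch of the import-marks filtering Junio describes. It is written in Python rather than Git's C. It assumes the marks file holds one `:<mark> <40-hex object name>` line per object. The names `import_commit_marks` and `object_type` are hypothetical and are not Git's API. `object_type` stands in for a cheap type lookup that never loads an object's full contents, which is the speedup the patch is after.

```python
import re
from typing import Callable, Dict, Iterable

# Assumed marks-file line format: ":<mark> <40-hex-digit object name>"
MARK_LINE = re.compile(r"^:(\d+) ([0-9a-f]{40})$")


def import_commit_marks(lines: Iterable[str],
                        object_type: Callable[[str], str]) -> Dict[int, str]:
    """Return {mark: object name} for commit marks only (hypothetical helper).

    `object_type` is a hypothetical cheap lookup that reports an object's
    type (e.g. "commit", "blob", "tag") without loading its full contents.
    """
    marks: Dict[int, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        m = MARK_LINE.match(line)
        if m is None:
            raise ValueError(f"corrupt mark line: {line!r}")
        mark, name = int(m.group(1)), m.group(2)
        # Non-commit marks (blobs, tags) are discarded, so a later
        # fast-export run cannot refer to those objects by mark.
        if object_type(name) != "commit":
            continue
        marks[mark] = name
    return marks


# Usage: one blob mark and one commit mark; only the commit mark is kept.
types = {"a" * 40: "commit", "b" * 40: "blob"}
print(import_commit_marks([":1 " + "b" * 40 + "\n", ":2 " + "a" * 40 + "\n"],
                          types.__getitem__))
```

Blob and tag marks simply fall through without their contents ever being read. That is why skipping them costs nothing on the import side.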
--
Felipe Contreras
Thread overview: 25+ messages
2013-05-05 22:38 [PATCH v2 0/3] fast-export: speed improvements Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 1/3] fast-{import,export}: use get_sha1_hex() directly Felipe Contreras
2013-05-07 14:38 ` Junio C Hamano
2013-05-07 22:13 ` Felipe Contreras
2013-05-07 23:19 ` Junio C Hamano
2013-05-05 22:38 ` [PATCH v2 2/3] fast-export: improve speed by skipping blobs Felipe Contreras
2013-05-06 12:31 ` Jeff King
2013-05-06 15:08 ` Junio C Hamano
2013-05-06 16:17 ` Junio C Hamano
2013-05-06 16:20 ` Jeff King
2013-05-06 16:32 ` Junio C Hamano
2013-05-06 16:40 ` Jeff King
2013-05-06 17:17 ` Junio C Hamano
2013-05-06 17:19 ` Jeff King
2013-05-06 17:41 ` Jeff King
2013-05-06 19:12 ` Felipe Contreras
2013-05-06 19:09 ` Felipe Contreras [this message]
2013-05-06 20:58 ` Junio C Hamano
2013-05-06 21:30 ` Felipe Contreras
2013-05-07 1:59 ` Junio C Hamano
2013-05-07 3:49 ` Felipe Contreras
2013-05-06 19:02 ` Felipe Contreras
2013-05-06 19:11 ` Jeff King
2013-05-06 19:15 ` Felipe Contreras
2013-05-05 22:38 ` [PATCH v2 3/3] fast-export: don't parse all the commits Felipe Contreras