From: Dmitry Ivankov <divanorama@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Nieder <jrnieder@gmail.com>,
"Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>,
Dmitry Ivankov <divanorama@gmail.com>
Subject: [PATCH 2/2] fast-import: treat cat-blob as a delta base hint for next blob
Date: Sun, 21 Aug 2011 01:04:12 +0600
Message-ID: <1313867052-11993-3-git-send-email-divanorama@gmail.com>
In-Reply-To: <1313867052-11993-1-git-send-email-divanorama@gmail.com>
The delta base for a blob is chosen to be the previously saved blob.
If we instead treat the blob returned by cat-blob as the delta base
for the next blob, nothing is likely to get worse.
Fast-import stream producers such as svn-fe use cat-blob as
follows:
- svn-fe reads a file delta in svn format
- to apply it, svn-fe asks cat-blob for the 'svn delta base'
- it applies the 'svn delta' to the response
- it produces a blob command to store the result
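As a rough illustration of the cat-blob step above, here is a minimal
C sketch of the producer side. It assumes the fast-import reply format
"<sha1> SP 'blob' SP <size> LF", followed by the contents and a
trailing LF (the patch below writes that trailing "\n"). The helpers
format_cat_blob_request and parse_cat_blob_header are hypothetical and
are not part of svn-fe or git.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the "cat-blob <dataref>" command line.
 * Returns the command length, or -1 if buf is too small. */
static int format_cat_blob_request(char *buf, size_t n, const char *dataref)
{
	int len = snprintf(buf, n, "cat-blob %s\n", dataref);
	return (len < 0 || (size_t)len >= n) ? -1 : len;
}

/* Hypothetical helper: parse the header line of a cat-blob reply,
 * "<40-hex sha1> blob <size>\n". Returns 0 on success and -1 for any
 * other line, e.g. the "<dataref> missing\n" reply. The caller would
 * then read <size> bytes of contents plus one LF. */
static int parse_cat_blob_header(const char *line, char sha1_hex[41],
				 unsigned long *size)
{
	const char *sp = strchr(line, ' ');
	const char *num;
	char *end;
	int i;

	/* The object name must be exactly 40 hex digits. */
	if (!sp || sp - line != 40)
		return -1;
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (strncmp(sp + 1, "blob ", 5) != 0)
		return -1;
	num = sp + 6;
	if (*num < '0' || *num > '9')
		return -1;
	*size = strtoul(num, &end, 10);
	if (*end != '\n')
		return -1;
	memcpy(sha1_hex, line, 40);
	sha1_hex[40] = '\0';
	return 0;
}
```

After parsing, a producer like svn-fe would apply its svn delta to the
returned contents and emit a new blob command. That is exactly the
moment at which the blob just returned is a natural delta base.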
Currently there is no way for svn-fe to give fast-import a hint about
the delta base for an object, even though the blob requested via
cat-blob is, most of the time, the best delta base available. Of
course, it may not be a good delta base, but we don't know of a
better one anyway.
So treat cat-blob's result as the delta base for the next blob. The
gain is substantial: a 2x to 7x reduction in pack size AND a 1.2x to
3x speedup, because diff_delta is faster on good deltas. git gc
--aggressive can compress the result further, by 10% to 70%, at the
cost of more CPU time, more real time and 3 CPU cores.
Tested on 213M and 2.7G fast-import streams: the resulting packs are
22M and 113M, and import times are 7s and 60s respectively. Both
streams were produced by svn-fe, sniffed, and then fed as raw input
to fast-import.
Streams produced by git-fast-export see no change, since it neither
uses cat-blob nor tries to reorder blobs in some smart way to keep
successive deltas small.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
fast-import.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 2b069e3..0480fbf 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2802,7 +2802,12 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
strbuf_release(&line);
cat_blob_write(buf, size);
cat_blob_write("\n", 1);
- free(buf);
+ if (oe && oe->pack_id == pack_id) {
+ last_blob.offset = oe->idx.offset;
+ strbuf_attach(&last_blob.data, buf, size, size);
+ last_blob.depth = oe->depth;
+ } else
+ free(buf);
}
static void parse_cat_blob(void)
--
1.7.3.4