From: Dmitry Ivankov <divanorama@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Nieder <jrnieder@gmail.com>,
"Shawn O. Pearce" <spearce@spearce.org>,
David Barr <davidbarr@google.com>,
Dmitry Ivankov <divanorama@gmail.com>
Subject: [PATCH 1/2] fast-import: count and report # of calls to diff_delta in stats
Date: Sun, 21 Aug 2011 01:04:11 +0600 [thread overview]
Message-ID: <1313867052-11993-2-git-send-email-divanorama@gmail.com> (raw)
In-Reply-To: <1313867052-11993-1-git-send-email-divanorama@gmail.com>
It's an interesting number, how often do we try to deltify each type of
objects and how often do we succeed. So do add it to stats.
Success doesn't mean much gain in pack size though. As we allow delta to
be as big as (data.len - 20). And delta close to data.len gains nothing
compared to no delta at all even after zlib compression (delta is pretty
much the same as data, just with few modifications).
We should try to make less attempts that result in huge deltas as these
consume more cpu than trivial small deltas. Either by choosing a better
delta base or reducing delta size upper bound or doing less delta attempts
at all.
Currently, delta base for blobs is a waste literally. Each blob delta
base is chosen as a previously stored blob. Disabling deltas for blobs
doesn't increase pack size and reduce import time, or at least doesn't
increase time for all fast-import streams I've tried.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
fast-import.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index 7cc2262..2b069e3 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -284,6 +284,7 @@ static uintmax_t marks_set_count;
static uintmax_t object_count_by_type[1 << TYPE_BITS];
static uintmax_t duplicate_count_by_type[1 << TYPE_BITS];
static uintmax_t delta_count_by_type[1 << TYPE_BITS];
+static uintmax_t delta_count_attempts_by_type[1 << TYPE_BITS];
static unsigned long object_count;
static unsigned long branch_count;
static unsigned long branch_load_count;
@@ -1045,6 +1046,7 @@ static int store_object(
}
if (last && last->data.buf && last->depth < max_depth && dat->len > 20) {
+ delta_count_attempts_by_type[type]++;
delta = diff_delta(last->data.buf, last->data.len,
dat->buf, dat->len,
&deltalen, dat->len - 20);
@@ -3338,10 +3340,10 @@ int main(int argc, const char **argv)
fprintf(stderr, "---------------------------------------------------------------------\n");
fprintf(stderr, "Alloc'd objects: %10" PRIuMAX "\n", alloc_count);
fprintf(stderr, "Total objects: %10" PRIuMAX " (%10" PRIuMAX " duplicates )\n", total_count, duplicate_count);
- fprintf(stderr, " blobs : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas)\n", object_count_by_type[OBJ_BLOB], duplicate_count_by_type[OBJ_BLOB], delta_count_by_type[OBJ_BLOB]);
- fprintf(stderr, " trees : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas)\n", object_count_by_type[OBJ_TREE], duplicate_count_by_type[OBJ_TREE], delta_count_by_type[OBJ_TREE]);
- fprintf(stderr, " commits: %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas)\n", object_count_by_type[OBJ_COMMIT], duplicate_count_by_type[OBJ_COMMIT], delta_count_by_type[OBJ_COMMIT]);
- fprintf(stderr, " tags : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas)\n", object_count_by_type[OBJ_TAG], duplicate_count_by_type[OBJ_TAG], delta_count_by_type[OBJ_TAG]);
+ fprintf(stderr, " blobs : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas of %10" PRIuMAX" attempts)\n", object_count_by_type[OBJ_BLOB], duplicate_count_by_type[OBJ_BLOB], delta_count_by_type[OBJ_BLOB], delta_count_attempts_by_type[OBJ_BLOB]);
+ fprintf(stderr, " trees : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas of %10" PRIuMAX" attempts)\n", object_count_by_type[OBJ_TREE], duplicate_count_by_type[OBJ_TREE], delta_count_by_type[OBJ_TREE], delta_count_attempts_by_type[OBJ_TREE]);
+ fprintf(stderr, " commits: %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas of %10" PRIuMAX" attempts)\n", object_count_by_type[OBJ_COMMIT], duplicate_count_by_type[OBJ_COMMIT], delta_count_by_type[OBJ_COMMIT], delta_count_attempts_by_type[OBJ_COMMIT]);
+ fprintf(stderr, " tags : %10" PRIuMAX " (%10" PRIuMAX " duplicates %10" PRIuMAX " deltas of %10" PRIuMAX" attempts)\n", object_count_by_type[OBJ_TAG], duplicate_count_by_type[OBJ_TAG], delta_count_by_type[OBJ_TAG], delta_count_attempts_by_type[OBJ_TAG]);
fprintf(stderr, "Total branches: %10lu (%10lu loads )\n", branch_count, branch_load_count);
fprintf(stderr, " marks: %10" PRIuMAX " (%10" PRIuMAX " unique )\n", (((uintmax_t)1) << marks->shift) * 1024, marks_set_count);
fprintf(stderr, " atoms: %10u\n", atom_cnt);
--
1.7.3.4
next prev parent reply other threads:[~2011-08-20 19:02 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-20 19:04 [PATCH 0/2] fast-import: improve deltas for blobs Dmitry Ivankov
2011-08-20 19:04 ` Dmitry Ivankov [this message]
2011-08-20 19:04 ` [PATCH 2/2] fast-import: treat cat-blob as a delta base hint for next blob Dmitry Ivankov
2011-08-20 19:17 ` Jonathan Nieder
2011-08-21 11:01 ` David Michael Barr
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1313867052-11993-2-git-send-email-divanorama@gmail.com \
--to=divanorama@gmail.com \
--cc=davidbarr@google.com \
--cc=git@vger.kernel.org \
--cc=jrnieder@gmail.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).