* [BUG] fast-import producing very deep tree deltas
@ 2007-11-12 11:03 Brian Downing
2007-11-12 11:13 ` [BUG] fast-import quoting broken for renames Brian Downing
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Brian Downing @ 2007-11-12 11:03 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
I've happened upon a case where fast-import produces deep tree deltas.
How deep? Really deep. 6035 entries deep to be precise for this case:
depths: count 135970 total 120567366 min 0 max 6035 mean 886.72 median 3 std_dev 1653.48
27b8a20bdf39fecd917e8401d3499013e49449d0 tree 32 99609547 6035 0000000000000000000000000000000000000000
This was with git-fast-import from 'next' as of a couple days ago,
run with the default options (no --depth passed in).
Needless to say the pack that resulted was just about useless. Trying to
repack it resulted in the "counting objects" phase running at about five
objects per second.
I don't know much about the fast-import code, but I'd guess that the
delta_depth member for the tree_content struct is either getting cleared
inappropriately or is not being propagated correctly. I added a printout
of the depth just before the store_object call in store_tree and it is
never non-zero, even though the pack file clearly was generated with
plenty of deltas.
I may have time to look at this more later this week, but I just wanted
it to be known that this problem existed.
-bcd
^ permalink raw reply [flat|nested] 6+ messages in thread
* [BUG] fast-import quoting broken for renames
2007-11-12 11:03 [BUG] fast-import producing very deep tree deltas Brian Downing
@ 2007-11-12 11:13 ` Brian Downing
2007-11-12 20:26 ` [BUG] fast-import producing very deep tree deltas Linus Torvalds
2007-11-13 8:53 ` Shawn O. Pearce
2 siblings, 0 replies; 6+ messages in thread
From: Brian Downing @ 2007-11-12 11:13 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
While I'm at it...
$ GIT_DIR=foo.git git init
$ GIT_DIR=foo.git git fast-import <<EOC
commit refs/import
committer foo <foo> 0 +0000
data <<EOF
test
EOF
M 644 inline "foo"
data <<EOF
foo
EOF
R foo "bar"
EOC
fatal: Garbage after dest in: R foo "bar"
$ GIT_DIR=foo.git git fast-import <<EOC
commit refs/import
committer foo <foo> 0 +0000
data <<EOF
test
EOF
M 644 inline "foo"
data <<EOF
foo
EOF
R "foo" bar
EOC
fatal: Missing space after source: R "foo" bar
-bcd
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fast-import producing very deep tree deltas
2007-11-12 11:03 [BUG] fast-import producing very deep tree deltas Brian Downing
2007-11-12 11:13 ` [BUG] fast-import quoting broken for renames Brian Downing
@ 2007-11-12 20:26 ` Linus Torvalds
2007-11-13 8:53 ` Shawn O. Pearce
2 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2007-11-12 20:26 UTC (permalink / raw)
To: Brian Downing; +Cc: Shawn O. Pearce, git
On Mon, 12 Nov 2007, Brian Downing wrote:
>
> depths: count 135970 total 120567366 min 0 max 6035 mean 886.72 median 3 std_dev 1653.48
>
> Needless to say the pack that resulted was just about useless. Trying to
> repack it resulted in the "counting objects" phase running at about five
> objects per second.
Hmm. Quick hack: increase the delta cache window. The reason (I think) why
performance turns glacial with really deep delta chains is that it turns
into an O(n^2) thing when you don't hit in the delta cache, and your delta
depth is so deep that following a *single* delta chain will flush the
cache.
So I bet that making the delta cache bigger will "fix" it. You probably
don't have to make it 6000+ entries, but with the *median* being that
deep, making it 2k entries should improve it for most cases.
Obviously fastimport should be fixed to not do those insanely deep chains
too, but it might be a good idea to at least make the delta cache a bit
bigger by default, and perhaps have some config option for setting it.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fast-import producing very deep tree deltas
2007-11-12 11:03 [BUG] fast-import producing very deep tree deltas Brian Downing
2007-11-12 11:13 ` [BUG] fast-import quoting broken for renames Brian Downing
2007-11-12 20:26 ` [BUG] fast-import producing very deep tree deltas Linus Torvalds
@ 2007-11-13 8:53 ` Shawn O. Pearce
2007-11-13 9:27 ` Shawn O. Pearce
2 siblings, 1 reply; 6+ messages in thread
From: Shawn O. Pearce @ 2007-11-13 8:53 UTC (permalink / raw)
To: Brian Downing; +Cc: git
Brian Downing <bdowning@lavos.net> wrote:
> I've happened upon a case where fast-import produces deep tree deltas.
> How deep? Really deep. 6035 entries deep to be precise for this case:
>
> depths: count 135970 total 120567366 min 0 max 6035 mean 886.72 median 3 std_dev 1653.48
>
> 27b8a20bdf39fecd917e8401d3499013e49449d0 tree 32 99609547 6035 0000000000000000000000000000000000000000
>
> This was with git-fast-import from 'next' as of a couple days ago,
> run with the default options (no --depth passed in).
>
> Needless to say the pack that resulted was just about useless. Trying to
> repack it resulted in the "counting objects" phase running at about five
> objects per second.
Heh.
I think what's happening here is your active branch cache isn't
big enough. We're swapping out the branch and thus recycling the
tree information (struct tree_content) back into the free pool.
When we later reload the tree we set the delta_depth to 0 but we
kept the tree we just reloaded as a delta base.
So if the tree we reloaded was already at the maximum we wouldn't
know it and make the new tree a delta. Multiply the number of times
the branch cache has to swap out the tree times max_depth (10) and
you get the maximum delta depth of a tree created by fast-import.
Given your above data of 6035 I'm guessing your active branch cache
had to swap the branch out 603/604 times during this import.
I think the fix is going to involve caching the depth within struct
object_entry so we can restore it when the tree is reloaded.
--
Shawn.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [BUG] fast-import producing very deep tree deltas
2007-11-13 8:53 ` Shawn O. Pearce
@ 2007-11-13 9:27 ` Shawn O. Pearce
2007-11-13 14:36 ` Brian Downing
0 siblings, 1 reply; 6+ messages in thread
From: Shawn O. Pearce @ 2007-11-13 9:27 UTC (permalink / raw)
To: Brian Downing; +Cc: git
"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Brian Downing <bdowning@lavos.net> wrote:
> > I've happened upon a case where fast-import produces deep tree deltas.
> > How deep? Really deep. 6035 entries deep to be precise for this case:
> >
> > depths: count 135970 total 120567366 min 0 max 6035 mean 886.72 median 3 std_dev 1653.48
> >
> > 27b8a20bdf39fecd917e8401d3499013e49449d0 tree 32 99609547 6035 0000000000000000000000000000000000000000
> >
> > This was with git-fast-import from 'next' as of a couple days ago,
> > run with the default options (no --depth passed in).
> >
> > Needless to say the pack that resulted was just about useless. Trying to
> > repack it resulted in the "counting objects" phase running at about five
> > objects per second.
Brian, does this fix it?
--8>--
From ff39dd457564b9198344e0cc785afa8cac05b486 Mon Sep 17 00:00:00 2001
From: Shawn O. Pearce <spearce@spearce.org>
Date: Tue, 13 Nov 2007 04:26:24 -0500
Subject: [PATCH] Don't allow fast-import tree delta chains to exceed maximum depth
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Brian Downing noticed fast-import can produce tree depths of up
to 6,035 objects and even deeper. Long delta chains can create
very small packfiles but cause problems during repacking as git
needs to unpack each tree to count the reachable blobs.
What's happening here is the active branch cache isn't big enough.
We're swapping out the branch and thus recycling the tree information
(struct tree_content) back into the free pool. When we later reload
the tree we set the delta_depth to 0 but we kept the tree we just
reloaded as a delta base.
So if the tree we reloaded was already at the maximum depth we
wouldn't know it and make the new tree a delta. Multiply the
number of times the branch cache has to swap out the tree times
max_depth (10) and you get the maximum delta depth of a tree created
by fast-import. In Brian's case above the active branch cache had
to swap the branch out 603/604 times during this import to produce
a tree with a delta depth of 6035.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
fast-import.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/fast-import.c b/fast-import.c
index f93d7d6..215f1e7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -153,13 +153,16 @@ Format of STDIN stream:
#define PACK_ID_BITS 16
#define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
+#define DEPTH_BITS 13
+#define MAX_DEPTH ((1<<DEPTH_BITS)-1)
struct object_entry
{
struct object_entry *next;
uint32_t offset;
- unsigned type : TYPE_BITS;
- unsigned pack_id : PACK_ID_BITS;
+ uint32_t type : TYPE_BITS,
+ pack_id : PACK_ID_BITS,
+ depth : DEPTH_BITS;
unsigned char sha1[20];
};
@@ -1084,6 +1087,7 @@ static int store_object(
delta_count_by_type[type]++;
last->depth++;
+ e->depth = last->depth;
hdrlen = encode_header(OBJ_OFS_DELTA, deltalen, hdr);
write_or_die(pack_data->pack_fd, hdr, hdrlen);
@@ -1097,6 +1101,7 @@ static int store_object(
} else {
if (last)
last->depth = 0;
+ e->depth = 0;
hdrlen = encode_header(type, dat->len, hdr);
write_or_die(pack_data->pack_fd, hdr, hdrlen);
pack_size += hdrlen;
@@ -1160,7 +1165,7 @@ static void load_tree(struct tree_entry *root)
if (myoe && myoe->pack_id != MAX_PACK_ID) {
if (myoe->type != OBJ_TREE)
die("Not a tree: %s", sha1_to_hex(sha1));
- t->delta_depth = 0;
+ t->delta_depth = myoe->depth;
buf = gfi_unpack_entry(myoe, &size);
} else {
enum object_type type;
@@ -2289,8 +2294,11 @@ int main(int argc, const char **argv)
}
else if (!prefixcmp(a, "--max-pack-size="))
max_packsize = strtoumax(a + 16, NULL, 0) * 1024 * 1024;
- else if (!prefixcmp(a, "--depth="))
+ else if (!prefixcmp(a, "--depth=")) {
max_depth = strtoul(a + 8, NULL, 0);
+ if (max_depth > MAX_DEPTH)
+ die("--depth cannot exceed %u", MAX_DEPTH);
+ }
else if (!prefixcmp(a, "--active-branches="))
max_active_branches = strtoul(a + 18, NULL, 0);
else if (!prefixcmp(a, "--import-marks="))
--
1.5.3.5.1661.gcbf0
--
Shawn.
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [BUG] fast-import producing very deep tree deltas
2007-11-13 9:27 ` Shawn O. Pearce
@ 2007-11-13 14:36 ` Brian Downing
0 siblings, 0 replies; 6+ messages in thread
From: Brian Downing @ 2007-11-13 14:36 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
On Tue, Nov 13, 2007 at 04:27:21AM -0500, Shawn O. Pearce wrote:
> Brian, does this fix it?
>
> So if the tree we reloaded was already at the maximum depth we
> wouldn't know it and make the new tree a delta. Multiply the
> number of times the branch cache has to swap out the tree times
> max_depth (10) and you get the maximum delta depth of a tree created
> by fast-import. In Brian's case above the active branch cache had
> to swap the branch out 603/604 times during this import to produce
> a tree with a delta depth of 6035.
>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
/Much/ better:
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 140000
Total objects: 135970 ( 62664 duplicates )
blobs : 42196 ( 13695 duplicates 19898 deltas)
trees : 72143 ( 48969 duplicates 62402 deltas)
commits: 21631 ( 0 duplicates 0 deltas)
tags : 0 ( 0 duplicates 0 deltas)
Total branches: 10 ( 1 loads )
marks: 1048576 ( 63827 unique )
atoms: 18971
Memory total: 8329 KiB
pools: 2860 KiB
objects: 5468 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit = 8589934592
pack_report: pack_used_ctr = 273071
pack_report: pack_mmap_calls = 16855
pack_report: pack_open_windows = 50 / 363
pack_report: pack_mapped = 8529277175 / 8589933814
---------------------------------------------------------------------
depths: count 135970 total 380519 min 0 max 10 mean 2.80 median 1 std_dev 3.22
In addition, fast-import ran much (probably 10x) faster and with much less
memory usage (last time it peaked around 1GB):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13098 bdowning 18 0 8223m 34m 6576 R 72 1.7 0:51.44 git-fast-import
Presumably not having to rebuild the root tree object from a hundreds-deep
delta chain many hundreds of times sped things up a bit.
Acked-by: Brian Downing <bdowning@lavos.net>
-bcd
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-11-13 14:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-12 11:03 [BUG] fast-import producing very deep tree deltas Brian Downing
2007-11-12 11:13 ` [BUG] fast-import quoting broken for renames Brian Downing
2007-11-12 20:26 ` [BUG] fast-import producing very deep tree deltas Linus Torvalds
2007-11-13 8:53 ` Shawn O. Pearce
2007-11-13 9:27 ` Shawn O. Pearce
2007-11-13 14:36 ` Brian Downing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).