From: Nicolas Pitre <nico@cam.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: git@vger.kernel.org
Subject: Re: Weird growth in packfile during initial push
Date: Wed, 15 Apr 2009 15:51:40 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0904151443030.6741@xanadu.home>
In-Reply-To: <20090415182754.GF23644@curie-int>

On Wed, 15 Apr 2009, Robin H. Johnson wrote:

> I was doing a more recent conversion of the Gentoo repo, and ran into
> some odd behavior in the packfile size.
> 
> For anybody else following the repo, you can now get it on the new hardware at:
> http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary
> 
> I did the conversion with cvs2svn, packed, added the remote and pushed, only to
> find that the pack on the remote side suddenly seemed to be ~60MiB larger.

Hmmm.

> $ ls -la /tmp/convert/gentoo-x86-cvs2git/.git/objects/pack
> total 903804
> drwxr-xr-x 2 robbat2 users       119 Apr 14 08:05 .
> drwxr-xr-x 4 robbat2 users        28 Apr 14 08:05 ..
> -r--r--r-- 1 robbat2 users 139155472 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
> -r--r--r-- 1 robbat2 users 786336481 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack
> 
> $ git remote add origin git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
> $ git push origin master:master
> Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
> Counting objects: 4969800, done.
> Delta compression using up to 8 threads.
> Compressing objects: 100% (1217809/1217809), done.
> Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
> Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)

Here we know for sure that all objects were directly reused, so no 
attempt at recompressing them was done.  The only thing that 
pack-objects might do in this case in addition to directly streaming the 
existing pack is to convert delta object headers from OFS_DELTA to 
REF_DELTA.
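
To illustrate why that header conversion changes the pack size, here 
is a minimal sketch of the size of the two base references.  The 
function names and the REF_DELTA_BASE_LEN macro are hypothetical, 
chosen for this example and not taken from git's source; the loop 
follows the variable length offset encoding used for OFS_DELTA 
entries, where the offset is the distance back to the base object.

```c
#include <stdint.h>

/* A REF_DELTA entry names its base by a raw 20-byte SHA1. */
#define REF_DELTA_BASE_LEN 20u

/*
 * Number of bytes needed to store `ofs` as an OFS_DELTA base offset.
 * Each byte carries 7 bits; every byte except the last has its high
 * bit set, and the value is decremented between groups so that no
 * offset has two different encodings.
 * (Hypothetical helper name, for illustration only.)
 */
static unsigned ofs_delta_base_len(uint64_t ofs)
{
	unsigned n = 1;

	while (ofs >>= 7) {
		--ofs;
		n++;
	}
	return n;
}

/* Bytes added to one delta entry when OFS_DELTA becomes REF_DELTA. */
static unsigned ref_delta_overhead(uint64_t ofs)
{
	return REF_DELTA_BASE_LEN - ofs_delta_base_len(ofs);
}
```

With this encoding a base that sits less than 128 bytes back costs 
one byte, anything up to 16511 bytes back costs two, and anything up 
to 2113663 bytes back costs three, so each converted delta grows by 
somewhere between 17 and 19 bytes in those cases.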

> $ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
> total 966876
> drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
> drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
> -r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
> -r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

Let's see if my theory stands:

	849936308 - 786336481 = 63599827
	63599827 / 3735812 = 17.02

Hence an average difference of about 17 bytes per delta.  A REF_DELTA 
object carries a 20-byte SHA1 base reference, which the OFS_DELTA case 
replaces with a variable length encoding of a pack offset.  That leaves 
20 - 17.02 = 2.98 bytes on average for the offset encoding, which feels 
about right.
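
The same arithmetic as a small C sketch, using only the pack sizes and 
the delta count quoted above.  The function names are hypothetical and 
exist only for this example.

```c
#include <stdint.h>

/* Growth of the pack on the remote side, in bytes. */
static uint64_t pack_growth(uint64_t remote_bytes, uint64_t local_bytes)
{
	return remote_bytes - local_bytes;
}

/* Average number of extra bytes per delta object. */
static double avg_growth_per_delta(uint64_t growth, uint64_t nr_deltas)
{
	return (double)growth / (double)nr_deltas;
}

/*
 * Average size implied for the OFS_DELTA offset encoding, given that
 * REF_DELTA stores a 20-byte SHA1 in its place.
 */
static double implied_offset_len(double avg_growth)
{
	return 20.0 - avg_growth;
}
```

Called with 849936308, 786336481 and 3735812, this gives a growth of 
63599827 bytes, roughly 17.02 bytes per delta, and just under 3 bytes 
for the average offset encoding.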

[...]

And the code matches this theory as well.  Can you try this patch if you 
have a chance?

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 91c3651..e41adbf 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -44,12 +44,16 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		"--stdout",
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
 
+	i = 4;
 	if (args->use_thin_pack)
-		argv[4] = "--thin";
+		argv[i++] = "--thin";
+	if (args->use_ofs_delta)
+		argv[i++] = "--delta-base-offset";
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
 	po.in = -1;
@@ -316,6 +320,8 @@ int send_pack(struct send_pack_args *args,
 		ask_for_status_report = 1;
 	if (server_supports("delete-refs"))
 		allow_deleting_refs = 1;
+	if (server_supports("ofs-delta"))
+		args->use_ofs_delta = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 83d76c7..1d7b1b3 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -6,6 +6,7 @@ struct send_pack_args {
 		send_mirror:1,
 		force_update:1,
 		use_thin_pack:1,
+		use_ofs_delta:1,
 		dry_run:1;
 };
 

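For anyone reading the first hunk in isolation, here is a standalone 
sketch of the argument-building pattern it uses: a fixed array with 
spare NULL slots and a running index, so that each optional flag lands 
in the next free slot and the list stays NULL-terminated.  The struct, 
the function name and the "arg1"/"arg2" placeholders are hypothetical; 
only "--stdout", "--thin" and "--delta-base-offset" come from the 
patch itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two flags the patch consults. */
struct push_opts {
	unsigned use_thin_pack:1;
	unsigned use_ofs_delta:1;
};

/*
 * Fill argv (capacity of at least 7 entries) and return the number of
 * arguments written, not counting the terminating NULL.
 */
static int build_argv(const struct push_opts *opts, const char **argv)
{
	int i = 0;

	argv[i++] = "pack-objects";
	argv[i++] = "arg1";	/* placeholder, not shown in the diff */
	argv[i++] = "arg2";	/* placeholder, not shown in the diff */
	argv[i++] = "--stdout";

	/* i == 4 here, matching the "i = 4" in the patch */
	if (opts->use_thin_pack)
		argv[i++] = "--thin";
	if (opts->use_ofs_delta)
		argv[i++] = "--delta-base-offset";

	argv[i] = NULL;		/* one slot must remain for the terminator */
	return i;
}
```

Two optional flags plus the terminator need three slots after 
"--stdout", which is why the patch adds one more NULL to the 
initializer.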

Nicolas
