git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [PATCH 6/7] pack-objects: allow for early delta deflating
Date: Fri, 02 May 2008 15:11:50 -0400	[thread overview]
Message-ID: <1209755511-7840-7-git-send-email-nico@cam.org> (raw)
In-Reply-To: <1209755511-7840-6-git-send-email-nico@cam.org>

When the delta data is cached in memory until it is written to a pack
file on disk, it is best to compress it right away in find_deltas() for
the following reasons:

  - we have to compress that data anyway;

  - this allows for caching more deltas with the same cache size limit;

  - compression is potentially threaded.

This last point is especially relevant for SMP run time.  For example,
repacking the Linux repo on a quad core processor using 4 threads with
all default settings produce the following results before this change:

	real    2m27.929s
	user    4m36.492s
	sys     0m3.091s

And with this change applied:

	real    2m13.787s
	user    4m37.486s
	sys     0m3.159s

So the actual execution time stayed more or less the same but the
wall clock time is shorter.

This is however not a good thing to do when generating a pack for
network transmission.  In that case, the network is most likely to
throttle the data throughput, so it is best to make find_deltas()
faster in order to start writing data ASAP since we can afford
spending more time between writes to compress the data
at that point.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---
 builtin-pack-objects.c |   35 ++++++++++++++++++++++++++++++++++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4319e7b..f5dcd20 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -43,6 +43,7 @@ struct object_entry {
 					     */
 	void *delta_data;	/* cached delta (uncompressed) */
 	unsigned long delta_size;	/* delta data size (uncompressed) */
+	unsigned long z_delta_size;	/* delta data size (compressed) */
 	unsigned int hash;	/* name hint hash */
 	enum object_type type;
 	enum object_type in_pack_type;	/* could be delta */
@@ -301,6 +302,11 @@ static unsigned long write_object(struct sha1file *f,
 			buf = read_sha1_file(entry->idx.sha1, &type, &size);
 			if (!buf)
 				die("unable to read %s", sha1_to_hex(entry->idx.sha1));
+			/* make sure no cached delta data remains from a
+			   previous attempt before a pack split occured */
+			free(entry->delta_data);
+			entry->delta_data = NULL;
+			entry->z_delta_size = 0;
 		} else if (entry->delta_data) {
 			size = entry->delta_size;
 			buf = entry->delta_data;
@@ -313,7 +319,11 @@ static unsigned long write_object(struct sha1file *f,
 			type = (allow_ofs_delta && entry->delta->idx.offset) ?
 				OBJ_OFS_DELTA : OBJ_REF_DELTA;
 		}
-		datalen = do_compress(&buf, size);
+
+		if (entry->z_delta_size)
+			datalen = entry->z_delta_size;
+		else
+			datalen = do_compress(&buf, size);
 
 		/*
 		 * The object header is a byte of 'type' followed by zero or
@@ -1447,6 +1457,29 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
 				best_base = other_idx;
 		}
 
+		/*
+		 * If we decided to cache the delta data, then it is best
+		 * to compress it right away.  First because we have to do
+		 * it anyway, and doing it here while we're threaded will
+		 * save a lot of time in the non threaded write phase,
+		 * as well as allow for caching more deltas within
+		 * the same cache size limit.
+		 * ...
+		 * But only if not writing to stdout, since in that case
+		 * the network is most likely throttling writes anyway,
+		 * and therefore it is best to go to the write phase ASAP
+		 * instead, as we can afford spending more time compressing
+		 * between writes at that moment.
+		 */
+		if (entry->delta_data && !pack_to_stdout) {
+			entry->z_delta_size = do_compress(&entry->delta_data,
+							  entry->delta_size);
+			cache_lock();
+			delta_cache_size -= entry->delta_size;
+			delta_cache_size += entry->z_delta_size;
+			cache_unlock();
+		}
+
 		/* if we made n a delta, and if n is already at max
 		 * depth, leaving it in the window is pointless.  we
 		 * should evict it first.
-- 
1.5.5.1.226.g6f6e8

  reply	other threads:[~2008-05-02 19:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-02 19:11 [PATCH 0/7] assorted pack-objects cleanups and improvements Nicolas Pitre
2008-05-02 19:11 ` [PATCH 1/7] pack-objects: small cleanup Nicolas Pitre
2008-05-02 19:11   ` [PATCH 2/7] pack-objects: remove some double negative logic Nicolas Pitre
2008-05-02 19:11     ` [PATCH 3/7] pack-objects: simplify the condition associated with --all-progress Nicolas Pitre
2008-05-02 19:11       ` [PATCH 4/7] pack-objects: clean up write_object() a bit Nicolas Pitre
2008-05-02 19:11         ` [PATCH 5/7] pack-objects: move compression code in a separate function Nicolas Pitre
2008-05-02 19:11           ` Nicolas Pitre [this message]
2008-05-02 19:11             ` [PATCH 7/7] pack-objects: fix early eviction for max depth delta objects Nicolas Pitre
2008-05-02 22:44             ` [PATCH 6/7] pack-objects: allow for early delta deflating A Large Angry SCM
2008-05-02 23:03               ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1209755511-7840-7-git-send-email-nico@cam.org \
    --to=nico@cam.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).