git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
@ 2010-02-21  4:27 Nicolas Pitre
  2010-02-21 19:45 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-21  4:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

There is no real advantage to malloc the whole output buffer and
deflate the data in a single pass when writing loose objects. That is
like only 1% faster while using more memory, especially with large
files where memory usage is far more. It is best to deflate and write
the data out in small chunks reusing the same memory instead.

For example, using 'git add' on a few large files averaging 40 MB ...

Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps

After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps

While the runtime stayed relatively the same, the number of minor page
faults went down significantly.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---

I think this is worth doing independently of the paranoid mode being 
discussed.

diff --git a/sha1_file.c b/sha1_file.c
index 657825e..9196b57 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2281,8 +2281,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 			      void *buf, unsigned long len, time_t mtime)
 {
 	int fd, ret;
-	size_t size;
-	unsigned char *compressed;
+	unsigned char compressed[4096];
 	z_stream stream;
 	char *filename;
 	static char tmpfile[PATH_MAX];
@@ -2301,12 +2300,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, zlib_compression_level);
-	size = 8 + deflateBound(&stream, len+hdrlen);
-	compressed = xmalloc(size);
-
-	/* Compress it */
 	stream.next_out = compressed;
-	stream.avail_out = size;
+	stream.avail_out = sizeof(compressed);
 
 	/* First header.. */
 	stream.next_in = (unsigned char *)hdr;
@@ -2317,20 +2312,21 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	/* Then the data itself.. */
 	stream.next_in = buf;
 	stream.avail_in = len;
-	ret = deflate(&stream, Z_FINISH);
+	do {
+		ret = deflate(&stream, Z_FINISH);
+		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
+			die("unable to write sha1 file");
+		stream.next_out = compressed;
+		stream.avail_out = sizeof(compressed);
+	} while (ret == Z_OK);
+
 	if (ret != Z_STREAM_END)
 		die("unable to deflate new object %s (%d)", sha1_to_hex(sha1), ret);
-
 	ret = deflateEnd(&stream);
 	if (ret != Z_OK)
 		die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
 
-	size = stream.total_out;
-
-	if (write_buffer(fd, compressed, size) < 0)
-		die("unable to write sha1 file");
 	close_sha1_file(fd);
-	free(compressed);
 
 	if (mtime) {
 		struct utimbuf utb;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-02-22 19:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-21  4:27 [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects Nicolas Pitre
2010-02-21 19:45 ` Junio C Hamano
2010-02-21 21:26   ` Nicolas Pitre
2010-02-21 22:22     ` Junio C Hamano
2010-02-21 22:30     ` Junio C Hamano
2010-02-22  1:35       ` Nicolas Pitre
2010-02-22  5:30         ` Junio C Hamano
2010-02-22  5:50           ` Nicolas Pitre
2010-02-22  6:17             ` Junio C Hamano
2010-02-22  6:31               ` Junio C Hamano
2010-02-22 17:36                 ` Nicolas Pitre
2010-02-22 19:55                   ` Junio C Hamano
2010-02-22  6:27             ` Dmitry Potapov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).