git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sha1_file.c: zlib can only process 4GB at a time
@ 2011-06-08 19:03 Junio C Hamano
  2011-06-08 19:16 ` Andreas Schwab
  2011-06-09 20:11 ` [PATCH] unpack-objects: " Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Junio C Hamano @ 2011-06-08 19:03 UTC (permalink / raw)
  To: git

The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * There are a lot more places than just this call site, but I wanted to
   send this out to get a quick sanity check by other people if the
   approach of fixing these issues is sane.

   Of course, we should be using streaming for more codepath, but that is
   totally a separate issue. We would want our code to work correctly when
   streaming is not used and your machine is beefy enough with zetabytes
   of ram, and that is the topic of this patch.

 cache.h     |   11 +++++++++++
 sha1_file.c |   11 +++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index e11cf6a..b917ae5 100644
--- a/cache.h
+++ b/cache.h
@@ -24,6 +24,17 @@ void git_inflate_init(z_streamp strm);
 void git_inflate_end(z_streamp strm);
 int git_inflate(z_streamp strm, int flush);
 
+/*
+ * avail_in and avail_out are counted in uInt, which typically limits
+ * the size of the buffer we can use to 4GB when interacting with zlib
+ * in a single call to inflate/deflate.
+ */
+#define ZLIB_BUF_MAX ((1UL << ((sizeof(uInt) * 8))) - 1)
+static inline uInt zlib_buf_cap(unsigned long len)
+{
+	return (ZLIB_BUF_MAX < len ? ZLIB_BUF_MAX : len);
+}
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
diff --git a/sha1_file.c b/sha1_file.c
index 064a330..51236ab 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2429,6 +2429,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	unsigned char parano_sha1[20];
 	char *filename;
 	static char tmpfile[PATH_MAX];
+	unsigned long bytes_to_deflate;
 
 	filename = sha1_file_name(sha1);
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
@@ -2454,14 +2455,20 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	git_SHA1_Update(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
+	bytes_to_deflate = len;
 	stream.next_in = (void *)buf;
-	stream.avail_in = len;
+	stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 	do {
 		unsigned char *in0 = stream.next_in;
+		size_t consumed;
+
 		ret = deflate(&stream, Z_FINISH);
-		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		consumed = stream.next_in - in0;
+		git_SHA1_Update(&c, in0, consumed);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
+		bytes_to_deflate -= consumed;
+		stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-06-09 20:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-08 19:03 [PATCH] sha1_file.c: zlib can only process 4GB at a time Junio C Hamano
2011-06-08 19:16 ` Andreas Schwab
2011-06-09 20:11 ` [PATCH] unpack-objects: " Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).