From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 5/5] archive-zip: stream large blobs into zip file
Date: Mon, 30 Apr 2012 11:57:17 +0700
Message-ID: <1335761837-12482-6-git-send-email-pclouds@gmail.com>
In-Reply-To: <1335761837-12482-1-git-send-email-pclouds@gmail.com>
A large blob will be read twice: once to calculate its crc32, once to
actually write it. Large blobs are written uncompressed for simplicity.
Writing compressed large blobs is possible, but a naive implementation
would need to decompress/compress the blob twice: once to calculate the
compressed size, once for the actual writing, assuming the compressed
blob is still over the large file limit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
I think we could extract the compressed size from the pack index, then
stream the compressed blob directly from the pack to the zip file. But
that makes git-archive sensitive to the pack format, and to be honest I
don't care enough about large file support to do it. This patch is
good enough for me.
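
For illustration only, here is a minimal standalone sketch of the first
pass described above: computing a CRC-32 over a blob in fixed-size chunks
without holding the whole blob in memory. It does not use git's streaming
API or zlib. `mem_stream`, `mem_stream_read`, `crc32_update` and
`crc32_of_stream` are hypothetical names made up for this sketch; an
in-memory buffer stands in for the object stream, and `crc32_update` is
a bitwise routine that follows the same calling convention as zlib's
crc32() (start from 0, feed the previous result back in).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320). Hypothetical helper:
 * pass 0 for the first chunk, then the previous return value.
 */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical stand-in for an object stream: a cursor over memory. */
struct mem_stream {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

/* Copy up to n bytes into buf; returns the byte count, 0 at end of data. */
static long mem_stream_read(struct mem_stream *st, unsigned char *buf, size_t n)
{
	size_t left = st->len - st->pos;

	if (n > left)
		n = left;
	memcpy(buf, st->data + st->pos, n);
	st->pos += n;
	return (long)n;
}

/*
 * First pass: fold every chunk into *crc. Returns 0 on success and -1 if
 * the reader reports an error (this in-memory reader never does).
 */
static int crc32_of_stream(struct mem_stream *st, size_t chunk, uint32_t *crc)
{
	unsigned char buf[1024];
	long n;

	if (chunk == 0 || chunk > sizeof(buf))
		chunk = sizeof(buf);
	while ((n = mem_stream_read(st, buf, chunk)) > 0)
		*crc = crc32_update(*crc, buf, (size_t)n);
	return n < 0 ? -1 : 0;
}
```

The result does not depend on the chunk size, which is why a small
fixed buffer is enough for the checksum pass; the second pass then only
has to copy the same bytes to the output.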
Documentation/git-archive.txt | 3 ++
archive-zip.c | 42 ++++++++++++++++++++++++++++++++++++++++-
t/t1050-large.sh | 4 +++
3 files changed, 48 insertions(+), 1 deletions(-)
diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index ac7006e..6df85a6 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -120,6 +120,9 @@ tar.<format>.remote::
user-defined formats, but true for the "tar.gz" and "tgz"
formats.
+core.bigFileThreshold::
+ Files larger than this size are stored uncompressed in zip format.
+
ATTRIBUTES
----------
diff --git a/archive-zip.c b/archive-zip.c
index f8039ba..ee58bda 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -3,6 +3,7 @@
*/
#include "cache.h"
#include "archive.h"
+#include "streaming.h"
static int zip_date;
static int zip_time;
@@ -120,6 +121,29 @@ static void *zlib_deflate(void *data, unsigned long size,
return buffer;
}
+static int crc32_stream(const unsigned char *sha1, unsigned long *crc)
+{
+	struct git_istream *st;
+	enum object_type type;
+	unsigned long sz;
+	ssize_t readlen;
+
+	st = open_istream(sha1, &type, &sz, NULL);
+	if (!st)
+		return error("cannot stream blob %s", sha1_to_hex(sha1));
+	for (;;) {
+		char buf[1024];
+
+		readlen = read_istream(st, buf, sizeof(buf));
+		if (readlen <= 0)
+			break;
+		*crc = crc32(*crc, (unsigned char*)buf, readlen);
+	}
+	/* close the stream on both end-of-data and read error */
+	close_istream(st);
+	return readlen < 0 ? -1 : 0;
+}
+
static int write_zip_entry(struct archiver_args *args,
const unsigned char *sha1,
const char *path, size_t pathlen,
@@ -153,6 +177,19 @@ static int write_zip_entry(struct archiver_args *args,
compressed_size = 0;
buffer = NULL;
size = 0;
+ } else if (!args->convert && S_ISREG(mode) &&
+ sha1_object_info(sha1, &size) == OBJ_BLOB &&
+ size > big_file_threshold) {
+ buffer = NULL;
+ method = 0;
+ attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
+ (mode & 0111) ? ((mode) << 16) : 0;
+ if (crc32_stream(sha1, &crc) < 0)
+ return error("failed to calculate crc32 from blob %s, SHA1 %s",
+ path, sha1_to_hex(sha1));
+ out = buffer;
+ uncompressed_size = size;
+ compressed_size = size;
} else if (S_ISREG(mode) || S_ISLNK(mode)) {
enum object_type type;
buffer = sha1_file_to_archive(args, path, sha1, mode, &type, &size);
@@ -234,7 +271,10 @@ static int write_zip_entry(struct archiver_args *args,
write_or_die(1, path, pathlen);
zip_offset += pathlen;
if (compressed_size > 0) {
- write_or_die(1, out, compressed_size);
+ if (out)
+ write_or_die(1, out, compressed_size);
+ else
+ stream_blob_to_fd(1, sha1, NULL, 0);
zip_offset += compressed_size;
}
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index fe47554..458fdde 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -138,4 +138,8 @@ test_expect_success 'tar achiving' '
git archive --format=tar HEAD >/dev/null
'
+test_expect_success 'zip archiving' '
+ git archive --format=zip HEAD >/dev/null
+'
+
test_done
--
1.7.8.36.g69ee2
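
As a rough illustration of why the checksum has to be known before any
blob data is written: in the zip format the local file header, which
precedes the file data, carries the CRC-32 and both sizes (unless a
trailing data descriptor is used, which this patch does not do). For a
stored (method 0) entry the compressed and uncompressed sizes are the
same. The sketch below fills such a header into a caller-supplied
buffer. `put_le16`, `put_le32` and `zip_stored_local_header` are
hypothetical names for this sketch, not functions from archive-zip.c,
and the caller is assumed to provide at least 30 + strlen(path) bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: store integers in little-endian byte order. */
static void put_le16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
	put_le16(p, (uint16_t)(v & 0xffff));
	put_le16(p + 2, (uint16_t)(v >> 16));
}

/*
 * Fill a zip local file header for a stored (uncompressed) entry.
 * Returns the number of bytes written: 30 fixed bytes plus the path.
 */
static size_t zip_stored_local_header(unsigned char *out, uint32_t crc,
				      uint32_t size, const char *path,
				      uint16_t dos_time, uint16_t dos_date)
{
	size_t pathlen = strlen(path);

	put_le32(out + 0, 0x04034b50);	/* signature "PK\3\4" */
	put_le16(out + 4, 10);		/* version needed to extract: 1.0 */
	put_le16(out + 6, 0);		/* general purpose flags */
	put_le16(out + 8, 0);		/* method 0: stored */
	put_le16(out + 10, dos_time);
	put_le16(out + 12, dos_date);
	put_le32(out + 14, crc);	/* must be known up front */
	put_le32(out + 18, size);	/* compressed size */
	put_le32(out + 22, size);	/* uncompressed size, same when stored */
	put_le16(out + 26, (uint16_t)pathlen);
	put_le16(out + 28, 0);		/* no extra field */
	memcpy(out + 30, path, pathlen);
	return 30 + pathlen;
}
```

With the header written this way, the file data that follows is just
the raw blob contents, so it can be copied from a stream without any
further bookkeeping.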
Thread overview: 23+ messages
2012-04-30 4:57 [PATCH 0/5] Large file support for git-archive Nguyễn Thái Ngọc Duy
2012-04-30 4:57 ` [PATCH 1/5] archive-tar: turn write_tar_entry into blob-writing only Nguyễn Thái Ngọc Duy
2012-04-30 18:15 ` Junio C Hamano
2012-04-30 22:11 ` René Scharfe
2012-04-30 4:57 ` [PATCH 2/5] archive-tar: unindent write_tar_entry by one level Nguyễn Thái Ngọc Duy
2012-04-30 4:57 ` [PATCH 3/5] archive: delegate blob reading to backend Nguyễn Thái Ngọc Duy
2012-04-30 21:07 ` René Scharfe
2012-04-30 4:57 ` [PATCH 4/5] archive-tar: stream large blobs to tar file Nguyễn Thái Ngọc Duy
2012-04-30 19:01 ` Junio C Hamano
2012-04-30 21:08 ` René Scharfe
2012-04-30 21:36 ` Junio C Hamano
2012-04-30 22:12 ` René Scharfe
2012-04-30 4:57 ` Nguyễn Thái Ngọc Duy [this message]
2012-04-30 19:12 ` [PATCH 5/5] archive-zip: stream large blobs into zip file Junio C Hamano
2012-04-30 22:54 ` René Scharfe
2012-04-30 22:11 ` [PATCH 5a/5] streaming: void pointer instead of char pointer René Scharfe
2012-04-30 22:12 ` [PATCH 6a/5] archive-zip: remove uncompressed_size René Scharfe
2012-04-30 22:12 ` [PATCH 7a/5] archive-zip: factor out helpers for writing sizes and CRC René Scharfe
2012-04-30 22:12 ` [PATCH 8a/5] archive-zip: streaming for stored files René Scharfe
2012-04-30 22:12 ` [PATCH 9a/5] archive-zip: streaming for deflated files René Scharfe
2012-04-30 19:15 ` [PATCH 0/5] Large file support for git-archive Junio C Hamano
2012-04-30 21:07 ` René Scharfe
2012-05-01 10:19 ` Nguyen Thai Ngoc Duy