From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"René Scharfe" <rene.scharfe@lsrfire.ath.cx>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 5/5] archive-zip: stream large blobs into zip file
Date: Mon, 30 Apr 2012 11:57:17 +0700 [thread overview]
Message-ID: <1335761837-12482-6-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1335761837-12482-1-git-send-email-pclouds@gmail.com>
A large blob will be read twice. One for calculating crc32, one for
actual writing. Large blobs are written uncompressed for simplicity.
Writing compressed large blobs is possible. But a naive implementation
would need to decompress/compress the blob twice: one to calculate
compressed size, one for actual writing, assuming compressed blobs are
still over large file limit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
I think we could extract compressed size from pack index, then stream
the compressed blob directly from pack to zip file. But that makes
git-archive sensitive to pack format. And to be honest I don't care
that much about large file support to do it. This patch is good
enough for me.
Documentation/git-archive.txt | 3 ++
archive-zip.c | 42 ++++++++++++++++++++++++++++++++++++++++-
t/t1050-large.sh | 4 +++
3 files changed, 48 insertions(+), 1 deletions(-)
diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index ac7006e..6df85a6 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -120,6 +120,9 @@ tar.<format>.remote::
user-defined formats, but true for the "tar.gz" and "tgz"
formats.
+core.bigFileThreshold::
+ Files larger than this size are stored uncompressed in zip format.
+
ATTRIBUTES
----------
diff --git a/archive-zip.c b/archive-zip.c
index f8039ba..ee58bda 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -3,6 +3,7 @@
*/
#include "cache.h"
#include "archive.h"
+#include "streaming.h"
static int zip_date;
static int zip_time;
@@ -120,6 +121,29 @@ static void *zlib_deflate(void *data, unsigned long size,
return buffer;
}
+static int crc32_stream(const unsigned char *sha1, unsigned long *crc)
+{
+ struct git_istream *st;
+ enum object_type type;
+ unsigned long sz;
+
+ st = open_istream(sha1, &type, &sz, NULL);
+ if (!st)
+ return error("cannot stream blob %s", sha1_to_hex(sha1));
+ for (;;) {
+ char buf[1024];
+ ssize_t readlen;
+
+ readlen = read_istream(st, buf, sizeof(buf));
+
+ if (readlen <= 0)
+ return readlen;
+ *crc = crc32(*crc, (unsigned char*)buf, readlen);
+ }
+ close_istream(st);
+ return 0;
+}
+
static int write_zip_entry(struct archiver_args *args,
const unsigned char *sha1,
const char *path, size_t pathlen,
@@ -153,6 +177,19 @@ static int write_zip_entry(struct archiver_args *args,
compressed_size = 0;
buffer = NULL;
size = 0;
+ } else if (!args->convert && S_ISREG(mode) &&
+ sha1_object_info(sha1, &size) == OBJ_BLOB &&
+ size > big_file_threshold) {
+ buffer = NULL;
+ method = 0;
+ attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
+ (mode & 0111) ? ((mode) << 16) : 0;
+ if (crc32_stream(sha1, &crc) < 0)
+ return error("failed to calculate crc32 from blob %s, SHA1 %s",
+ path, sha1_to_hex(sha1));
+ out = buffer;
+ uncompressed_size = size;
+ compressed_size = size;
} else if (S_ISREG(mode) || S_ISLNK(mode)) {
enum object_type type;
buffer = sha1_file_to_archive(args, path, sha1, mode, &type, &size);
@@ -234,7 +271,10 @@ static int write_zip_entry(struct archiver_args *args,
write_or_die(1, path, pathlen);
zip_offset += pathlen;
if (compressed_size > 0) {
- write_or_die(1, out, compressed_size);
+ if (out)
+ write_or_die(1, out, compressed_size);
+ else
+ stream_blob_to_fd(1, sha1, NULL, 0);
zip_offset += compressed_size;
}
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index fe47554..458fdde 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -138,4 +138,8 @@ test_expect_success 'tar achiving' '
git archive --format=tar HEAD >/dev/null
'
+test_expect_success 'zip achiving' '
+ git archive --format=zip HEAD >/dev/null
+'
+
test_done
--
1.7.8.36.g69ee2
next prev parent reply other threads:[~2012-04-30 5:01 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-30 4:57 [PATCH 0/5] Large file support for git-archive Nguyễn Thái Ngọc Duy
2012-04-30 4:57 ` [PATCH 1/5] archive-tar: turn write_tar_entry into blob-writing only Nguyễn Thái Ngọc Duy
2012-04-30 18:15 ` Junio C Hamano
2012-04-30 22:11 ` René Scharfe
2012-04-30 4:57 ` [PATCH 2/5] archive-tar: unindent write_tar_entry by one level Nguyễn Thái Ngọc Duy
2012-04-30 4:57 ` [PATCH 3/5] archive: delegate blob reading to backend Nguyễn Thái Ngọc Duy
2012-04-30 21:07 ` René Scharfe
2012-04-30 4:57 ` [PATCH 4/5] archive-tar: stream large blobs to tar file Nguyễn Thái Ngọc Duy
2012-04-30 19:01 ` Junio C Hamano
2012-04-30 21:08 ` René Scharfe
2012-04-30 21:36 ` Junio C Hamano
2012-04-30 22:12 ` René Scharfe
2012-04-30 4:57 ` Nguyễn Thái Ngọc Duy [this message]
2012-04-30 19:12 ` [PATCH 5/5] archive-zip: stream large blobs into zip file Junio C Hamano
2012-04-30 22:54 ` René Scharfe
2012-04-30 22:11 ` [PATCH 5a/5] streaming: void pointer instead of char pointer René Scharfe
2012-04-30 22:12 ` [PATCH 6a/5] archive-zip: remove uncompressed_size René Scharfe
2012-04-30 22:12 ` [PATCH 7a/5] archive-zip: factor out helpers for writing sizes and CRC René Scharfe
2012-04-30 22:12 ` [PATCH 8a/5] archive-zip: streaming for stored files René Scharfe
2012-04-30 22:12 ` [PATCH 9a/5] archive-zip: streaming for deflated files René Scharfe
2012-04-30 19:15 ` [PATCH 0/5] Large file support for git-archive Junio C Hamano
2012-04-30 21:07 ` René Scharfe
2012-05-01 10:19 ` Nguyen Thai Ngoc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1335761837-12482-6-git-send-email-pclouds@gmail.com \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=rene.scharfe@lsrfire.ath.cx \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.