* [PATCH v2 01/10] Add more large blob test cases
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-06 0:59 ` Junio C Hamano
2012-03-04 12:59 ` [PATCH v2 02/10] streaming: make streaming-write-entry to be more reusable Nguyễn Thái Ngọc Duy
` (20 subsequent siblings)
21 siblings, 1 reply; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
New test cases list commands that should work when memory is
limited. All memory allocation functions (*) learn to reject any
allocation larger than $GIT_ALLOC_LIMIT kilobytes, if that variable is set.
(*) Not exactly all. Some places do not use the x* wrappers but call
malloc/calloc directly, notably diff-delta. Those code paths should
never be run on large blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
t/t1050-large.sh | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
wrapper.c | 27 ++++++++++++++++++++++--
2 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 29d6024..f245e59 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -10,7 +10,9 @@ test_expect_success setup '
echo X | dd of=large1 bs=1k seek=2000 &&
echo X | dd of=large2 bs=1k seek=2000 &&
echo X | dd of=large3 bs=1k seek=2000 &&
- echo Y | dd of=huge bs=1k seek=2500
+ echo Y | dd of=huge bs=1k seek=2500 &&
+ GIT_ALLOC_LIMIT=1500 &&
+ export GIT_ALLOC_LIMIT
'
test_expect_success 'add a large file or two' '
@@ -100,4 +102,59 @@ test_expect_success 'packsize limit' '
)
'
+test_expect_success 'diff --raw' '
+ git commit -q -m initial &&
+ echo modified >>large1 &&
+ git add large1 &&
+ git commit -q -m modified &&
+ git diff --raw HEAD^
+'
+
+test_expect_success 'hash-object' '
+ git hash-object large1
+'
+
+test_expect_failure 'cat-file a large file' '
+ git cat-file blob :large1 >/dev/null
+'
+
+test_expect_failure 'git-show a large file' '
+ git show :large1 >/dev/null
+
+'
+
+test_expect_failure 'clone' '
+ git clone -n file://"$PWD"/.git new &&
+ (
+ cd new &&
+ git config core.bigfilethreshold 200k &&
+ git checkout master
+ )
+'
+
+test_expect_failure 'fetch updates' '
+ echo modified >> large1 &&
+ git commit -q -a -m updated &&
+ (
+ cd new &&
+ git fetch --keep # FIXME should not need --keep
+ )
+'
+
+test_expect_failure 'fsck' '
+ git fsck --full
+'
+
+test_expect_success 'repack' '
+ git repack -ad
+'
+
+test_expect_failure 'tar archiving' '
+ git archive --format=tar HEAD >/dev/null
+'
+
+test_expect_failure 'zip archiving' '
+ git archive --format=zip HEAD >/dev/null
+'
+
test_done
diff --git a/wrapper.c b/wrapper.c
index 85f09df..d4c0972 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,6 +9,18 @@ static void do_nothing(size_t size)
static void (*try_to_free_routine)(size_t size) = do_nothing;
+static void memory_limit_check(size_t size)
+{
+ static int limit = -1;
+ if (limit == -1) {
+ const char *env = getenv("GIT_ALLOC_LIMIT");
+ limit = env ? atoi(env) * 1024 : 0;
+ }
+ if (limit && size > limit)
+ die("attempting to allocate %d over limit %d",
+ size, limit);
+}
+
try_to_free_t set_try_to_free_routine(try_to_free_t routine)
{
try_to_free_t old = try_to_free_routine;
@@ -32,7 +44,10 @@ char *xstrdup(const char *str)
void *xmalloc(size_t size)
{
- void *ret = malloc(size);
+ void *ret;
+
+ memory_limit_check(size);
+ ret = malloc(size);
if (!ret && !size)
ret = malloc(1);
if (!ret) {
@@ -79,7 +94,10 @@ char *xstrndup(const char *str, size_t len)
void *xrealloc(void *ptr, size_t size)
{
- void *ret = realloc(ptr, size);
+ void *ret;
+
+ memory_limit_check(size);
+ ret = realloc(ptr, size);
if (!ret && !size)
ret = realloc(ptr, 1);
if (!ret) {
@@ -95,7 +113,10 @@ void *xrealloc(void *ptr, size_t size)
void *xcalloc(size_t nmemb, size_t size)
{
- void *ret = calloc(nmemb, size);
+ void *ret;
+
+ memory_limit_check(size * nmemb);
+ ret = calloc(nmemb, size);
if (!ret && (!nmemb || !size))
ret = calloc(1, 1);
if (!ret) {
--
1.7.8.36.g69ee2
* Re: [PATCH v2 01/10] Add more large blob test cases
2012-03-04 12:59 ` [PATCH v2 01/10] Add more large blob test cases Nguyễn Thái Ngọc Duy
@ 2012-03-06 0:59 ` Junio C Hamano
0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2012-03-06 0:59 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> diff --git a/wrapper.c b/wrapper.c
> index 85f09df..d4c0972 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -9,6 +9,18 @@ static void do_nothing(size_t size)
>
> static void (*try_to_free_routine)(size_t size) = do_nothing;
>
> +static void memory_limit_check(size_t size)
> +{
> + static int limit = -1;
> + if (limit == -1) {
> + const char *env = getenv("GIT_ALLOC_LIMIT");
> + limit = env ? atoi(env) * 1024 : 0;
> + }
> + if (limit && size > limit)
> + die("attempting to allocate %d over limit %d",
> + size, limit);
size is size_t and %d calls for an int.
I'll push out a fixed-up version later to 'pu'.
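One possible shape for such a fix-up, as a sketch only (this is not
necessarily what ends up in 'pu'; it assumes git's usual environment,
i.e. die() and the PRIuMAX macro available via git-compat-util.h):

static void memory_limit_check(size_t size)
{
	static size_t limit;
	static int limit_set;

	if (!limit_set) {
		const char *env = getenv("GIT_ALLOC_LIMIT");
		/* GIT_ALLOC_LIMIT is in kilobytes; unset or 0 means no limit */
		limit = env ? strtoul(env, NULL, 10) * 1024 : 0;
		limit_set = 1;
	}
	if (limit && size > limit)
		die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
		    (uintmax_t)size, (uintmax_t)limit);
}

Keeping the limit itself a size_t and printing both values through
uintmax_t/PRIuMAX sidesteps the int-vs-size_t mismatch entirely.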
* [PATCH v2 02/10] streaming: make streaming-write-entry to be more reusable
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 01/10] Add more large blob test cases Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 03/10] cat-file: use streaming interface to print blobs Nguyễn Thái Ngọc Duy
` (19 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
From: Junio C Hamano <gitster@pobox.com>
The static function in entry.c takes a cache entry and streams its blob
contents to a file in the working tree. Refactor the logic to a new API
function stream_blob_to_fd() that takes an object name and an open file
descriptor, so that it can be reused by other callers.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
entry.c | 53 +++++------------------------------------------------
streaming.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
streaming.h | 2 ++
3 files changed, 62 insertions(+), 48 deletions(-)
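As a usage sketch only (not part of the patch), a hypothetical caller
of the new API could look like this; it streams a blob to stdout and
passes can_seek=0 because fd 1 may well be a pipe:

#include "cache.h"
#include "streaming.h"

/* Hypothetical helper, for illustration only. */
static int dump_blob_to_stdout(const unsigned char *sha1)
{
	/*
	 * A NULL filter asks for the raw blob; can_seek=0 disables the
	 * hole-seeking optimization, which is only safe when the
	 * destination is a regular file.
	 */
	return stream_blob_to_fd(1, sha1, NULL, 0);
}

Later patches in this series (cat-file, show, fsck) follow essentially
this pattern.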
diff --git a/entry.c b/entry.c
index 852fea1..17a6bcc 100644
--- a/entry.c
+++ b/entry.c
@@ -120,58 +120,15 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
const struct checkout *state, int to_tempfile,
int *fstat_done, struct stat *statbuf)
{
- struct git_istream *st;
- enum object_type type;
- unsigned long sz;
int result = -1;
- ssize_t kept = 0;
- int fd = -1;
-
- st = open_istream(ce->sha1, &type, &sz, filter);
- if (!st)
- return -1;
- if (type != OBJ_BLOB)
- goto close_and_exit;
+ int fd;
fd = open_output_fd(path, ce, to_tempfile);
- if (fd < 0)
- goto close_and_exit;
-
- for (;;) {
- char buf[1024 * 16];
- ssize_t wrote, holeto;
- ssize_t readlen = read_istream(st, buf, sizeof(buf));
-
- if (!readlen)
- break;
- if (sizeof(buf) == readlen) {
- for (holeto = 0; holeto < readlen; holeto++)
- if (buf[holeto])
- break;
- if (readlen == holeto) {
- kept += holeto;
- continue;
- }
- }
-
- if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
- goto close_and_exit;
- else
- kept = 0;
- wrote = write_in_full(fd, buf, readlen);
-
- if (wrote != readlen)
- goto close_and_exit;
- }
- if (kept && (lseek(fd, kept - 1, SEEK_CUR) == (off_t) -1 ||
- write(fd, "", 1) != 1))
- goto close_and_exit;
- *fstat_done = fstat_output(fd, state, statbuf);
-
-close_and_exit:
- close_istream(st);
- if (0 <= fd)
+ if (0 <= fd) {
+ result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
+ *fstat_done = fstat_output(fd, state, statbuf);
result = close(fd);
+ }
if (result && 0 <= fd)
unlink(path);
return result;
diff --git a/streaming.c b/streaming.c
index 71072e1..7e7ee2b 100644
--- a/streaming.c
+++ b/streaming.c
@@ -489,3 +489,58 @@ static open_method_decl(incore)
return st->u.incore.buf ? 0 : -1;
}
+
+
+/****************************************************************
+ * Users of streaming interface
+ ****************************************************************/
+
+int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *filter,
+ int can_seek)
+{
+ struct git_istream *st;
+ enum object_type type;
+ unsigned long sz;
+ ssize_t kept = 0;
+ int result = -1;
+
+ st = open_istream(sha1, &type, &sz, filter);
+ if (!st)
+ return result;
+ if (type != OBJ_BLOB)
+ goto close_and_exit;
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t wrote, holeto;
+ ssize_t readlen = read_istream(st, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ if (can_seek && sizeof(buf) == readlen) {
+ for (holeto = 0; holeto < readlen; holeto++)
+ if (buf[holeto])
+ break;
+ if (readlen == holeto) {
+ kept += holeto;
+ continue;
+ }
+ }
+
+ if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
+ goto close_and_exit;
+ else
+ kept = 0;
+ wrote = write_in_full(fd, buf, readlen);
+
+ if (wrote != readlen)
+ goto close_and_exit;
+ }
+ if (kept && (lseek(fd, kept - 1, SEEK_CUR) == (off_t) -1 ||
+ write(fd, "", 1) != 1))
+ goto close_and_exit;
+ result = 0;
+
+ close_and_exit:
+ close_istream(st);
+ return result;
+}
diff --git a/streaming.h b/streaming.h
index 589e857..3e82770 100644
--- a/streaming.h
+++ b/streaming.h
@@ -12,4 +12,6 @@ extern struct git_istream *open_istream(const unsigned char *, enum object_type
extern int close_istream(struct git_istream *);
extern ssize_t read_istream(struct git_istream *, char *, size_t);
+extern int stream_blob_to_fd(int fd, const unsigned char *, struct stream_filter *, int can_seek);
+
#endif /* STREAMING_H */
--
1.7.8.36.g69ee2
* [PATCH v2 03/10] cat-file: use streaming interface to print blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 01/10] Add more large blob test cases Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 02/10] streaming: make streaming-write-entry to be more reusable Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 23:12 ` Junio C Hamano
2012-03-04 12:59 ` [PATCH v2 04/10] parse_object: special code path for blobs to avoid putting whole object in memory Nguyễn Thái Ngọc Duy
` (18 subsequent siblings)
21 siblings, 1 reply; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/cat-file.c | 23 +++++++++++++++++++++++
t/t1050-large.sh | 2 +-
2 files changed, 24 insertions(+), 1 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8ed501f..bc6cc9f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -11,6 +11,7 @@
#include "parse-options.h"
#include "diff.h"
#include "userdiff.h"
+#include "streaming.h"
#define BATCH 1
#define BATCH_CHECK 2
@@ -82,6 +83,24 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
write_or_die(1, cp, endp - cp);
}
+static int write_blob(const unsigned char *sha1)
+{
+ unsigned char new_sha1[20];
+
+ if (sha1_object_info(sha1, NULL) == OBJ_TAG) {
+ enum object_type type;
+ unsigned long size;
+ char *buffer = read_sha1_file(sha1, &type, &size);
+ if (memcmp(buffer, "object ", 7) ||
+ get_sha1_hex(buffer + 7, new_sha1))
+ die("%s not a valid tag", sha1_to_hex(sha1));
+ sha1 = new_sha1;
+ free(buffer);
+ }
+
+ return stream_blob_to_fd(1, sha1, NULL, 0);
+}
+
static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
{
unsigned char sha1[20];
@@ -127,6 +146,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
return cmd_ls_tree(2, ls_args, NULL);
}
+ if (type == OBJ_BLOB)
+ return write_blob(sha1);
buf = read_sha1_file(sha1, &type, &size);
if (!buf)
die("Cannot read object %s", obj_name);
@@ -149,6 +170,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
break;
case 0:
+ if (type_from_string(exp_type) == OBJ_BLOB)
+ return write_blob(sha1);
buf = read_object_with_reference(sha1, exp_type, &size, NULL);
break;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index f245e59..39a3e77 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -114,7 +114,7 @@ test_expect_success 'hash-object' '
git hash-object large1
'
-test_expect_failure 'cat-file a large file' '
+test_expect_success 'cat-file a large file' '
git cat-file blob :large1 >/dev/null
'
--
1.7.8.36.g69ee2
* Re: [PATCH v2 03/10] cat-file: use streaming interface to print blobs
2012-03-04 12:59 ` [PATCH v2 03/10] cat-file: use streaming interface to print blobs Nguyễn Thái Ngọc Duy
@ 2012-03-04 23:12 ` Junio C Hamano
2012-03-05 2:42 ` Nguyen Thai Ngoc Duy
0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2012-03-04 23:12 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> +static int write_blob(const unsigned char *sha1)
> +{
> + unsigned char new_sha1[20];
> +
> + if (sha1_object_info(sha1, NULL) == OBJ_TAG) {
Hrm, didn't I say that it tastes bad for a function write_blob() to have
to worry about OBJ_TAG already?
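One possible shape for the rework being asked for here, as a sketch
only (not the author's actual follow-up, and the helper names below
are made up): peel the tag at the call site so the streaming helper
stays blob-only.

/* Sketch: resolve a tag to the object it points at, so the function
 * that streams blobs never has to know about OBJ_TAG. */
static void peel_tag_to_sha1(const unsigned char *sha1, unsigned char *result)
{
	enum object_type type;
	unsigned long size;
	char *buffer = read_sha1_file(sha1, &type, &size);

	if (!buffer || size < 48 || memcmp(buffer, "object ", 7) ||
	    get_sha1_hex(buffer + 7, result))
		die("%s not a valid tag", sha1_to_hex(sha1));
	free(buffer);
}

static int stream_blob(const unsigned char *sha1)
{
	unsigned char peeled[20];

	if (sha1_object_info(sha1, NULL) == OBJ_TAG) {
		peel_tag_to_sha1(sha1, peeled);
		sha1 = peeled;
	}
	return stream_blob_to_fd(1, sha1, NULL, 0);
}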
* Re: [PATCH v2 03/10] cat-file: use streaming interface to print blobs
2012-03-04 23:12 ` Junio C Hamano
@ 2012-03-05 2:42 ` Nguyen Thai Ngoc Duy
0 siblings, 0 replies; 48+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-03-05 2:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
2012/3/5 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> +static int write_blob(const unsigned char *sha1)
>> +{
>> + unsigned char new_sha1[20];
>> +
>> + if (sha1_object_info(sha1, NULL) == OBJ_TAG) {
>
> Hrm, didn't I say that it tastes bad for a function write_blob() to have
> to worry about OBJ_TAG already?
My bad. Reworked, added another test case for the dereference case,
and clone exceeded memory limit again due to new test case :( Will
need some more work on this.
--
Duy
* [PATCH v2 04/10] parse_object: special code path for blobs to avoid putting whole object in memory
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (2 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 03/10] cat-file: use streaming interface to print blobs Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 05/10] show: use streaming interface for showing blobs Nguyễn Thái Ngọc Duy
` (17 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
object.c | 11 +++++++++++
sha1_file.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 43 insertions(+), 1 deletions(-)
diff --git a/object.c b/object.c
index 6b06297..0498b18 100644
--- a/object.c
+++ b/object.c
@@ -198,6 +198,17 @@ struct object *parse_object(const unsigned char *sha1)
if (obj && obj->parsed)
return obj;
+ if ((obj && obj->type == OBJ_BLOB) ||
+ (!obj && has_sha1_file(sha1) &&
+ sha1_object_info(sha1, NULL) == OBJ_BLOB)) {
+ if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
+ error("sha1 mismatch %s\n", sha1_to_hex(repl));
+ return NULL;
+ }
+ parse_blob_buffer(lookup_blob(sha1), NULL, 0);
+ return lookup_object(sha1);
+ }
+
buffer = read_sha1_file(sha1, &type, &size);
if (buffer) {
if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
diff --git a/sha1_file.c b/sha1_file.c
index f9f8d5e..a77ef0a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,7 @@
#include "pack-revindex.h"
#include "sha1-lookup.h"
#include "bulk-checkin.h"
+#include "streaming.h"
#ifndef O_NOATIME
#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1149,7 +1150,37 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
{
unsigned char real_sha1[20];
- hash_sha1_file(map, size, type, real_sha1);
+ enum object_type obj_type;
+ struct git_istream *st;
+ git_SHA_CTX c;
+ char hdr[32];
+ int hdrlen;
+
+ if (map) {
+ hash_sha1_file(map, size, type, real_sha1);
+ return hashcmp(sha1, real_sha1) ? -1 : 0;
+ }
+
+ st = open_istream(sha1, &obj_type, &size, NULL);
+ if (!st)
+ return -1;
+
+ /* Generate the header */
+ hdrlen = sprintf(hdr, "%s %lu", typename(obj_type), size) + 1;
+
+ /* Sha1.. */
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t readlen = read_istream(st, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ git_SHA1_Update(&c, buf, readlen);
+ }
+ git_SHA1_Final(real_sha1, &c);
+ close_istream(st);
return hashcmp(sha1, real_sha1) ? -1 : 0;
}
--
1.7.8.36.g69ee2
* [PATCH v2 05/10] show: use streaming interface for showing blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (3 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 04/10] parse_object: special code path for blobs to avoid putting whole object in memory Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 06/10] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
` (16 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/log.c | 34 ++++++++++++++++++++--------------
t/t1050-large.sh | 2 +-
2 files changed, 21 insertions(+), 15 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index 7d1f6f8..d1702e7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -20,6 +20,7 @@
#include "string-list.h"
#include "parse-options.h"
#include "branch.h"
+#include "streaming.h"
/* Set a default date-time format for git log ("log.date" config variable) */
static const char *default_date_mode = NULL;
@@ -381,8 +382,13 @@ static void show_tagger(char *buf, int len, struct rev_info *rev)
strbuf_release(&out);
}
-static int show_object(const unsigned char *sha1, int show_tag_object,
- struct rev_info *rev)
+static int show_blob_object(const unsigned char *sha1, struct rev_info *rev)
+{
+ fflush(stdout);
+ return stream_blob_to_fd(1, sha1, NULL, 0);
+}
+
+static int show_tag_object(const unsigned char *sha1, struct rev_info *rev)
{
unsigned long size;
enum object_type type;
@@ -392,16 +398,16 @@ static int show_object(const unsigned char *sha1, int show_tag_object,
if (!buf)
return error(_("Could not read object %s"), sha1_to_hex(sha1));
- if (show_tag_object)
- while (offset < size && buf[offset] != '\n') {
- int new_offset = offset + 1;
- while (new_offset < size && buf[new_offset++] != '\n')
- ; /* do nothing */
- if (!prefixcmp(buf + offset, "tagger "))
- show_tagger(buf + offset + 7,
- new_offset - offset - 7, rev);
- offset = new_offset;
- }
+ assert(type == OBJ_TAG);
+ while (offset < size && buf[offset] != '\n') {
+ int new_offset = offset + 1;
+ while (new_offset < size && buf[new_offset++] != '\n')
+ ; /* do nothing */
+ if (!prefixcmp(buf + offset, "tagger "))
+ show_tagger(buf + offset + 7,
+ new_offset - offset - 7, rev);
+ offset = new_offset;
+ }
if (offset < size)
fwrite(buf + offset, size - offset, 1, stdout);
@@ -459,7 +465,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
const char *name = objects[i].name;
switch (o->type) {
case OBJ_BLOB:
- ret = show_object(o->sha1, 0, NULL);
+ ret = show_blob_object(o->sha1, NULL);
break;
case OBJ_TAG: {
struct tag *t = (struct tag *)o;
@@ -470,7 +476,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
diff_get_color_opt(&rev.diffopt, DIFF_COMMIT),
t->tag,
diff_get_color_opt(&rev.diffopt, DIFF_RESET));
- ret = show_object(o->sha1, 1, &rev);
+ ret = show_tag_object(o->sha1, &rev);
rev.shown_one = 1;
if (ret)
break;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 39a3e77..66acb3b 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -118,7 +118,7 @@ test_expect_success 'cat-file a large file' '
git cat-file blob :large1 >/dev/null
'
-test_expect_failure 'git-show a large file' '
+test_expect_success 'git-show a large file' '
git show :large1 >/dev/null
'
--
1.7.8.36.g69ee2
* [PATCH v2 06/10] index-pack: split second pass obj handling into own function
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (4 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 05/10] show: use streaming interface for showing blobs Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 07/10] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
` (15 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 31 ++++++++++++++++++-------------
1 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..918684f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,6 +682,23 @@ static int compare_delta_entry(const void *a, const void *b)
objects[delta_b->obj_no].type);
}
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ * deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ * recursively checking if the resulting object is used as a base
+ * for some more deltas.
+ */
+static void second_pass(struct object_entry *obj)
+{
+ struct base_data *base_obj = alloc_base_data();
+ base_obj->obj = obj;
+ base_obj->data = NULL;
+ find_unresolved_deltas(base_obj);
+ display_progress(progress, nr_resolved_deltas);
+}
+
/* Parse all objects and return the pack content SHA1 hash */
static void parse_pack_objects(unsigned char *sha1)
{
@@ -736,26 +753,14 @@ static void parse_pack_objects(unsigned char *sha1)
qsort(deltas, nr_deltas, sizeof(struct delta_entry),
compare_delta_entry);
- /*
- * Second pass:
- * - for all non-delta objects, look if it is used as a base for
- * deltas;
- * - if used as a base, uncompress the object and apply all deltas,
- * recursively checking if the resulting object is used as a base
- * for some more deltas.
- */
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- struct base_data *base_obj = alloc_base_data();
if (is_delta_type(obj->type))
continue;
- base_obj->obj = obj;
- base_obj->data = NULL;
- find_unresolved_deltas(base_obj);
- display_progress(progress, nr_resolved_deltas);
+ second_pass(obj);
}
}
--
1.7.8.36.g69ee2
* [PATCH v2 07/10] index-pack: reduce memory usage when the pack has large blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (5 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 06/10] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 08/10] pack-check: do not unpack blobs Nguyễn Thái Ngọc Duy
` (14 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
This command unpacks every non-delta object in order to:
1. calculate its SHA-1
2. do a byte-by-byte SHA-1 collision test if we happen to have objects
with the same SHA-1
3. validate object content in strict mode
All this requires the entire object to stay in memory, which is bad
news for giant blobs. This patch lowers memory consumption by not
keeping the object in memory whenever possible, calculating its SHA-1
while unpacking instead.
This patch assumes that the collision test is rarely needed. The
collision test will be done later in the second pass if necessary,
which puts the entire object back into memory again (we could even do
the collision test without putting the entire object back in memory,
by comparing as we unpack it).
In strict mode, non-blob objects are always kept in memory for
validation (blobs do not need data validation).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 64 +++++++++++++++++++++++++++++++++++++++----------
t/t1050-large.sh | 4 +-
2 files changed, 53 insertions(+), 15 deletions(-)
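The core idea, hashing while inflating instead of keeping the buffer,
is easier to see stripped of the pack plumbing. A condensed,
hypothetical sketch (read_chunk() is a made-up stand-in for "give me
the next inflated piece"; the real change feeds git_SHA1_Update() from
the inflate loop in unpack_entry_data() below):

/* Hypothetical: compute a blob's object name from content that arrives
 * in chunks, never holding the whole blob in memory.  read_chunk() is
 * an assumed callback returning the number of bytes read, 0 at EOF. */
static void hash_streamed_blob(unsigned long size,
			       ssize_t (*read_chunk)(char *buf, size_t len),
			       unsigned char *sha1)
{
	git_SHA_CTX c;
	char hdr[32];
	char buf[8192];
	ssize_t n;
	int hdrlen = sprintf(hdr, "blob %lu", size) + 1; /* include the NUL */

	git_SHA1_Init(&c);
	git_SHA1_Update(&c, hdr, hdrlen);	/* "blob <size>\0" header */
	while ((n = read_chunk(buf, sizeof(buf))) > 0)
		git_SHA1_Update(&c, buf, n);	/* content, chunk by chunk */
	git_SHA1_Final(sha1, &c);
}

The patch applies this only when the unpacked data is not needed for
anything else: delta entries, and everything in strict mode, keep the
full in-memory buffer as before.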
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..db27133 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -276,30 +276,60 @@ static void unlink_base_data(struct base_data *c)
free_base_data(c);
}
-static void *unpack_entry_data(unsigned long offset, unsigned long size)
+static void *unpack_entry_data(unsigned long offset, unsigned long size,
+ enum object_type type, unsigned char *sha1)
{
+ static char fixed_buf[8192];
int status;
git_zstream stream;
- void *buf = xmalloc(size);
+ void *buf;
+ git_SHA_CTX c;
+
+ if (sha1) { /* do hash_sha1_file internally */
+ char hdr[32];
+ int hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+
+ buf = fixed_buf;
+ } else {
+ buf = xmalloc(size);
+ }
memset(&stream, 0, sizeof(stream));
git_inflate_init(&stream);
stream.next_out = buf;
- stream.avail_out = size;
+ stream.avail_out = buf == fixed_buf ? sizeof(fixed_buf) : size;
do {
stream.next_in = fill(1);
stream.avail_in = input_len;
status = git_inflate(&stream, 0);
use(input_len - stream.avail_in);
+ if (sha1) {
+ git_SHA1_Update(&c, buf, stream.next_out - (unsigned char *)buf);
+ stream.next_out = buf;
+ stream.avail_out = sizeof(fixed_buf);
+ }
} while (status == Z_OK);
if (stream.total_out != size || status != Z_STREAM_END)
bad_object(offset, "inflate returned %d", status);
git_inflate_end(&stream);
+ if (sha1) {
+ git_SHA1_Final(sha1, &c);
+ buf = NULL;
+ }
return buf;
}
-static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_base)
+static int is_delta_type(enum object_type type)
+{
+ return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
+}
+
+static void *unpack_raw_entry(struct object_entry *obj,
+ union delta_base *delta_base,
+ unsigned char *sha1)
{
unsigned char *p;
unsigned long size, c;
@@ -359,7 +389,9 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
}
obj->hdr_size = consumed_bytes - obj->idx.offset;
- data = unpack_entry_data(obj->idx.offset, obj->size);
+ if (is_delta_type(obj->type) || strict)
+ sha1 = NULL; /* save unpacked object */
+ data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, sha1);
obj->idx.crc32 = input_crc32;
return data;
}
@@ -460,8 +492,9 @@ static void find_delta_children(const union delta_base *base,
static void sha1_object(const void *data, unsigned long size,
enum object_type type, unsigned char *sha1)
{
- hash_sha1_file(data, size, typename(type), sha1);
- if (has_sha1_file(sha1)) {
+ if (data)
+ hash_sha1_file(data, size, typename(type), sha1);
+ if (data && has_sha1_file(sha1)) {
void *has_data;
enum object_type has_type;
unsigned long has_size;
@@ -510,11 +543,6 @@ static void sha1_object(const void *data, unsigned long size,
}
}
-static int is_delta_type(enum object_type type)
-{
- return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
-}
-
/*
* This function is part of find_unresolved_deltas(). There are two
* walkers going in the opposite ways.
@@ -689,10 +717,20 @@ static int compare_delta_entry(const void *a, const void *b)
* - if used as a base, uncompress the object and apply all deltas,
* recursively checking if the resulting object is used as a base
* for some more deltas.
+ * - if the same object exists in repository and we're not in strict
+ * mode, we skipped the sha-1 collision test in the first pass.
+ * Do it now.
*/
static void second_pass(struct object_entry *obj)
{
struct base_data *base_obj = alloc_base_data();
+
+ if (!strict && has_sha1_file(obj->idx.sha1)) {
+ void *data = get_data_from_pack(obj);
+ sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+ free(data);
+ }
+
base_obj->obj = obj;
base_obj->data = NULL;
find_unresolved_deltas(base_obj);
@@ -718,7 +756,7 @@ static void parse_pack_objects(unsigned char *sha1)
nr_objects);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- void *data = unpack_raw_entry(obj, &delta->base);
+ void *data = unpack_raw_entry(obj, &delta->base, obj->idx.sha1);
obj->real_type = obj->type;
if (is_delta_type(obj->type)) {
nr_deltas++;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 66acb3b..7e78c72 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -123,7 +123,7 @@ test_expect_success 'git-show a large file' '
'
-test_expect_failure 'clone' '
+test_expect_success 'clone' '
git clone -n file://"$PWD"/.git new &&
(
cd new &&
@@ -132,7 +132,7 @@ test_expect_failure 'clone' '
)
'
-test_expect_failure 'fetch updates' '
+test_expect_success 'fetch updates' '
echo modified >> large1 &&
git commit -q -a -m updated &&
(
--
1.7.8.36.g69ee2
* [PATCH v2 08/10] pack-check: do not unpack blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (6 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 07/10] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 09/10] archive: support streaming large files to a tar archive Nguyễn Thái Ngọc Duy
` (13 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Blob content is not used by verify_pack()'s caller (currently only
fsck); we only need to make sure a blob's SHA-1 matches its content.
unpack_entry() is taught to hash a pack entry as it is unpacked,
eliminating the need to keep the whole blob in memory.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
cache.h | 2 +-
fast-import.c | 2 +-
pack-check.c | 21 ++++++++++++++++++++-
sha1_file.c | 45 +++++++++++++++++++++++++++++++++++----------
t/t1050-large.sh | 2 +-
5 files changed, 58 insertions(+), 14 deletions(-)
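To make the extended unpack_entry() contract concrete, here is a
hypothetical caller (not taken from the patch): when a non-NULL sha1
buffer is passed and the entry is stored as a non-delta object, the
content is hashed into that buffer and NULL is returned instead of a
malloc'd copy.

/* Hypothetical: verify one packed entry against its expected name,
 * letting unpack_entry() stream and hash the content when it can. */
static int verify_one_entry(struct packed_git *p, off_t offset,
			    const unsigned char *expected)
{
	enum object_type type;
	unsigned long size;
	unsigned char got[20];
	void *data = unpack_entry(p, offset, &type, &size, got);

	if (!data)	/* hashed while inflating, nothing was kept */
		return hashcmp(expected, got) ? -1 : 0;

	/* delta entries still come back as a full buffer */
	hash_sha1_file(data, size, typename(type), got);
	free(data);
	return hashcmp(expected, got) ? -1 : 0;
}

In the patch itself, verify_packfile() peeks at the object type first
and only uses this mode for blobs, so that fsck's per-object callback
still gets a real buffer for commits, trees and tags.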
diff --git a/cache.h b/cache.h
index e12b15f..3365f89 100644
--- a/cache.h
+++ b/cache.h
@@ -1062,7 +1062,7 @@ extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t
extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *, unsigned char *);
extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
diff --git a/fast-import.c b/fast-import.c
index 6cd19e5..5e94a64 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1303,7 +1303,7 @@ static void *gfi_unpack_entry(
*/
p->pack_size = pack_size + 20;
}
- return unpack_entry(p, oe->idx.offset, &type, sizep);
+ return unpack_entry(p, oe->idx.offset, &type, sizep, NULL);
}
static const char *get_mode(const char *str, uint16_t *modep)
diff --git a/pack-check.c b/pack-check.c
index 63a595c..1920bdb 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,7 @@ static int verify_packfile(struct packed_git *p,
void *data;
enum object_type type;
unsigned long size;
+ off_t curpos = entries[i].offset;
if (p->index_version > 1) {
off_t offset = entries[i].offset;
@@ -116,7 +117,25 @@ static int verify_packfile(struct packed_git *p,
sha1_to_hex(entries[i].sha1),
p->pack_name, (uintmax_t)offset);
}
- data = unpack_entry(p, entries[i].offset, &type, &size);
+ type = unpack_object_header(p, w_curs, &curpos, &size);
+ unuse_pack(w_curs);
+ if (type == OBJ_BLOB) {
+ unsigned char sha1[20];
+ data = unpack_entry(p, entries[i].offset, &type, &size, sha1);
+ if (!data) {
+ if (hashcmp(entries[i].sha1, sha1))
+ err = error("packed %s from %s is corrupt",
+ sha1_to_hex(entries[i].sha1), p->pack_name);
+ else if (fn) {
+ int eaten = 0;
+ fn(entries[i].sha1, type, size, NULL, &eaten);
+ }
+ if (((base_count + i) & 1023) == 0)
+ display_progress(progress, base_count + i);
+ continue;
+ }
+ }
+ data = unpack_entry(p, entries[i].offset, &type, &size, NULL);
if (!data)
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
sha1_to_hex(entries[i].sha1), p->pack_name,
diff --git a/sha1_file.c b/sha1_file.c
index a77ef0a..d68a5b0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1653,28 +1653,51 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
}
static void *unpack_compressed_entry(struct packed_git *p,
- struct pack_window **w_curs,
- off_t curpos,
- unsigned long size)
+ struct pack_window **w_curs,
+ off_t curpos,
+ unsigned long size,
+ enum object_type type,
+ unsigned char *sha1)
{
+ static unsigned char fixed_buf[8192];
int st;
git_zstream stream;
unsigned char *buffer, *in;
+ git_SHA_CTX c;
+
+ if (sha1) { /* do hash_sha1_file internally */
+ char hdr[32];
+ int hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+
+ buffer = fixed_buf;
+ } else {
+ buffer = xmallocz(size);
+ }
- buffer = xmallocz(size);
memset(&stream, 0, sizeof(stream));
stream.next_out = buffer;
- stream.avail_out = size + 1;
+ stream.avail_out = buffer == fixed_buf ? sizeof(fixed_buf) : size + 1;
git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
st = git_inflate(&stream, Z_FINISH);
- if (!stream.avail_out)
+ if (sha1) {
+ git_SHA1_Update(&c, buffer, stream.next_out - (unsigned char *)buffer);
+ stream.next_out = buffer;
+ stream.avail_out = sizeof(fixed_buf);
+ }
+ else if (!stream.avail_out)
break; /* the payload is larger than it should be */
curpos += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR);
+ if (sha1) {
+ git_SHA1_Final(sha1, &c);
+ buffer = NULL;
+ }
git_inflate_end(&stream);
if ((st != Z_STREAM_END) || stream.total_out != size) {
free(buffer);
@@ -1727,7 +1750,7 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
ret = ent->data;
if (!ret || ent->p != p || ent->base_offset != base_offset)
- return unpack_entry(p, base_offset, type, base_size);
+ return unpack_entry(p, base_offset, type, base_size, NULL);
if (!keep_cache) {
ent->data = NULL;
@@ -1844,7 +1867,7 @@ static void *unpack_delta_entry(struct packed_git *p,
return NULL;
}
- delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size);
+ delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size, OBJ_NONE, NULL);
if (!delta_data) {
error("failed to unpack compressed delta "
"at offset %"PRIuMAX" from %s",
@@ -1883,7 +1906,8 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
int do_check_packed_object_crc;
void *unpack_entry(struct packed_git *p, off_t obj_offset,
- enum object_type *type, unsigned long *sizep)
+ enum object_type *type, unsigned long *sizep,
+ unsigned char *sha1)
{
struct pack_window *w_curs = NULL;
off_t curpos = obj_offset;
@@ -1917,7 +1941,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- data = unpack_compressed_entry(p, &w_curs, curpos, *sizep);
+ data = unpack_compressed_entry(p, &w_curs, curpos,
+ *sizep, *type, sha1);
break;
default:
data = NULL;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 7e78c72..c749ecb 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -141,7 +141,7 @@ test_expect_success 'fetch updates' '
)
'
-test_expect_failure 'fsck' '
+test_expect_success 'fsck' '
git fsck --full
'
--
1.7.8.36.g69ee2
* [PATCH v2 09/10] archive: support streaming large files to a tar archive
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (7 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 08/10] pack-check: do not unpack blobs Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-04 12:59 ` [PATCH v2 10/10] fsck: use streaming interface for writing lost-found blobs Nguyễn Thái Ngọc Duy
` (12 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
archive-tar.c | 35 ++++++++++++++++++++++++++++-------
archive-zip.c | 9 +++++----
archive.c | 51 ++++++++++++++++++++++++++++++++++-----------------
archive.h | 11 +++++++++--
t/t1050-large.sh | 2 +-
5 files changed, 77 insertions(+), 31 deletions(-)
diff --git a/archive-tar.c b/archive-tar.c
index 20af005..5bffe49 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -5,6 +5,7 @@
#include "tar.h"
#include "archive.h"
#include "run-command.h"
+#include "streaming.h"
#define RECORDSIZE (512)
#define BLOCKSIZE (RECORDSIZE * 20)
@@ -123,9 +124,29 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
return i;
}
+static void write_file(struct git_istream *stream, const void *buffer,
+ unsigned long size)
+{
+ if (!stream) {
+ write_blocked(buffer, size);
+ return;
+ }
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t readlen;
+
+ readlen = read_istream(stream, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ write_blocked(buf, readlen);
+ }
+}
+
static int write_tar_entry(struct archiver_args *args,
- const unsigned char *sha1, const char *path, size_t pathlen,
- unsigned int mode, void *buffer, unsigned long size)
+ const unsigned char *sha1, const char *path,
+ size_t pathlen, unsigned int mode, void *buffer,
+ struct git_istream *stream, unsigned long size)
{
struct ustar_header header;
struct strbuf ext_header = STRBUF_INIT;
@@ -200,14 +221,14 @@ static int write_tar_entry(struct archiver_args *args,
if (ext_header.len > 0) {
err = write_tar_entry(args, sha1, NULL, 0, 0, ext_header.buf,
- ext_header.len);
+ NULL, ext_header.len);
if (err)
return err;
}
strbuf_release(&ext_header);
write_blocked(&header, sizeof(header));
- if (S_ISREG(mode) && buffer && size > 0)
- write_blocked(buffer, size);
+ if (S_ISREG(mode) && size > 0)
+ write_file(stream, buffer, size);
return err;
}
@@ -219,7 +240,7 @@ static int write_global_extended_header(struct archiver_args *args)
strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
err = write_tar_entry(args, NULL, NULL, 0, 0, ext_header.buf,
- ext_header.len);
+ NULL, ext_header.len);
strbuf_release(&ext_header);
return err;
}
@@ -308,7 +329,7 @@ static int write_tar_archive(const struct archiver *ar,
if (args->commit_sha1)
err = write_global_extended_header(args);
if (!err)
- err = write_archive_entries(args, write_tar_entry);
+ err = write_archive_entries(args, write_tar_entry, 1);
if (!err)
write_trailer();
return err;
diff --git a/archive-zip.c b/archive-zip.c
index 02d1f37..4a1e917 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -120,9 +120,10 @@ static void *zlib_deflate(void *data, unsigned long size,
return buffer;
}
-static int write_zip_entry(struct archiver_args *args,
- const unsigned char *sha1, const char *path, size_t pathlen,
- unsigned int mode, void *buffer, unsigned long size)
+int write_zip_entry(struct archiver_args *args,
+ const unsigned char *sha1, const char *path,
+ size_t pathlen, unsigned int mode, void *buffer,
+ struct git_istream *stream, unsigned long size)
{
struct zip_local_header header;
struct zip_dir_header dirent;
@@ -271,7 +272,7 @@ static int write_zip_archive(const struct archiver *ar,
zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
zip_dir_size = ZIP_DIRECTORY_MIN_SIZE;
- err = write_archive_entries(args, write_zip_entry);
+ err = write_archive_entries(args, write_zip_entry, 0);
if (!err)
write_zip_trailer(args->commit_sha1);
diff --git a/archive.c b/archive.c
index 1ee837d..257eadf 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
#include "archive.h"
#include "parse-options.h"
#include "unpack-trees.h"
+#include "streaming.h"
static char const * const archive_usage[] = {
"git archive [options] <tree-ish> [<path>...]",
@@ -59,26 +60,35 @@ static void format_subst(const struct commit *commit,
free(to_free);
}
-static void *sha1_file_to_archive(const char *path, const unsigned char *sha1,
- unsigned int mode, enum object_type *type,
- unsigned long *sizep, const struct commit *commit)
+void sha1_file_to_archive(void **buffer, struct git_istream **stream,
+ const char *path, const unsigned char *sha1,
+ unsigned int mode, enum object_type *type,
+ unsigned long *sizep,
+ const struct commit *commit)
{
- void *buffer;
+ if (stream) {
+ struct stream_filter *filter;
+ filter = get_stream_filter(path, sha1);
+ if (!commit && S_ISREG(mode) && is_null_stream_filter(filter)) {
+ *buffer = NULL;
+ *stream = open_istream(sha1, type, sizep, NULL);
+ return;
+ }
+ *stream = NULL;
+ }
- buffer = read_sha1_file(sha1, type, sizep);
- if (buffer && S_ISREG(mode)) {
+ *buffer = read_sha1_file(sha1, type, sizep);
+ if (*buffer && S_ISREG(mode)) {
struct strbuf buf = STRBUF_INIT;
size_t size = 0;
- strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
+ strbuf_attach(&buf, *buffer, *sizep, *sizep + 1);
convert_to_working_tree(path, buf.buf, buf.len, &buf);
if (commit)
format_subst(commit, buf.buf, buf.len, &buf);
- buffer = strbuf_detach(&buf, &size);
+ *buffer = strbuf_detach(&buf, &size);
*sizep = size;
}
-
- return buffer;
}
static void setup_archive_check(struct git_attr_check *check)
@@ -97,6 +107,7 @@ static void setup_archive_check(struct git_attr_check *check)
struct archiver_context {
struct archiver_args *args;
write_archive_entry_fn_t write_entry;
+ int stream_ok;
};
static int write_archive_entry(const unsigned char *sha1, const char *base,
@@ -109,6 +120,7 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
write_archive_entry_fn_t write_entry = c->write_entry;
struct git_attr_check check[2];
const char *path_without_prefix;
+ struct git_istream *stream = NULL;
int convert = 0;
int err;
enum object_type type;
@@ -133,25 +145,29 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
strbuf_addch(&path, '/');
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
- err = write_entry(args, sha1, path.buf, path.len, mode, NULL, 0);
+ err = write_entry(args, sha1, path.buf, path.len, mode, NULL, NULL, 0);
if (err)
return err;
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
}
- buffer = sha1_file_to_archive(path_without_prefix, sha1, mode,
- &type, &size, convert ? args->commit : NULL);
- if (!buffer)
+ sha1_file_to_archive(&buffer, c->stream_ok ? &stream : NULL,
+ path_without_prefix, sha1, mode,
+ &type, &size, convert ? args->commit : NULL);
+ if (!buffer && !stream)
return error("cannot read %s", sha1_to_hex(sha1));
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
- err = write_entry(args, sha1, path.buf, path.len, mode, buffer, size);
+ err = write_entry(args, sha1, path.buf, path.len, mode, buffer, stream, size);
+ if (stream)
+ close_istream(stream);
free(buffer);
return err;
}
int write_archive_entries(struct archiver_args *args,
- write_archive_entry_fn_t write_entry)
+ write_archive_entry_fn_t write_entry,
+ int stream_ok)
{
struct archiver_context context;
struct unpack_trees_options opts;
@@ -167,13 +183,14 @@ int write_archive_entries(struct archiver_args *args,
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)len, args->base);
err = write_entry(args, args->tree->object.sha1, args->base,
- len, 040777, NULL, 0);
+ len, 040777, NULL, NULL, 0);
if (err)
return err;
}
context.args = args;
context.write_entry = write_entry;
+ context.stream_ok = stream_ok;
/*
* Setup index and instruct attr to read index only
diff --git a/archive.h b/archive.h
index 2b0884f..370cca9 100644
--- a/archive.h
+++ b/archive.h
@@ -27,9 +27,16 @@ extern void register_archiver(struct archiver *);
extern void init_tar_archiver(void);
extern void init_zip_archiver(void);
-typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size);
+struct git_istream;
+typedef int (*write_archive_entry_fn_t)(struct archiver_args *args,
+ const unsigned char *sha1,
+ const char *path, size_t pathlen,
+ unsigned int mode,
+ void *buffer,
+ struct git_istream *stream,
+ unsigned long size);
-extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry);
+extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry, int stream_ok);
extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint, int remote);
const char *archive_format_from_filename(const char *filename);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index c749ecb..1e64692 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -149,7 +149,7 @@ test_expect_success 'repack' '
git repack -ad
'
-test_expect_failure 'tar archiving' '
+test_expect_success 'tar archiving' '
git archive --format=tar HEAD >/dev/null
'
--
1.7.8.36.g69ee2
* [PATCH v2 10/10] fsck: use streaming interface for writing lost-found blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (8 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 09/10] archive: support streaming large files to a tar archive Nguyễn Thái Ngọc Duy
@ 2012-03-04 12:59 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 00/11] Large blob fixes Nguyễn Thái Ngọc Duy
` (11 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-04 12:59 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/fsck.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 8c479a7..7fcb33e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -12,6 +12,7 @@
#include "parse-options.h"
#include "dir.h"
#include "progress.h"
+#include "streaming.h"
#define REACHABLE 0x0001
#define SEEN 0x0002
@@ -236,13 +237,8 @@ static void check_unreachable_object(struct object *obj)
if (!(f = fopen(filename, "w")))
die_errno("Could not open '%s'", filename);
if (obj->type == OBJ_BLOB) {
- enum object_type type;
- unsigned long size;
- char *buf = read_sha1_file(obj->sha1,
- &type, &size);
- if (buf && fwrite(buf, 1, size, f) != size)
+ if (stream_blob_to_fd(fileno(f), obj->sha1, NULL, 1))
die_errno("Could not write '%s'", filename);
- free(buf);
} else
fprintf(f, "%s\n", sha1_to_hex(obj->sha1));
if (fclose(f))
--
1.7.8.36.g69ee2
* [PATCH v3 00/11] Large blob fixes
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (9 preceding siblings ...)
2012-03-04 12:59 ` [PATCH v2 10/10] fsck: use streaming interface for writing lost-found blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 01/11] Add more large blob test cases Nguyễn Thái Ngọc Duy
` (10 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Changes from v2:
- set core.bigfilethreshold globally in t1050 to make git-clone happy
because there's currently no way to specify this in git-clone (or
is there?)
- fix the bad coding taste in builtin/cat-file.c
- make update-server-info respect core.bigfilethreshold,
which makes repack pass on repositories that have tags
Junio C Hamano (1):
streaming: make streaming-write-entry to be more reusable
Nguyễn Thái Ngọc Duy (10):
Add more large blob test cases
cat-file: use streaming interface to print blobs
parse_object: special code path for blobs to avoid putting whole
object in memory
show: use streaming interface for showing blobs
index-pack: split second pass obj handling into own function
index-pack: reduce memory usage when the pack has large blobs
pack-check: do not unpack blobs
archive: support streaming large files to a tar archive
fsck: use streaming interface for writing lost-found blobs
update-server-info: respect core.bigfilethreshold
archive-tar.c | 35 ++++++++++++---
archive-zip.c | 9 ++--
archive.c | 51 +++++++++++++++-------
archive.h | 11 ++++-
builtin/cat-file.c | 24 +++++++++++
builtin/fsck.c | 8 +---
builtin/index-pack.c | 95 ++++++++++++++++++++++++++++++-----------
builtin/log.c | 34 +++++++++------
builtin/update-server-info.c | 1 +
cache.h | 2 +-
entry.c | 53 ++---------------------
fast-import.c | 2 +-
object.c | 11 +++++
pack-check.c | 21 +++++++++-
sha1_file.c | 78 +++++++++++++++++++++++++++++-----
streaming.c | 55 ++++++++++++++++++++++++
streaming.h | 2 +
t/t1050-large.sh | 63 +++++++++++++++++++++++++++-
wrapper.c | 27 +++++++++++-
19 files changed, 439 insertions(+), 143 deletions(-)
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 01/11] Add more large blob test cases
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (10 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 00/11] Large blob fixes Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 02/11] streaming: make streaming-write-entry to be more reusable Nguyễn Thái Ngọc Duy
` (9 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
New test cases list commands that should work when memory is
limited. All memory allocation functions (*) learn to reject any
allocation larger than $GIT_ALLOC_LIMIT kilobytes, if that variable is set.
(*) Not exactly all. Some places do not use the x* wrappers but call
malloc/calloc directly, notably diff-delta. Those code paths should
never be run on large blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
t/t1050-large.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
wrapper.c | 27 ++++++++++++++++++++--
2 files changed, 85 insertions(+), 5 deletions(-)
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 29d6024..80f157a 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -6,11 +6,15 @@ test_description='adding and checking out large blobs'
. ./test-lib.sh
test_expect_success setup '
- git config core.bigfilethreshold 200k &&
+ # clone does not allow us to pass core.bigfilethreshold to
+ # new repos, so set core.bigfilethreshold globally
+ git config --global core.bigfilethreshold 200k &&
echo X | dd of=large1 bs=1k seek=2000 &&
echo X | dd of=large2 bs=1k seek=2000 &&
echo X | dd of=large3 bs=1k seek=2000 &&
- echo Y | dd of=huge bs=1k seek=2500
+ echo Y | dd of=huge bs=1k seek=2500 &&
+ GIT_ALLOC_LIMIT=1500 &&
+ export GIT_ALLOC_LIMIT
'
test_expect_success 'add a large file or two' '
@@ -100,4 +104,59 @@ test_expect_success 'packsize limit' '
)
'
+test_expect_success 'diff --raw' '
+ git commit -q -m initial &&
+ echo modified >>large1 &&
+ git add large1 &&
+ git commit -q -m modified &&
+ git diff --raw HEAD^
+'
+
+test_expect_success 'hash-object' '
+ git hash-object large1
+'
+
+test_expect_failure 'cat-file a large file' '
+ git cat-file blob :large1 >/dev/null
+'
+
+test_expect_failure 'cat-file a large file from a tag' '
+ git tag -m largefile largefiletag :large1 &&
+ git cat-file blob largefiletag >/dev/null
+'
+
+test_expect_failure 'git-show a large file' '
+ git show :large1 >/dev/null
+
+'
+
+test_expect_failure 'clone' '
+ git clone file://"$PWD"/.git new
+'
+
+test_expect_failure 'fetch updates' '
+ echo modified >> large1 &&
+ git commit -q -a -m updated &&
+ (
+ cd new &&
+ git fetch --keep # FIXME should not need --keep
+ )
+'
+
+test_expect_failure 'fsck' '
+ git fsck --full
+'
+
+test_expect_failure 'repack' '
+ git repack -ad
+'
+
+test_expect_failure 'tar archiving' '
+ git archive --format=tar HEAD >/dev/null
+'
+
+test_expect_failure 'zip archiving' '
+ git archive --format=zip HEAD >/dev/null
+'
+
test_done
diff --git a/wrapper.c b/wrapper.c
index 85f09df..d4c0972 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,6 +9,18 @@ static void do_nothing(size_t size)
static void (*try_to_free_routine)(size_t size) = do_nothing;
+static void memory_limit_check(size_t size)
+{
+ static int limit = -1;
+ if (limit == -1) {
+ const char *env = getenv("GIT_ALLOC_LIMIT");
+ limit = env ? atoi(env) * 1024 : 0;
+ }
+ if (limit && size > limit)
+ die("attempting to allocate %d over limit %d",
+ size, limit);
+}
+
try_to_free_t set_try_to_free_routine(try_to_free_t routine)
{
try_to_free_t old = try_to_free_routine;
@@ -32,7 +44,10 @@ char *xstrdup(const char *str)
void *xmalloc(size_t size)
{
- void *ret = malloc(size);
+ void *ret;
+
+ memory_limit_check(size);
+ ret = malloc(size);
if (!ret && !size)
ret = malloc(1);
if (!ret) {
@@ -79,7 +94,10 @@ char *xstrndup(const char *str, size_t len)
void *xrealloc(void *ptr, size_t size)
{
- void *ret = realloc(ptr, size);
+ void *ret;
+
+ memory_limit_check(size);
+ ret = realloc(ptr, size);
if (!ret && !size)
ret = realloc(ptr, 1);
if (!ret) {
@@ -95,7 +113,10 @@ void *xrealloc(void *ptr, size_t size)
void *xcalloc(size_t nmemb, size_t size)
{
- void *ret = calloc(nmemb, size);
+ void *ret;
+
+ memory_limit_check(size * nmemb);
+ ret = calloc(nmemb, size);
if (!ret && (!nmemb || !size))
ret = calloc(1, 1);
if (!ret) {
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 02/11] streaming: make streaming-write-entry to be more reusable
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (11 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 01/11] Add more large blob test cases Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 03/11] cat-file: use streaming interface to print blobs Nguyễn Thái Ngọc Duy
` (8 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
From: Junio C Hamano <gitster@pobox.com>
The static function in entry.c takes a cache entry and streams its blob
contents to a file in the working tree. Refactor the logic to a new API
function stream_blob_to_fd() that takes an object name and an open file
descriptor, so that it can be reused by other callers.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
entry.c | 53 +++++------------------------------------------------
streaming.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
streaming.h | 2 ++
3 files changed, 62 insertions(+), 48 deletions(-)
diff --git a/entry.c b/entry.c
index 852fea1..17a6bcc 100644
--- a/entry.c
+++ b/entry.c
@@ -120,58 +120,15 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
const struct checkout *state, int to_tempfile,
int *fstat_done, struct stat *statbuf)
{
- struct git_istream *st;
- enum object_type type;
- unsigned long sz;
int result = -1;
- ssize_t kept = 0;
- int fd = -1;
-
- st = open_istream(ce->sha1, &type, &sz, filter);
- if (!st)
- return -1;
- if (type != OBJ_BLOB)
- goto close_and_exit;
+ int fd;
fd = open_output_fd(path, ce, to_tempfile);
- if (fd < 0)
- goto close_and_exit;
-
- for (;;) {
- char buf[1024 * 16];
- ssize_t wrote, holeto;
- ssize_t readlen = read_istream(st, buf, sizeof(buf));
-
- if (!readlen)
- break;
- if (sizeof(buf) == readlen) {
- for (holeto = 0; holeto < readlen; holeto++)
- if (buf[holeto])
- break;
- if (readlen == holeto) {
- kept += holeto;
- continue;
- }
- }
-
- if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
- goto close_and_exit;
- else
- kept = 0;
- wrote = write_in_full(fd, buf, readlen);
-
- if (wrote != readlen)
- goto close_and_exit;
- }
- if (kept && (lseek(fd, kept - 1, SEEK_CUR) == (off_t) -1 ||
- write(fd, "", 1) != 1))
- goto close_and_exit;
- *fstat_done = fstat_output(fd, state, statbuf);
-
-close_and_exit:
- close_istream(st);
- if (0 <= fd)
+ if (0 <= fd) {
+ result = stream_blob_to_fd(fd, ce->sha1, filter, 1);
+ *fstat_done = fstat_output(fd, state, statbuf);
result = close(fd);
+ }
if (result && 0 <= fd)
unlink(path);
return result;
diff --git a/streaming.c b/streaming.c
index 71072e1..7e7ee2b 100644
--- a/streaming.c
+++ b/streaming.c
@@ -489,3 +489,58 @@ static open_method_decl(incore)
return st->u.incore.buf ? 0 : -1;
}
+
+
+/****************************************************************
+ * Users of streaming interface
+ ****************************************************************/
+
+int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *filter,
+ int can_seek)
+{
+ struct git_istream *st;
+ enum object_type type;
+ unsigned long sz;
+ ssize_t kept = 0;
+ int result = -1;
+
+ st = open_istream(sha1, &type, &sz, filter);
+ if (!st)
+ return result;
+ if (type != OBJ_BLOB)
+ goto close_and_exit;
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t wrote, holeto;
+ ssize_t readlen = read_istream(st, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ if (can_seek && sizeof(buf) == readlen) {
+ for (holeto = 0; holeto < readlen; holeto++)
+ if (buf[holeto])
+ break;
+ if (readlen == holeto) {
+ kept += holeto;
+ continue;
+ }
+ }
+
+ if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
+ goto close_and_exit;
+ else
+ kept = 0;
+ wrote = write_in_full(fd, buf, readlen);
+
+ if (wrote != readlen)
+ goto close_and_exit;
+ }
+ if (kept && (lseek(fd, kept - 1, SEEK_CUR) == (off_t) -1 ||
+ write(fd, "", 1) != 1))
+ goto close_and_exit;
+ result = 0;
+
+ close_and_exit:
+ close_istream(st);
+ return result;
+}
diff --git a/streaming.h b/streaming.h
index 589e857..3e82770 100644
--- a/streaming.h
+++ b/streaming.h
@@ -12,4 +12,6 @@ extern struct git_istream *open_istream(const unsigned char *, enum object_type
extern int close_istream(struct git_istream *);
extern ssize_t read_istream(struct git_istream *, char *, size_t);
+extern int stream_blob_to_fd(int fd, const unsigned char *, struct stream_filter *, int can_seek);
+
#endif /* STREAMING_H */
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 03/11] cat-file: use streaming interface to print blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (12 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 02/11] streaming: make streaming-write-entry to be more reusable Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 04/11] parse_object: special code path for blobs to avoid putting whole object in memory Nguyễn Thái Ngọc Duy
` (7 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/cat-file.c | 24 ++++++++++++++++++++++++
t/t1050-large.sh | 4 ++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8ed501f..ce68a20 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -11,6 +11,7 @@
#include "parse-options.h"
#include "diff.h"
#include "userdiff.h"
+#include "streaming.h"
#define BATCH 1
#define BATCH_CHECK 2
@@ -127,6 +128,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
return cmd_ls_tree(2, ls_args, NULL);
}
+ if (type == OBJ_BLOB)
+ return stream_blob_to_fd(1, sha1, NULL, 0);
buf = read_sha1_file(sha1, &type, &size);
if (!buf)
die("Cannot read object %s", obj_name);
@@ -149,6 +152,27 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
break;
case 0:
+ if (type_from_string(exp_type) == OBJ_BLOB) {
+ unsigned char blob_sha1[20];
+ if (sha1_object_info(sha1, NULL) == OBJ_TAG) {
+ enum object_type type;
+ unsigned long size;
+ char *buffer = read_sha1_file(sha1, &type, &size);
+ if (memcmp(buffer, "object ", 7) ||
+ get_sha1_hex(buffer + 7, blob_sha1))
+ die("%s not a valid tag", sha1_to_hex(sha1));
+ free(buffer);
+ } else
+ hashcpy(blob_sha1, sha1);
+
+ if (sha1_object_info(blob_sha1, NULL) == OBJ_BLOB)
+ return stream_blob_to_fd(1, blob_sha1, NULL, 0);
+ /* we attempted to dereference a tag to a blob
+ and failed, perhaps there are new dereference
+ mechanisms this code is not aware of,
+ fallthrough and let read_object_with_reference
+ deal with it */
+ }
buf = read_object_with_reference(sha1, exp_type, &size, NULL);
break;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 80f157a..97ad5b3 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -116,11 +116,11 @@ test_expect_success 'hash-object' '
git hash-object large1
'
-test_expect_failure 'cat-file a large file' '
+test_expect_success 'cat-file a large file' '
git cat-file blob :large1 >/dev/null
'
-test_expect_failure 'cat-file a large file from a tag' '
+test_expect_success 'cat-file a large file from a tag' '
git tag -m largefile largefiletag :large1 &&
git cat-file blob largefiletag >/dev/null
'
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 04/11] parse_object: special code path for blobs to avoid putting whole object in memory
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (13 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 03/11] cat-file: use streaming interface to print blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-06 0:57 ` Junio C Hamano
2012-03-05 3:43 ` [PATCH v3 05/11] show: use streaming interface for showing blobs Nguyễn Thái Ngọc Duy
` (6 subsequent siblings)
21 siblings, 1 reply; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
object.c | 11 +++++++++++
sha1_file.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 43 insertions(+), 1 deletions(-)
diff --git a/object.c b/object.c
index 6b06297..0498b18 100644
--- a/object.c
+++ b/object.c
@@ -198,6 +198,17 @@ struct object *parse_object(const unsigned char *sha1)
if (obj && obj->parsed)
return obj;
+ if ((obj && obj->type == OBJ_BLOB) ||
+ (!obj && has_sha1_file(sha1) &&
+ sha1_object_info(sha1, NULL) == OBJ_BLOB)) {
+ if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
+ error("sha1 mismatch %s\n", sha1_to_hex(repl));
+ return NULL;
+ }
+ parse_blob_buffer(lookup_blob(sha1), NULL, 0);
+ return lookup_object(sha1);
+ }
+
buffer = read_sha1_file(sha1, &type, &size);
if (buffer) {
if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
diff --git a/sha1_file.c b/sha1_file.c
index f9f8d5e..a77ef0a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -19,6 +19,7 @@
#include "pack-revindex.h"
#include "sha1-lookup.h"
#include "bulk-checkin.h"
+#include "streaming.h"
#ifndef O_NOATIME
#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -1149,7 +1150,37 @@ static const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
int check_sha1_signature(const unsigned char *sha1, void *map, unsigned long size, const char *type)
{
unsigned char real_sha1[20];
- hash_sha1_file(map, size, type, real_sha1);
+ enum object_type obj_type;
+ struct git_istream *st;
+ git_SHA_CTX c;
+ char hdr[32];
+ int hdrlen;
+
+ if (map) {
+ hash_sha1_file(map, size, type, real_sha1);
+ return hashcmp(sha1, real_sha1) ? -1 : 0;
+ }
+
+ st = open_istream(sha1, &obj_type, &size, NULL);
+ if (!st)
+ return -1;
+
+ /* Generate the header */
+ hdrlen = sprintf(hdr, "%s %lu", typename(obj_type), size) + 1;
+
+ /* Sha1.. */
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t readlen = read_istream(st, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ git_SHA1_Update(&c, buf, readlen);
+ }
+ git_SHA1_Final(real_sha1, &c);
+ close_istream(st);
return hashcmp(sha1, real_sha1) ? -1 : 0;
}
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 05/11] show: use streaming interface for showing blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (14 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 04/11] parse_object: special code path for blobs to avoid putting whole object in memory Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 06/11] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
` (5 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/log.c | 34 ++++++++++++++++++++--------------
t/t1050-large.sh | 2 +-
2 files changed, 21 insertions(+), 15 deletions(-)
diff --git a/builtin/log.c b/builtin/log.c
index 7d1f6f8..d1702e7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -20,6 +20,7 @@
#include "string-list.h"
#include "parse-options.h"
#include "branch.h"
+#include "streaming.h"
/* Set a default date-time format for git log ("log.date" config variable) */
static const char *default_date_mode = NULL;
@@ -381,8 +382,13 @@ static void show_tagger(char *buf, int len, struct rev_info *rev)
strbuf_release(&out);
}
-static int show_object(const unsigned char *sha1, int show_tag_object,
- struct rev_info *rev)
+static int show_blob_object(const unsigned char *sha1, struct rev_info *rev)
+{
+ fflush(stdout);
+ return stream_blob_to_fd(1, sha1, NULL, 0);
+}
+
+static int show_tag_object(const unsigned char *sha1, struct rev_info *rev)
{
unsigned long size;
enum object_type type;
@@ -392,16 +398,16 @@ static int show_object(const unsigned char *sha1, int show_tag_object,
if (!buf)
return error(_("Could not read object %s"), sha1_to_hex(sha1));
- if (show_tag_object)
- while (offset < size && buf[offset] != '\n') {
- int new_offset = offset + 1;
- while (new_offset < size && buf[new_offset++] != '\n')
- ; /* do nothing */
- if (!prefixcmp(buf + offset, "tagger "))
- show_tagger(buf + offset + 7,
- new_offset - offset - 7, rev);
- offset = new_offset;
- }
+ assert(type == OBJ_TAG);
+ while (offset < size && buf[offset] != '\n') {
+ int new_offset = offset + 1;
+ while (new_offset < size && buf[new_offset++] != '\n')
+ ; /* do nothing */
+ if (!prefixcmp(buf + offset, "tagger "))
+ show_tagger(buf + offset + 7,
+ new_offset - offset - 7, rev);
+ offset = new_offset;
+ }
if (offset < size)
fwrite(buf + offset, size - offset, 1, stdout);
@@ -459,7 +465,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
const char *name = objects[i].name;
switch (o->type) {
case OBJ_BLOB:
- ret = show_object(o->sha1, 0, NULL);
+ ret = show_blob_object(o->sha1, NULL);
break;
case OBJ_TAG: {
struct tag *t = (struct tag *)o;
@@ -470,7 +476,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
diff_get_color_opt(&rev.diffopt, DIFF_COMMIT),
t->tag,
diff_get_color_opt(&rev.diffopt, DIFF_RESET));
- ret = show_object(o->sha1, 1, &rev);
+ ret = show_tag_object(o->sha1, &rev);
rev.shown_one = 1;
if (ret)
break;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 97ad5b3..4e08e02 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -125,7 +125,7 @@ test_expect_success 'cat-file a large file from a tag' '
git cat-file blob largefiletag >/dev/null
'
-test_expect_failure 'git-show a large file' '
+test_expect_success 'git-show a large file' '
git show :large1 >/dev/null
'
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 06/11] index-pack: split second pass obj handling into own function
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (15 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 05/11] show: use streaming interface for showing blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 07/11] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
` (4 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 31 ++++++++++++++++++-------------
1 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..918684f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,6 +682,23 @@ static int compare_delta_entry(const void *a, const void *b)
objects[delta_b->obj_no].type);
}
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ * deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ * recursively checking if the resulting object is used as a base
+ * for some more deltas.
+ */
+static void second_pass(struct object_entry *obj)
+{
+ struct base_data *base_obj = alloc_base_data();
+ base_obj->obj = obj;
+ base_obj->data = NULL;
+ find_unresolved_deltas(base_obj);
+ display_progress(progress, nr_resolved_deltas);
+}
+
/* Parse all objects and return the pack content SHA1 hash */
static void parse_pack_objects(unsigned char *sha1)
{
@@ -736,26 +753,14 @@ static void parse_pack_objects(unsigned char *sha1)
qsort(deltas, nr_deltas, sizeof(struct delta_entry),
compare_delta_entry);
- /*
- * Second pass:
- * - for all non-delta objects, look if it is used as a base for
- * deltas;
- * - if used as a base, uncompress the object and apply all deltas,
- * recursively checking if the resulting object is used as a base
- * for some more deltas.
- */
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- struct base_data *base_obj = alloc_base_data();
if (is_delta_type(obj->type))
continue;
- base_obj->obj = obj;
- base_obj->data = NULL;
- find_unresolved_deltas(base_obj);
- display_progress(progress, nr_resolved_deltas);
+ second_pass(obj);
}
}
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 07/11] index-pack: reduce memory usage when the pack has large blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (16 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 06/11] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 08/11] pack-check: do not unpack blobs Nguyễn Thái Ngọc Duy
` (3 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
This command unpacks every non-delta object in order to:
1. calculate its sha-1
2. do a byte-to-byte sha-1 collision test if we happen to have objects
with the same sha-1
3. validate object content in strict mode
All of this requires the entire object to stay in memory, which is bad
news for giant blobs. This patch lowers memory consumption by not
keeping the object in memory whenever possible, calculating its SHA-1
while the object is being unpacked instead.
This patch assumes that the collision test is rarely needed. The
collision test is done later, in the second pass, if necessary, which
puts the entire object back into memory again (we could even do the
collision test without putting the entire object back in memory, by
comparing as we unpack it).
In strict mode, non-blob objects are always kept in memory for
validation (blobs do not need data validation).
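The calculation itself is the usual "hash the header, then hash the
payload in chunks" pattern; a simplified sketch (type, size and sha1
are assumed to come from the surrounding code, and read_chunk() is a
made-up stand-in for the inflate loop in the patch):

	git_SHA_CTX c;
	char hdr[32], buf[8192];
	int hdrlen = sprintf(hdr, "%s %lu", typename(type), size) + 1;
	ssize_t n;

	git_SHA1_Init(&c);
	git_SHA1_Update(&c, hdr, hdrlen);	/* "<type> <size>" + NUL */
	while ((n = read_chunk(buf, sizeof(buf))) > 0)
		git_SHA1_Update(&c, buf, n);	/* hash each inflated chunk */
	git_SHA1_Final(sha1, &c);	/* same result as hash_sha1_file() */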
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 64 +++++++++++++++++++++++++++++++++++++++----------
t/t1050-large.sh | 4 +-
2 files changed, 53 insertions(+), 15 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..db27133 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -276,30 +276,60 @@ static void unlink_base_data(struct base_data *c)
free_base_data(c);
}
-static void *unpack_entry_data(unsigned long offset, unsigned long size)
+static void *unpack_entry_data(unsigned long offset, unsigned long size,
+ enum object_type type, unsigned char *sha1)
{
+ static char fixed_buf[8192];
int status;
git_zstream stream;
- void *buf = xmalloc(size);
+ void *buf;
+ git_SHA_CTX c;
+
+ if (sha1) { /* do hash_sha1_file internally */
+ char hdr[32];
+ int hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+
+ buf = fixed_buf;
+ } else {
+ buf = xmalloc(size);
+ }
memset(&stream, 0, sizeof(stream));
git_inflate_init(&stream);
stream.next_out = buf;
- stream.avail_out = size;
+ stream.avail_out = buf == fixed_buf ? sizeof(fixed_buf) : size;
do {
stream.next_in = fill(1);
stream.avail_in = input_len;
status = git_inflate(&stream, 0);
use(input_len - stream.avail_in);
+ if (sha1) {
+ git_SHA1_Update(&c, buf, stream.next_out - (unsigned char *)buf);
+ stream.next_out = buf;
+ stream.avail_out = sizeof(fixed_buf);
+ }
} while (status == Z_OK);
if (stream.total_out != size || status != Z_STREAM_END)
bad_object(offset, "inflate returned %d", status);
git_inflate_end(&stream);
+ if (sha1) {
+ git_SHA1_Final(sha1, &c);
+ buf = NULL;
+ }
return buf;
}
-static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_base)
+static int is_delta_type(enum object_type type)
+{
+ return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
+}
+
+static void *unpack_raw_entry(struct object_entry *obj,
+ union delta_base *delta_base,
+ unsigned char *sha1)
{
unsigned char *p;
unsigned long size, c;
@@ -359,7 +389,9 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_
}
obj->hdr_size = consumed_bytes - obj->idx.offset;
- data = unpack_entry_data(obj->idx.offset, obj->size);
+ if (is_delta_type(obj->type) || strict)
+ sha1 = NULL; /* save unpacked object */
+ data = unpack_entry_data(obj->idx.offset, obj->size, obj->type, sha1);
obj->idx.crc32 = input_crc32;
return data;
}
@@ -460,8 +492,9 @@ static void find_delta_children(const union delta_base *base,
static void sha1_object(const void *data, unsigned long size,
enum object_type type, unsigned char *sha1)
{
- hash_sha1_file(data, size, typename(type), sha1);
- if (has_sha1_file(sha1)) {
+ if (data)
+ hash_sha1_file(data, size, typename(type), sha1);
+ if (data && has_sha1_file(sha1)) {
void *has_data;
enum object_type has_type;
unsigned long has_size;
@@ -510,11 +543,6 @@ static void sha1_object(const void *data, unsigned long size,
}
}
-static int is_delta_type(enum object_type type)
-{
- return (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA);
-}
-
/*
* This function is part of find_unresolved_deltas(). There are two
* walkers going in the opposite ways.
@@ -689,10 +717,20 @@ static int compare_delta_entry(const void *a, const void *b)
* - if used as a base, uncompress the object and apply all deltas,
* recursively checking if the resulting object is used as a base
* for some more deltas.
+ * - if the same object exists in repository and we're not in strict
+ * mode, we skipped the sha-1 collision test in the first pass.
+ * Do it now.
*/
static void second_pass(struct object_entry *obj)
{
struct base_data *base_obj = alloc_base_data();
+
+ if (!strict && has_sha1_file(obj->idx.sha1)) {
+ void *data = get_data_from_pack(obj);
+ sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+ free(data);
+ }
+
base_obj->obj = obj;
base_obj->data = NULL;
find_unresolved_deltas(base_obj);
@@ -718,7 +756,7 @@ static void parse_pack_objects(unsigned char *sha1)
nr_objects);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- void *data = unpack_raw_entry(obj, &delta->base);
+ void *data = unpack_raw_entry(obj, &delta->base, obj->idx.sha1);
obj->real_type = obj->type;
if (is_delta_type(obj->type)) {
nr_deltas++;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 4e08e02..e4b77a2 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -130,11 +130,11 @@ test_expect_success 'git-show a large file' '
'
-test_expect_failure 'clone' '
+test_expect_success 'clone' '
git clone file://"$PWD"/.git new
'
-test_expect_failure 'fetch updates' '
+test_expect_success 'fetch updates' '
echo modified >> large1 &&
git commit -q -a -m updated &&
(
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 08/11] pack-check: do not unpack blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (17 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 07/11] index-pack: reduce memory usage when the pack has large blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 09/11] archive: support streaming large files to a tar archive Nguyễn Thái Ngọc Duy
` (2 subsequent siblings)
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Blob content is not used by verify_pack()'s caller (currently only
fsck); we only need to make sure the blob's sha-1 signature matches
its content. unpack_entry() is taught to hash a pack entry as it is
unpacked, eliminating the need to keep the whole blob in memory.
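From a caller's point of view the new convention is roughly this (a
simplified sketch of the pack-check.c hunk below; p, offset and
expected are assumed to come from the surrounding loop):

	unsigned char sha1[20];
	enum object_type type;
	unsigned long size;
	void *data = unpack_entry(p, offset, &type, &size, sha1);

	if (!data && hashcmp(expected, sha1))
		/* the streamed blob did not hash to what the index claims */
		return error("packed %s is corrupt", sha1_to_hex(expected));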
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
cache.h | 2 +-
fast-import.c | 2 +-
pack-check.c | 21 ++++++++++++++++++++-
sha1_file.c | 45 +++++++++++++++++++++++++++++++++++----------
t/t1050-large.sh | 2 +-
5 files changed, 58 insertions(+), 14 deletions(-)
diff --git a/cache.h b/cache.h
index e12b15f..3365f89 100644
--- a/cache.h
+++ b/cache.h
@@ -1062,7 +1062,7 @@ extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t
extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *, unsigned char *);
extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
diff --git a/fast-import.c b/fast-import.c
index 6cd19e5..5e94a64 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1303,7 +1303,7 @@ static void *gfi_unpack_entry(
*/
p->pack_size = pack_size + 20;
}
- return unpack_entry(p, oe->idx.offset, &type, sizep);
+ return unpack_entry(p, oe->idx.offset, &type, sizep, NULL);
}
static const char *get_mode(const char *str, uint16_t *modep)
diff --git a/pack-check.c b/pack-check.c
index 63a595c..1920bdb 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -105,6 +105,7 @@ static int verify_packfile(struct packed_git *p,
void *data;
enum object_type type;
unsigned long size;
+ off_t curpos = entries[i].offset;
if (p->index_version > 1) {
off_t offset = entries[i].offset;
@@ -116,7 +117,25 @@ static int verify_packfile(struct packed_git *p,
sha1_to_hex(entries[i].sha1),
p->pack_name, (uintmax_t)offset);
}
- data = unpack_entry(p, entries[i].offset, &type, &size);
+ type = unpack_object_header(p, w_curs, &curpos, &size);
+ unuse_pack(w_curs);
+ if (type == OBJ_BLOB) {
+ unsigned char sha1[20];
+ data = unpack_entry(p, entries[i].offset, &type, &size, sha1);
+ if (!data) {
+ if (hashcmp(entries[i].sha1, sha1))
+ err = error("packed %s from %s is corrupt",
+ sha1_to_hex(entries[i].sha1), p->pack_name);
+ else if (fn) {
+ int eaten = 0;
+ fn(entries[i].sha1, type, size, NULL, &eaten);
+ }
+ if (((base_count + i) & 1023) == 0)
+ display_progress(progress, base_count + i);
+ continue;
+ }
+ }
+ data = unpack_entry(p, entries[i].offset, &type, &size, NULL);
if (!data)
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
sha1_to_hex(entries[i].sha1), p->pack_name,
diff --git a/sha1_file.c b/sha1_file.c
index a77ef0a..d68a5b0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1653,28 +1653,51 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
}
static void *unpack_compressed_entry(struct packed_git *p,
- struct pack_window **w_curs,
- off_t curpos,
- unsigned long size)
+ struct pack_window **w_curs,
+ off_t curpos,
+ unsigned long size,
+ enum object_type type,
+ unsigned char *sha1)
{
+ static unsigned char fixed_buf[8192];
int st;
git_zstream stream;
unsigned char *buffer, *in;
+ git_SHA_CTX c;
+
+ if (sha1) { /* do hash_sha1_file internally */
+ char hdr[32];
+ int hdrlen = sprintf(hdr, "%s %lu", typename(type), size)+1;
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, hdrlen);
+
+ buffer = fixed_buf;
+ } else {
+ buffer = xmallocz(size);
+ }
- buffer = xmallocz(size);
memset(&stream, 0, sizeof(stream));
stream.next_out = buffer;
- stream.avail_out = size + 1;
+ stream.avail_out = buffer == fixed_buf ? sizeof(fixed_buf) : size + 1;
git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
st = git_inflate(&stream, Z_FINISH);
- if (!stream.avail_out)
+ if (sha1) {
+ git_SHA1_Update(&c, buffer, stream.next_out - (unsigned char *)buffer);
+ stream.next_out = buffer;
+ stream.avail_out = sizeof(fixed_buf);
+ }
+ else if (!stream.avail_out)
break; /* the payload is larger than it should be */
curpos += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR);
+ if (sha1) {
+ git_SHA1_Final(sha1, &c);
+ buffer = NULL;
+ }
git_inflate_end(&stream);
if ((st != Z_STREAM_END) || stream.total_out != size) {
free(buffer);
@@ -1727,7 +1750,7 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
ret = ent->data;
if (!ret || ent->p != p || ent->base_offset != base_offset)
- return unpack_entry(p, base_offset, type, base_size);
+ return unpack_entry(p, base_offset, type, base_size, NULL);
if (!keep_cache) {
ent->data = NULL;
@@ -1844,7 +1867,7 @@ static void *unpack_delta_entry(struct packed_git *p,
return NULL;
}
- delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size);
+ delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size, OBJ_NONE, NULL);
if (!delta_data) {
error("failed to unpack compressed delta "
"at offset %"PRIuMAX" from %s",
@@ -1883,7 +1906,8 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
int do_check_packed_object_crc;
void *unpack_entry(struct packed_git *p, off_t obj_offset,
- enum object_type *type, unsigned long *sizep)
+ enum object_type *type, unsigned long *sizep,
+ unsigned char *sha1)
{
struct pack_window *w_curs = NULL;
off_t curpos = obj_offset;
@@ -1917,7 +1941,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
case OBJ_TREE:
case OBJ_BLOB:
case OBJ_TAG:
- data = unpack_compressed_entry(p, &w_curs, curpos, *sizep);
+ data = unpack_compressed_entry(p, &w_curs, curpos,
+ *sizep, *type, sha1);
break;
default:
data = NULL;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index e4b77a2..52acae5 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -143,7 +143,7 @@ test_expect_success 'fetch updates' '
)
'
-test_expect_failure 'fsck' '
+test_expect_success 'fsck' '
git fsck --full
'
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 09/11] archive: support streaming large files to a tar archive
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (18 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 08/11] pack-check: do not unpack blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-06 0:57 ` Junio C Hamano
2012-03-05 3:43 ` [PATCH v3 10/11] fsck: use streaming interface for writing lost-found blobs Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 11/11] update-server-info: respect core.bigfilethreshold Nguyễn Thái Ngọc Duy
21 siblings, 1 reply; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
archive-tar.c | 35 ++++++++++++++++++++++++++++-------
archive-zip.c | 9 +++++----
archive.c | 51 ++++++++++++++++++++++++++++++++++-----------------
archive.h | 11 +++++++++--
t/t1050-large.sh | 2 +-
5 files changed, 77 insertions(+), 31 deletions(-)
diff --git a/archive-tar.c b/archive-tar.c
index 20af005..5bffe49 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -5,6 +5,7 @@
#include "tar.h"
#include "archive.h"
#include "run-command.h"
+#include "streaming.h"
#define RECORDSIZE (512)
#define BLOCKSIZE (RECORDSIZE * 20)
@@ -123,9 +124,29 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
return i;
}
+static void write_file(struct git_istream *stream, const void *buffer,
+ unsigned long size)
+{
+ if (!stream) {
+ write_blocked(buffer, size);
+ return;
+ }
+ for (;;) {
+ char buf[1024 * 16];
+ ssize_t readlen;
+
+ readlen = read_istream(stream, buf, sizeof(buf));
+
+ if (!readlen)
+ break;
+ write_blocked(buf, readlen);
+ }
+}
+
static int write_tar_entry(struct archiver_args *args,
- const unsigned char *sha1, const char *path, size_t pathlen,
- unsigned int mode, void *buffer, unsigned long size)
+ const unsigned char *sha1, const char *path,
+ size_t pathlen, unsigned int mode, void *buffer,
+ struct git_istream *stream, unsigned long size)
{
struct ustar_header header;
struct strbuf ext_header = STRBUF_INIT;
@@ -200,14 +221,14 @@ static int write_tar_entry(struct archiver_args *args,
if (ext_header.len > 0) {
err = write_tar_entry(args, sha1, NULL, 0, 0, ext_header.buf,
- ext_header.len);
+ NULL, ext_header.len);
if (err)
return err;
}
strbuf_release(&ext_header);
write_blocked(&header, sizeof(header));
- if (S_ISREG(mode) && buffer && size > 0)
- write_blocked(buffer, size);
+ if (S_ISREG(mode) && size > 0)
+ write_file(stream, buffer, size);
return err;
}
@@ -219,7 +240,7 @@ static int write_global_extended_header(struct archiver_args *args)
strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
err = write_tar_entry(args, NULL, NULL, 0, 0, ext_header.buf,
- ext_header.len);
+ NULL, ext_header.len);
strbuf_release(&ext_header);
return err;
}
@@ -308,7 +329,7 @@ static int write_tar_archive(const struct archiver *ar,
if (args->commit_sha1)
err = write_global_extended_header(args);
if (!err)
- err = write_archive_entries(args, write_tar_entry);
+ err = write_archive_entries(args, write_tar_entry, 1);
if (!err)
write_trailer();
return err;
diff --git a/archive-zip.c b/archive-zip.c
index 02d1f37..4a1e917 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -120,9 +120,10 @@ static void *zlib_deflate(void *data, unsigned long size,
return buffer;
}
-static int write_zip_entry(struct archiver_args *args,
- const unsigned char *sha1, const char *path, size_t pathlen,
- unsigned int mode, void *buffer, unsigned long size)
+int write_zip_entry(struct archiver_args *args,
+ const unsigned char *sha1, const char *path,
+ size_t pathlen, unsigned int mode, void *buffer,
+ struct git_istream *stream, unsigned long size)
{
struct zip_local_header header;
struct zip_dir_header dirent;
@@ -271,7 +272,7 @@ static int write_zip_archive(const struct archiver *ar,
zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
zip_dir_size = ZIP_DIRECTORY_MIN_SIZE;
- err = write_archive_entries(args, write_zip_entry);
+ err = write_archive_entries(args, write_zip_entry, 0);
if (!err)
write_zip_trailer(args->commit_sha1);
diff --git a/archive.c b/archive.c
index 1ee837d..257eadf 100644
--- a/archive.c
+++ b/archive.c
@@ -5,6 +5,7 @@
#include "archive.h"
#include "parse-options.h"
#include "unpack-trees.h"
+#include "streaming.h"
static char const * const archive_usage[] = {
"git archive [options] <tree-ish> [<path>...]",
@@ -59,26 +60,35 @@ static void format_subst(const struct commit *commit,
free(to_free);
}
-static void *sha1_file_to_archive(const char *path, const unsigned char *sha1,
- unsigned int mode, enum object_type *type,
- unsigned long *sizep, const struct commit *commit)
+void sha1_file_to_archive(void **buffer, struct git_istream **stream,
+ const char *path, const unsigned char *sha1,
+ unsigned int mode, enum object_type *type,
+ unsigned long *sizep,
+ const struct commit *commit)
{
- void *buffer;
+ if (stream) {
+ struct stream_filter *filter;
+ filter = get_stream_filter(path, sha1);
+ if (!commit && S_ISREG(mode) && is_null_stream_filter(filter)) {
+ *buffer = NULL;
+ *stream = open_istream(sha1, type, sizep, NULL);
+ return;
+ }
+ *stream = NULL;
+ }
- buffer = read_sha1_file(sha1, type, sizep);
- if (buffer && S_ISREG(mode)) {
+ *buffer = read_sha1_file(sha1, type, sizep);
+ if (*buffer && S_ISREG(mode)) {
struct strbuf buf = STRBUF_INIT;
size_t size = 0;
- strbuf_attach(&buf, buffer, *sizep, *sizep + 1);
+ strbuf_attach(&buf, *buffer, *sizep, *sizep + 1);
convert_to_working_tree(path, buf.buf, buf.len, &buf);
if (commit)
format_subst(commit, buf.buf, buf.len, &buf);
- buffer = strbuf_detach(&buf, &size);
+ *buffer = strbuf_detach(&buf, &size);
*sizep = size;
}
-
- return buffer;
}
static void setup_archive_check(struct git_attr_check *check)
@@ -97,6 +107,7 @@ static void setup_archive_check(struct git_attr_check *check)
struct archiver_context {
struct archiver_args *args;
write_archive_entry_fn_t write_entry;
+ int stream_ok;
};
static int write_archive_entry(const unsigned char *sha1, const char *base,
@@ -109,6 +120,7 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
write_archive_entry_fn_t write_entry = c->write_entry;
struct git_attr_check check[2];
const char *path_without_prefix;
+ struct git_istream *stream = NULL;
int convert = 0;
int err;
enum object_type type;
@@ -133,25 +145,29 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
strbuf_addch(&path, '/');
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
- err = write_entry(args, sha1, path.buf, path.len, mode, NULL, 0);
+ err = write_entry(args, sha1, path.buf, path.len, mode, NULL, NULL, 0);
if (err)
return err;
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
}
- buffer = sha1_file_to_archive(path_without_prefix, sha1, mode,
- &type, &size, convert ? args->commit : NULL);
- if (!buffer)
+ sha1_file_to_archive(&buffer, c->stream_ok ? &stream : NULL,
+ path_without_prefix, sha1, mode,
+ &type, &size, convert ? args->commit : NULL);
+ if (!buffer && !stream)
return error("cannot read %s", sha1_to_hex(sha1));
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
- err = write_entry(args, sha1, path.buf, path.len, mode, buffer, size);
+ err = write_entry(args, sha1, path.buf, path.len, mode, buffer, stream, size);
+ if (stream)
+ close_istream(stream);
free(buffer);
return err;
}
int write_archive_entries(struct archiver_args *args,
- write_archive_entry_fn_t write_entry)
+ write_archive_entry_fn_t write_entry,
+ int stream_ok)
{
struct archiver_context context;
struct unpack_trees_options opts;
@@ -167,13 +183,14 @@ int write_archive_entries(struct archiver_args *args,
if (args->verbose)
fprintf(stderr, "%.*s\n", (int)len, args->base);
err = write_entry(args, args->tree->object.sha1, args->base,
- len, 040777, NULL, 0);
+ len, 040777, NULL, NULL, 0);
if (err)
return err;
}
context.args = args;
context.write_entry = write_entry;
+ context.stream_ok = stream_ok;
/*
* Setup index and instruct attr to read index only
diff --git a/archive.h b/archive.h
index 2b0884f..370cca9 100644
--- a/archive.h
+++ b/archive.h
@@ -27,9 +27,16 @@ extern void register_archiver(struct archiver *);
extern void init_tar_archiver(void);
extern void init_zip_archiver(void);
-typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size);
+struct git_istream;
+typedef int (*write_archive_entry_fn_t)(struct archiver_args *args,
+ const unsigned char *sha1,
+ const char *path, size_t pathlen,
+ unsigned int mode,
+ void *buffer,
+ struct git_istream *stream,
+ unsigned long size);
-extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry);
+extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry, int stream_ok);
extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint, int remote);
const char *archive_format_from_filename(const char *filename);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 52acae5..5336eb8 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -151,7 +151,7 @@ test_expect_failure 'repack' '
git repack -ad
'
-test_expect_failure 'tar achiving' '
+test_expect_success 'tar achiving' '
git archive --format=tar HEAD >/dev/null
'
--
1.7.3.1.256.g2539c.dirty
* Re: [PATCH v3 09/11] archive: support streaming large files to a tar archive
2012-03-05 3:43 ` [PATCH v3 09/11] archive: support streaming large files to a tar archive Nguyễn Thái Ngọc Duy
@ 2012-03-06 0:57 ` Junio C Hamano
0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2012-03-06 0:57 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
This is *way* *too* underdocumented.
For example, it is totally unclear from the patch what determines
the last parameter to write_archive_entries(), OK_TO_STREAM. Does
it depend on the nature of the payload? Does the backend decide it,
in other words, if it is prepared to read from a streaming API or not?
I wanted to first take all the "do not slurp things in core, and
instead read from streaming API" patches from this series, but I
had to stop at this one.
* [PATCH v3 10/11] fsck: use streaming interface for writing lost-found blobs
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (19 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 09/11] archive: support streaming large files to a tar archive Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
2012-03-05 3:43 ` [PATCH v3 11/11] update-server-info: respect core.bigfilethreshold Nguyễn Thái Ngọc Duy
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/fsck.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 8c479a7..7fcb33e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -12,6 +12,7 @@
#include "parse-options.h"
#include "dir.h"
#include "progress.h"
+#include "streaming.h"
#define REACHABLE 0x0001
#define SEEN 0x0002
@@ -236,13 +237,8 @@ static void check_unreachable_object(struct object *obj)
if (!(f = fopen(filename, "w")))
die_errno("Could not open '%s'", filename);
if (obj->type == OBJ_BLOB) {
- enum object_type type;
- unsigned long size;
- char *buf = read_sha1_file(obj->sha1,
- &type, &size);
- if (buf && fwrite(buf, 1, size, f) != size)
+ if (stream_blob_to_fd(fileno(f), obj->sha1, NULL, 1))
die_errno("Could not write '%s'", filename);
- free(buf);
} else
fprintf(f, "%s\n", sha1_to_hex(obj->sha1));
if (fclose(f))
--
1.7.3.1.256.g2539c.dirty
* [PATCH v3 11/11] update-server-info: respect core.bigfilethreshold
2012-03-04 12:59 ` [PATCH v2 00/10] " Nguyễn Thái Ngọc Duy
` (20 preceding siblings ...)
2012-03-05 3:43 ` [PATCH v3 10/11] fsck: use streaming interface for writing lost-found blobs Nguyễn Thái Ngọc Duy
@ 2012-03-05 3:43 ` Nguyễn Thái Ngọc Duy
21 siblings, 0 replies; 48+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-03-05 3:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
This command indirectly calls check_sha1_signature() (add_info_ref ->
deref_tag -> parse_object -> ...), which may put the whole blob in
memory if the blob's size is under core.bigfilethreshold. As the
config is not read, the threshold is always the 512MB default.
Respect the user's setting here.
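For reference, the threshold is a global that only changes once the
config has been read; approximately (big_file_threshold and
git_config_ulong() are git's existing config machinery, not introduced
by this patch, and are quoted from memory):

	/* environment.c (approximate) */
	unsigned long big_file_threshold = 512 * 1024 * 1024;

	/* config.c, inside git_default_config() (approximate) */
	if (!strcmp(var, "core.bigfilethreshold")) {
		big_file_threshold = git_config_ulong(var, value);
		return 0;
	}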
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/update-server-info.c | 1 +
t/t1050-large.sh | 2 +-
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/builtin/update-server-info.c b/builtin/update-server-info.c
index b90dce6..0d63c44 100644
--- a/builtin/update-server-info.c
+++ b/builtin/update-server-info.c
@@ -15,6 +15,7 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix)
OPT_END()
};
+ git_config(git_default_config, NULL);
argc = parse_options(argc, argv, prefix, options,
update_server_info_usage, 0);
if (argc > 0)
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 5336eb8..9197b89 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -147,7 +147,7 @@ test_expect_success 'fsck' '
git fsck --full
'
-test_expect_failure 'repack' '
+test_expect_success 'repack' '
git repack -ad
'
--
1.7.3.1.256.g2539c.dirty