* [PATCH] diff: mode bits fixes
[not found] <7vy89ums2l.fsf@assigned-by-dhcp.cox.net>
@ 2005-06-01 18:38 ` Junio C Hamano
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-01 18:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
The core GIT repository has trees that record regular file mode
in 0664 instead of normalized 0644 pattern. Comparing such a
tree with another tree that records the same file in 0644
pattern without content changes with git-diff-tree causes it to
feed otherwise unmodified pairs to the diff_change() routine,
which triggers a sanity check routine and barfs. This patch
fixes the problem, along with the fix to another caller that
uses unnormalized mode bits to call diff_change() routine in a
similar way.
Without this patch, you will see "fatal error" from diff-tree
when you run git-deltafy-script on the core GIT repository
itself.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Linus, I decided to bite the bullet and audited the callers.
*** There was only one that was not quite right, other than the
*** diff-tree one. Please disregard the one I sent last night
*** which incorrectly said "S_ISDIR() || S_ISREG()". Please
*** also disregard the other one that silently ignores
*** unmodified filepairs in the output routine. Warning about
*** callers should be the most appropriate action as this
*** version does.
diff.h | 4 ++++
diffcore.h | 4 ----
diff-files.c | 8 +++-----
diff-tree.c | 4 +++-
diff.c | 12 +++++++-----
5 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/diff.h b/diff.h
--- a/diff.h
+++ b/diff.h
@@ -4,6 +4,10 @@
#ifndef DIFF_H
#define DIFF_H
+#define DIFF_FILE_CANON_MODE(mode) \
+ (S_ISREG(mode) ? (S_IFREG | ce_permissions(mode)) : \
+ S_ISLNK(mode) ? S_IFLNK : S_IFDIR)
+
extern void diff_addremove(int addremove,
unsigned mode,
const unsigned char *sha1,
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -59,10 +59,6 @@ struct diff_filepair {
#define DIFF_PAIR_MODE_CHANGED(p) ((p)->one->mode != (p)->two->mode)
-#define DIFF_FILE_CANON_MODE(mode) \
- (S_ISREG(mode) ? (S_IFREG | ce_permissions(mode)) : \
- S_ISLNK(mode) ? S_IFLNK : S_IFDIR)
-
extern void diff_free_filepair(struct diff_filepair *);
extern int diff_unmodified_pair(struct diff_filepair *);
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -88,7 +88,7 @@ int main(int argc, const char **argv)
for (i = 0; i < entries; i++) {
struct stat st;
- unsigned int oldmode, mode;
+ unsigned int oldmode;
struct cache_entry *ce = active_cache[i];
int changed;
@@ -116,10 +116,8 @@ int main(int argc, const char **argv)
continue;
oldmode = ntohl(ce->ce_mode);
- mode = (S_ISLNK(st.st_mode) ? S_IFLNK :
- S_IFREG | ce_permissions(st.st_mode));
-
- show_modified(oldmode, mode, ce->sha1, null_sha1,
+ show_modified(oldmode, DIFF_FILE_CANON_MODE(st.st_mode),
+ ce->sha1, null_sha1,
ce->name);
}
diffcore_std((1 < argc) ? argv + 1 : NULL,
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -44,10 +44,12 @@ static const unsigned char *extract(void
int len = strlen(tree)+1;
const unsigned char *sha1 = tree + len;
const char *path = strchr(tree, ' ');
+ unsigned int mode;
- if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+ if (!path || size < len + 20 || sscanf(tree, "%o", &mode) != 1)
die("corrupt tree file");
*pathp = path+1;
+ *modep = DIFF_FILE_CANON_MODE(mode);
return sha1;
}
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -854,12 +854,14 @@ static void diff_resolve_rename_copy(voi
else if (memcmp(p->one->sha1, p->two->sha1, 20) ||
p->one->mode != p->two->mode)
p->status = 'M';
- else
- /* this is a "no-change" entry.
- * should not happen anymore.
- * p->status = 'X';
+ else {
+ /* This is a "no-change" entry and should not
+ * happen anymore, but prepare for broken callers.
*/
- die("internal error in diffcore: unmodified entry remains");
+ error("feeding unmodified %s to diffcore",
+ p->one->path);
+ p->status = 'X';
+ }
}
diff_debug_queue("resolve-rename-copy done", q);
}
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-01 18:38 ` [PATCH] diff: mode bits fixes Junio C Hamano
@ 2005-06-02 16:46 ` Junio C Hamano
2005-06-02 17:03 ` Linus Torvalds
2005-06-02 17:03 ` [PATCH] Handle deltified object correctly in git-*-pull family McMullan, Jason
2005-06-02 16:47 ` [PATCH] Use correct U*MAX Junio C Hamano
2005-06-02 16:49 ` [PATCH] Find size of SHA1 object without inflating everything Junio C Hamano
2 siblings, 2 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 16:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
When a remote repository is deltified, we need to get the
objects that a deltified object we want to obtain is based upon.
The initial parts of each retrieved SHA1 file is inflated and
inspected to see if it is deltified, and its base object is
asked from the remote side when it is. Since this partial
inflation and inspection has a small performance hit, it can
optionally be skipped by giving -d flag to git-*-pull commands.
This flag should be used only when the remote repository is
known to have no deltified objects.
Rsync transport does not have this problem since it fetches
everything the remote side has.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Linus, this uses the new helper you wrote. The interface is
*** much more pleasant to use.
Documentation/git-http-pull.txt | 6 +++++-
Documentation/git-local-pull.txt | 6 +++++-
Documentation/git-rpull.txt | 6 +++++-
cache.h | 3 +++
pull.h | 3 +++
http-pull.c | 4 +++-
local-pull.c | 4 +++-
pull.c | 6 ++++++
rpull.c | 4 +++-
sha1_file.c | 40 ++++++++++++++++++++++++++++++++++++++
10 files changed, 76 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-http-pull.txt b/Documentation/git-http-pull.txt
--- a/Documentation/git-http-pull.txt
+++ b/Documentation/git-http-pull.txt
@@ -9,7 +9,7 @@ git-http-pull - Downloads a remote GIT r
SYNOPSIS
--------
-'git-http-pull' [-c] [-t] [-a] [-v] commit-id url
+'git-http-pull' [-c] [-t] [-a] [-v] [-d] commit-id url
DESCRIPTION
-----------
@@ -21,6 +21,10 @@ Downloads a remote GIT repository via HT
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-local-pull.txt b/Documentation/git-local-pull.txt
--- a/Documentation/git-local-pull.txt
+++ b/Documentation/git-local-pull.txt
@@ -9,7 +9,7 @@ git-local-pull - Duplicates another GIT
SYNOPSIS
--------
-'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path
+'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path
DESCRIPTION
-----------
@@ -23,6 +23,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-rpull.txt b/Documentation/git-rpull.txt
--- a/Documentation/git-rpull.txt
+++ b/Documentation/git-rpull.txt
@@ -10,7 +10,7 @@ git-rpull - Pulls from a remote reposito
SYNOPSIS
--------
-'git-rpull' [-c] [-t] [-a] [-v] commit-id url
+'git-rpull' [-c] [-t] [-a] [-d] [-v] commit-id url
DESCRIPTION
-----------
@@ -25,6 +25,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -158,6 +158,9 @@ extern int write_sha1_file(void *buf, un
extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type);
+extern int sha1_delta_base(const unsigned char *, unsigned char *);
+
+
/* Read a tree into the cache */
extern int read_tree(void *buffer, unsigned long size, int stage);
diff --git a/pull.h b/pull.h
--- a/pull.h
+++ b/pull.h
@@ -13,6 +13,9 @@ extern int get_history;
/** Set to fetch the trees in the commit history. **/
extern int get_all;
+/* Set to zero to skip the check for delta object base. */
+extern int get_delta;
+
/* Set to be verbose */
extern int get_verbosely;
diff --git a/http-pull.c b/http-pull.c
--- a/http-pull.c
+++ b/http-pull.c
@@ -103,6 +103,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -113,7 +115,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-http-pull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-http-pull [-c] [-t] [-a] [-d] [-v] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -74,7 +74,7 @@ int fetch(unsigned char *sha1)
}
static const char *local_pull_usage =
-"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path";
+"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path";
/*
* By default we only use file copy.
@@ -92,6 +92,8 @@ int main(int argc, char **argv)
get_tree = 1;
else if (argv[arg][1] == 'c')
get_history = 1;
+ else if (argv[arg][1] == 'd')
+ get_delta = 0;
else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -6,6 +6,7 @@
int get_tree = 0;
int get_history = 0;
+int get_delta = 1;
int get_all = 0;
int get_verbosely = 0;
static unsigned char current_commit_sha1[20];
@@ -37,6 +38,11 @@ static int make_sure_we_have_it(const ch
status = fetch(sha1);
if (status && what)
report_missing(what, sha1);
+ if (get_delta) {
+ char delta_sha1[20];
+ if (sha1_delta_base(sha1, delta_sha1))
+ status = make_sure_we_have_it(what, delta_sha1);
+ }
return status;
}
diff --git a/rpull.c b/rpull.c
--- a/rpull.c
+++ b/rpull.c
@@ -27,6 +27,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -37,7 +39,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-rpull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-rpull [-c] [-t] [-a] [-v] [-d] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -347,6 +347,46 @@ void * unpack_sha1_file(void *map, unsig
return buf;
}
+int sha1_delta_base(const unsigned char *sha1, unsigned char *delta_sha1)
+{
+ unsigned long mapsize, size;
+ void *map;
+ char type[20];
+ char buffer[200];
+ z_stream stream;
+ int ret, bytes, status;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return 0;
+ ret = unpack_sha1_header(&stream, map, mapsize, buffer,
+ sizeof(buffer));
+ status = 0;
+
+ if (ret < Z_OK ||
+ sscanf(buffer, "%10s %lu", type, &size) != 2 ||
+ strcmp(type, "delta"))
+ goto out;
+ bytes = strlen(buffer) + 1;
+ if (size - bytes < 20)
+ goto out;
+
+ memmove(buffer, buffer + bytes, stream.total_out - bytes);
+ bytes = stream.total_out - bytes;
+ if (bytes < 20 && ret == Z_OK) {
+ stream.next_out = buffer + bytes;
+ stream.avail_out = sizeof(buffer) - bytes;
+ while (inflate(&stream, Z_FINISH) == Z_OK)
+ ; /* nothing */
+ }
+ status = 1;
+ memcpy(delta_sha1, buffer, 20);
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return status;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Use correct U*MAX.
2005-06-01 18:38 ` [PATCH] diff: mode bits fixes Junio C Hamano
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
@ 2005-06-02 16:47 ` Junio C Hamano
2005-06-03 23:02 ` Petr Baudis
2005-06-02 16:49 ` [PATCH] Find size of SHA1 object without inflating everything Junio C Hamano
2 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 16:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
The largest "unsigned long" value is ULONG_MAX, not UINT_MAX.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Linus, you may be unable to spell, but I cannot type ;-).
count-delta.c | 4 ++--
diff.c | 4 ++--
diffcore-break.c | 2 +-
diffcore-rename.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/count-delta.c b/count-delta.c
--- a/count-delta.c
+++ b/count-delta.c
@@ -46,7 +46,7 @@ unsigned long count_delta(void *delta_bu
/* the smallest delta size possible is 6 bytes */
if (delta_size < 6)
- return UINT_MAX;
+ return ULONG_MAX;
data = delta_buf;
top = delta_buf + delta_size;
@@ -83,7 +83,7 @@ unsigned long count_delta(void *delta_bu
/* sanity check */
if (data != top || out != dst_size)
- return UINT_MAX;
+ return ULONG_MAX;
/* delete size is what was _not_ copied from source.
* edit size is that and literal additions.
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -256,7 +256,7 @@ static struct sha1_size_cache *locate_si
first = next+1;
}
/* not found */
- if (size == UINT_MAX)
+ if (size == ULONG_MAX)
return NULL;
/* insert to make it at "first" */
if (sha1_size_cache_alloc <= sha1_size_cache_nr) {
@@ -338,7 +338,7 @@ int diff_populate_filespec(struct diff_f
struct sha1_size_cache *e;
if (size_only) {
- e = locate_size_cache(s->sha1, UINT_MAX);
+ e = locate_size_cache(s->sha1, ULONG_MAX);
if (e) {
s->size = e->size;
return 0;
diff --git a/diffcore-break.c b/diffcore-break.c
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -63,7 +63,7 @@ static int very_different(struct diff_fi
/* Estimate the edit size by interpreting delta. */
delta_size = count_delta(delta, delta_size);
free(delta);
- if (delta_size == UINT_MAX)
+ if (delta_size == ULONG_MAX)
return 0; /* error in delta computation */
if (base_size < delta_size)
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -176,7 +176,7 @@ static int estimate_similarity(struct di
/* Estimate the edit size by interpreting delta. */
delta_size = count_delta(delta, delta_size);
free(delta);
- if (delta_size == UINT_MAX)
+ if (delta_size == ULONG_MAX)
return 0;
/*
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Find size of SHA1 object without inflating everything.
2005-06-01 18:38 ` [PATCH] diff: mode bits fixes Junio C Hamano
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
2005-06-02 16:47 ` [PATCH] Use correct U*MAX Junio C Hamano
@ 2005-06-02 16:49 ` Junio C Hamano
2 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 16:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Linus, this also uses the new helper you wrote.
cache.h | 2 +-
diff.c | 11 +++++---
sha1_file.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 85 insertions(+), 6 deletions(-)
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -159,7 +159,7 @@ extern int write_sha1_file(void *buf, un
extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type);
extern int sha1_delta_base(const unsigned char *, unsigned char *);
-
+extern int sha1_file_size(const unsigned char *, unsigned long *);
/* Read a tree into the cache */
extern int read_tree(void *buffer, unsigned long size, int stage);
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -333,7 +333,6 @@ int diff_populate_filespec(struct diff_f
close(fd);
}
else {
- /* We cannot do size only for SHA1 blobs */
char type[20];
struct sha1_size_cache *e;
@@ -343,11 +342,13 @@ int diff_populate_filespec(struct diff_f
s->size = e->size;
return 0;
}
+ if (!sha1_file_size(s->sha1, &s->size))
+ locate_size_cache(s->sha1, s->size);
+ }
+ else {
+ s->data = read_sha1_file(s->sha1, type, &s->size);
+ s->should_free = 1;
}
- s->data = read_sha1_file(s->sha1, type, &s->size);
- s->should_free = 1;
- if (s->data && size_only)
- locate_size_cache(s->sha1, s->size);
}
return 0;
}
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -387,6 +387,84 @@ int sha1_delta_base(const unsigned char
return status;
}
+int sha1_file_size(const unsigned char *sha1, unsigned long *sizep)
+{
+ unsigned long mapsize, size;
+ void *map;
+ char type[20];
+ char buffer[200];
+ z_stream stream;
+ int ret, bytes, status;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return 0;
+ ret = unpack_sha1_header(&stream, map, mapsize, buffer,
+ sizeof(buffer));
+ status = -1;
+
+ if (ret < Z_OK || sscanf(buffer, "%10s %lu", type, &size) != 2)
+ goto out;
+ if (strcmp(type, "delta")) {
+ *sizep = size;
+ status = 0;
+ goto out;
+ }
+
+ /* We are dealing with delta object. Inflated, the first 20
+ * bytes hold the base object SHA1, and delta data follows
+ * immediately after it.
+ */
+ bytes = strlen(buffer) + 1;
+ if (size < bytes + 6 + 20)
+ goto out; /* the smallest delta size is 6 bytes */
+
+ memmove(buffer, buffer + bytes, stream.total_out - bytes);
+ bytes = stream.total_out - bytes;
+ if (bytes < sizeof(buffer) && ret == Z_OK) {
+ stream.next_out = buffer + bytes;
+ stream.avail_out = sizeof(buffer) - bytes;
+ while (inflate(&stream, Z_FINISH) == Z_OK)
+ ; /* nothing */
+ }
+
+ /* We have read initial part of the delta, which starts at
+ * buffer+20. Borrow code from patch-delta to read the
+ * result size.
+ */
+ {
+ const unsigned char *data = buffer + 20;
+ unsigned char cmd;
+ int i;
+
+ /* Skip over the source size; we are not interested in
+ * it and we cannot verify it because we do not want
+ * to read the base object.
+ */
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ data++;
+ cmd >>= 1;
+ }
+ /* Read the result size */
+ size = i = 0;
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ size |= *data++ << i;
+ i += 8;
+ cmd >>= 1;
+ }
+ *sizep = size;
+ }
+ status = 0;
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return status;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
@ 2005-06-02 17:03 ` Linus Torvalds
2005-06-02 18:55 ` Junio C Hamano
2005-06-02 18:57 ` [PATCH] " Junio C Hamano
2005-06-02 17:03 ` [PATCH] Handle deltified object correctly in git-*-pull family McMullan, Jason
1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2005-06-02 17:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, 2 Jun 2005, Junio C Hamano wrote:
>
> *** Linus, this uses the new helper you wrote. The interface is
> *** much more pleasant to use.
Can you update it for the fact that I split things up even more?
In particular, see commit 5180cacc202bb20b15981469487eb8d6b0509997: "Split
up unpack_sha1_file() some more".
You should be able to unpack a delta with just something like this:
int ret;
z_stream stream;
char hdr[100], type[10];
unsigned long size;
ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
if (ret < Z_OK)
return NULL;
if (parse_sha1_header(hdr, type, &size) < 0)
return NULL;
if (strcmp("delta", type))
return NULL;
buffer = unpack_sha1_rest(&stream, hdr, size);
if (!buffer)
return NULL;
.. we now have the delta of size "size" in "buffer" ..
ie it now has proper helper functions for every single stage in unpacking
an object.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
2005-06-02 17:03 ` Linus Torvalds
@ 2005-06-02 17:03 ` McMullan, Jason
2005-06-02 18:02 ` Junio C Hamano
1 sibling, 1 reply; 20+ messages in thread
From: McMullan, Jason @ 2005-06-02 17:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, GIT Mailling list
[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]
On Thu, 2005-06-02 at 09:46 -0700, Junio C Hamano wrote:
> When a remote repository is deltified, we need to get the
> objects that a deltified object we want to obtain is based upon.
> The initial parts of each retrieved SHA1 file is inflated and
> inspected to see if it is deltified, and its base object is
> asked from the remote side when it is. Since this partial
> inflation and inspection has a small performance hit, it can
> optionally be skipped by giving -d flag to git-*-pull commands.
> This flag should be used only when the remote repository is
> known to have no deltified objects.
Eww. Don't you want to attempt to get the referenced sha1 *before*
you stick the delta blob into the repository?
Otherwise, if a transfer fails, there's no good way to recover your
database through the git-*pull interfaces, ie:
Try to pull session 1:
pull tree a
pull delta b (references blob c)
pull blob c -> fails!
Try to pull session 2:
pull tree a
delta b - Found in database!
...
Or do I not understand your code properly?
(I'm working on a similar piece of code for my git-daemon protocol,
which is why I'm worried about this issue)
--
Jason McMullan <jason.mcmullan@timesys.com>
TimeSys Corporation
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 17:03 ` [PATCH] Handle deltified object correctly in git-*-pull family McMullan, Jason
@ 2005-06-02 18:02 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 18:02 UTC (permalink / raw)
To: McMullan, Jason; +Cc: Linus Torvalds, GIT Mailling list
>>>>> "JM" == McMullan, Jason <jason.mcmullan@timesys.com> writes:
JM> Eww. Don't you want to attempt to get the referenced sha1 *before*
JM> you stick the delta blob into the repository?
That issue crossed my mind, and I admit I haven't looked at the
issues closely enough, but I suspect that it may not worth it
with the current pull.c structure.
The current pull code fetches and stores a commit object before
it retrieves the tree object associate with it, and similarly a
tree object before its subtree and blobs, which has the same
issue. Adding -r (recover) option to the pull family to not
check for the existence of required object but its dependents
would be necessary if my suspition turns out to be correct, and
delta dependency should be handled the same way commit and tree
dependencies are handled there.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 17:03 ` Linus Torvalds
@ 2005-06-02 18:55 ` Junio C Hamano
2005-06-02 21:31 ` Nicolas Pitre
2005-06-02 18:57 ` [PATCH] " Junio C Hamano
1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 18:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
When a remote repository is deltified, we need to get the
objects that a deltified object we want to obtain is based upon.
The initial parts of each retrieved SHA1 file is inflated and
inspected to see if it is deltified, and its base object is
asked from the remote side when it is. Since this partial
inflation and inspection has a small performance hit, it can
optionally be skipped by giving -d flag to git-*-pull commands.
This flag should be used only when the remote repository is
known to have no deltified objects.
Rsync transport does not have this problem since it fetches
everything the remote side has.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Now uses parse_sha1_header() and unpack_sha1_rest(). I
*** decided not to make it the callers responsibility to check
*** what we have already got and fixed unpack_sha1_rest() to
*** avoid copying more than size bytes.
Documentation/git-http-pull.txt | 6 ++++-
Documentation/git-local-pull.txt | 6 ++++-
Documentation/git-rpull.txt | 6 ++++-
cache.h | 1 +
pull.h | 3 +++
http-pull.c | 4 +++-
local-pull.c | 4 +++-
pull.c | 7 ++++++
rpull.c | 4 +++-
sha1_file.c | 43 +++++++++++++++++++++++++++++++++++++-
10 files changed, 77 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-http-pull.txt b/Documentation/git-http-pull.txt
--- a/Documentation/git-http-pull.txt
+++ b/Documentation/git-http-pull.txt
@@ -9,7 +9,7 @@ git-http-pull - Downloads a remote GIT r
SYNOPSIS
--------
-'git-http-pull' [-c] [-t] [-a] [-v] commit-id url
+'git-http-pull' [-c] [-t] [-a] [-v] [-d] commit-id url
DESCRIPTION
-----------
@@ -21,6 +21,10 @@ Downloads a remote GIT repository via HT
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-local-pull.txt b/Documentation/git-local-pull.txt
--- a/Documentation/git-local-pull.txt
+++ b/Documentation/git-local-pull.txt
@@ -9,7 +9,7 @@ git-local-pull - Duplicates another GIT
SYNOPSIS
--------
-'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path
+'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path
DESCRIPTION
-----------
@@ -23,6 +23,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-rpull.txt b/Documentation/git-rpull.txt
--- a/Documentation/git-rpull.txt
+++ b/Documentation/git-rpull.txt
@@ -10,7 +10,7 @@ git-rpull - Pulls from a remote reposito
SYNOPSIS
--------
-'git-rpull' [-c] [-t] [-a] [-v] commit-id url
+'git-rpull' [-c] [-t] [-a] [-d] [-v] commit-id url
DESCRIPTION
-----------
@@ -25,6 +25,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -153,6 +153,7 @@ extern char *sha1_file_name(const unsign
extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
+extern int sha1_delta_base(const unsigned char *, unsigned char *);
extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
diff --git a/pull.h b/pull.h
--- a/pull.h
+++ b/pull.h
@@ -13,6 +13,9 @@ extern int get_history;
/** Set to fetch the trees in the commit history. **/
extern int get_all;
+/* Set to zero to skip the check for delta object base. */
+extern int get_delta;
+
/* Set to be verbose */
extern int get_verbosely;
diff --git a/http-pull.c b/http-pull.c
--- a/http-pull.c
+++ b/http-pull.c
@@ -103,6 +103,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -113,7 +115,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-http-pull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-http-pull [-c] [-t] [-a] [-d] [-v] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -74,7 +74,7 @@ int fetch(unsigned char *sha1)
}
static const char *local_pull_usage =
-"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path";
+"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path";
/*
* By default we only use file copy.
@@ -92,6 +92,8 @@ int main(int argc, char **argv)
get_tree = 1;
else if (argv[arg][1] == 'c')
get_history = 1;
+ else if (argv[arg][1] == 'd')
+ get_delta = 0;
else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -6,6 +6,7 @@
int get_tree = 0;
int get_history = 0;
+int get_delta = 1;
int get_all = 0;
int get_verbosely = 0;
static unsigned char current_commit_sha1[20];
@@ -37,6 +38,12 @@ static int make_sure_we_have_it(const ch
status = fetch(sha1);
if (status && what)
report_missing(what, sha1);
+ if (get_delta) {
+ char delta_sha1[20];
+ status = sha1_delta_base(sha1, delta_sha1);
+ if (0 < status)
+ status = make_sure_we_have_it(what, delta_sha1);
+ }
return status;
}
diff --git a/rpull.c b/rpull.c
--- a/rpull.c
+++ b/rpull.c
@@ -27,6 +27,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -37,7 +39,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-rpull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-rpull [-c] [-t] [-a] [-v] [-d] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -325,7 +325,13 @@ void *unpack_sha1_rest(z_stream *stream,
int bytes = strlen(buffer) + 1;
char *buf = xmalloc(1+size);
- memcpy(buf, buffer + bytes, stream->total_out - bytes);
+ /* (stream->total_out - bytes) is what we already have. The
+ * caller could be asking for something smaller than that.
+ */
+ if (size < stream->total_out - bytes)
+ memcpy(buf, buffer + bytes, size);
+ else
+ memcpy(buf, buffer + bytes, stream->total_out - bytes);
bytes = stream->total_out - bytes;
if (bytes < size) {
stream->next_out = buf + bytes;
@@ -401,6 +407,41 @@ void * unpack_sha1_file(void *map, unsig
return unpack_sha1_rest(&stream, hdr, *size);
}
+int sha1_delta_base(const unsigned char *sha1, unsigned char *base_sha1)
+{
+ int ret;
+ unsigned long mapsize, size;
+ void *map;
+ z_stream stream;
+ char hdr[1024], type[20];
+ void *delta_data_head;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return -1;
+ ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
+ if (ret < Z_OK || parse_sha1_header(hdr, type, &size) < 0) {
+ ret = -1;
+ goto out;
+ }
+ if (strcmp(type, "delta")) {
+ ret = 0;
+ goto out;
+ }
+ delta_data_head = unpack_sha1_rest(&stream, hdr, 20);
+ if (!delta_data_head) {
+ ret = -1;
+ goto out;
+ }
+ ret = 1;
+ memcpy(base_sha1, delta_data_head, 20);
+ free(delta_data_head);
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return ret;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] Find size of SHA1 object without inflating everything.
2005-06-02 17:03 ` Linus Torvalds
2005-06-02 18:55 ` Junio C Hamano
@ 2005-06-02 18:57 ` Junio C Hamano
2005-06-02 22:10 ` Linus Torvalds
1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 18:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** The U*MAX fix patch from previous round still applies, so I am
*** resending it to you. This probably would depend on it, so
*** please apply U*MAX fix patch before this one.
cache.h | 1 +
diff.c | 11 ++++++----
sha1_file.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 71 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -154,6 +154,7 @@ extern void * map_sha1_file(const unsign
extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
extern int sha1_delta_base(const unsigned char *, unsigned char *);
+extern int sha1_file_size(const unsigned char *, unsigned long *);
extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -333,7 +333,6 @@ int diff_populate_filespec(struct diff_f
close(fd);
}
else {
- /* We cannot do size only for SHA1 blobs */
char type[20];
struct sha1_size_cache *e;
@@ -343,11 +342,13 @@ int diff_populate_filespec(struct diff_f
s->size = e->size;
return 0;
}
+ if (!sha1_file_size(s->sha1, &s->size))
+ locate_size_cache(s->sha1, s->size);
+ }
+ else {
+ s->data = read_sha1_file(s->sha1, type, &s->size);
+ s->should_free = 1;
}
- s->data = read_sha1_file(s->sha1, type, &s->size);
- s->should_free = 1;
- if (s->data && size_only)
- locate_size_cache(s->sha1, s->size);
}
return 0;
}
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -442,6 +442,70 @@ int sha1_delta_base(const unsigned char
return ret;
}
+int sha1_file_size(const unsigned char *sha1, unsigned long *sizep)
+{
+ int ret, status;
+ unsigned long mapsize, size;
+ void *map;
+ z_stream stream;
+ char hdr[1024], type[20];
+ void *delta_data_head;
+ const unsigned char *data;
+ unsigned char cmd;
+ int i;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return -1;
+ ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
+ status = -1;
+ if (ret < Z_OK || parse_sha1_header(hdr, type, &size) < 0)
+ goto out;
+ if (strcmp(type, "delta")) {
+ *sizep = size;
+ status = 0;
+ goto out;
+ }
+
+ /* We are dealing with delta object. Inflated, the first 20
+ * bytes hold the base object SHA1, and delta data follows
+ * immediately after it.
+ */
+ delta_data_head = unpack_sha1_rest(&stream, hdr, 200);
+
+ /* The initial part of the delta starts at delta_data_head +
+ * 20. Borrow code from patch-delta to read the result size.
+ */
+
+ data = delta_data_head + 20;
+
+ /* Skip over the source size; we are not interested in
+ * it and we cannot verify it because we do not want
+ * to read the base object.
+ */
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ data++;
+ cmd >>= 1;
+ }
+ /* Read the result size */
+ size = i = 0;
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ size |= *data++ << i;
+ i += 8;
+ cmd >>= 1;
+ }
+ *sizep = size;
+ status = 0;
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return status;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 18:55 ` Junio C Hamano
@ 2005-06-02 21:31 ` Nicolas Pitre
2005-06-02 21:36 ` Nicolas Pitre
0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2005-06-02 21:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
On Thu, 2 Jun 2005, Junio C Hamano wrote:
> The initial parts of each retrieved SHA1 file is inflated and
> inspected to see if it is deltified, and its base object is
> asked from the remote side when it is. Since this partial
> inflation and inspection has a small performance hit, it can
> optionally be skipped by giving -d flag to git-*-pull commands.
It is still way more expensive than it could.
> diff --git a/sha1_file.c b/sha1_file.c
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -325,7 +325,13 @@ void *unpack_sha1_rest(z_stream *stream,
> int bytes = strlen(buffer) + 1;
> char *buf = xmalloc(1+size);
>
> - memcpy(buf, buffer + bytes, stream->total_out - bytes);
> + /* (stream->total_out - bytes) is what we already have. The
> + * caller could be asking for something smaller than that.
> + */
> + if (size < stream->total_out - bytes)
> + memcpy(buf, buffer + bytes, size);
> + else
> + memcpy(buf, buffer + bytes, stream->total_out - bytes);
> bytes = stream->total_out - bytes;
> if (bytes < size) {
> stream->next_out = buf + bytes;
This hunk is completely unneeded.
> @@ -401,6 +407,41 @@ void * unpack_sha1_file(void *map, unsig
> return unpack_sha1_rest(&stream, hdr, *size);
> }
>
> +int sha1_delta_base(const unsigned char *sha1, unsigned char *base_sha1)
> +{
> + int ret;
> + unsigned long mapsize, size;
> + void *map;
> + z_stream stream;
> + char hdr[1024], type[20];
Don't make hdr 1024 bytes long. If you do so unpack_sha1_header() will
uncompress up to 1024 bytes which is way above required. Instead,
consider a value of say 64 which is plenty sufficient (10 for the type
string, another 10 for the size, 20 for the reference sha1 and another
10 for the beginning of the delta data that must include the size, and
the rest for good measure).
> + void *delta_data_head;
> +
> + map = map_sha1_file(sha1, &mapsize);
> + if (!map)
> + return -1;
> + ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
> + if (ret < Z_OK || parse_sha1_header(hdr, type, &size) < 0) {
> + ret = -1;
> + goto out;
> + }
> + if (strcmp(type, "delta")) {
> + ret = 0;
> + goto out;
> + }
> + delta_data_head = unpack_sha1_rest(&stream, hdr, 20);
Here you don't need to call unpack_sha1_rest() at all which would call
xmalloc and another memcpy needlessly. Instead, just use:
memcpy(base_sha1, hdr + strlen(hdr) + 1, 20);
and you're done. No need to call an extra free() either.
And maybe this function should live in delta.c instead?
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Handle deltified object correctly in git-*-pull family.
2005-06-02 21:31 ` Nicolas Pitre
@ 2005-06-02 21:36 ` Nicolas Pitre
2005-06-02 22:19 ` [PATCH 1/2] " Junio C Hamano
2005-06-02 22:20 ` [PATCH 2/2] Find size of SHA1 object without inflating everything Junio C Hamano
0 siblings, 2 replies; 20+ messages in thread
From: Nicolas Pitre @ 2005-06-02 21:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
On Thu, 2 Jun 2005, Nicolas Pitre wrote:
> Here you don't need to call unpack_sha1_rest() at all which would call
> xmalloc and another memcpy needlessly. Instead, just use:
>
> memcpy(base_sha1, hdr + strlen(hdr) + 1, 20);
>
> and you're done. No need to call an extra free() either.
>
> And maybe this function should live in delta.c instead?
Forget about my suggestion of moving it to delta.c. Since it assumes
knowledge of the object header format it better stay close to the other
functions in sha1_file.c.
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Find size of SHA1 object without inflating everything.
2005-06-02 18:57 ` [PATCH] " Junio C Hamano
@ 2005-06-02 22:10 ` Linus Torvalds
2005-06-02 22:47 ` Junio C Hamano
0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2005-06-02 22:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Thu, 2 Jun 2005, Junio C Hamano wrote:
>
> +int sha1_file_size(const unsigned char *sha1, unsigned long *sizep)
...
> + ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
...
> + delta_data_head = unpack_sha1_rest(&stream, hdr, 200);
Why do you do this? You've already unpacked 1024 bytes (including the
header), now you want to unpack at least 200 bytes past the header (which
is less than what you already did.
So here "unpack_sha1_rest()" just ends up being a "xmalloc + memcpy", but
since you don't actually want the malloc (indeed, you're leaking it, as
far as I can tell), it seems to be all bad..
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/2] Handle deltified object correctly in git-*-pull family.
2005-06-02 21:36 ` Nicolas Pitre
@ 2005-06-02 22:19 ` Junio C Hamano
2005-06-02 22:48 ` Linus Torvalds
2005-06-02 22:20 ` [PATCH 2/2] Find size of SHA1 object without inflating everything Junio C Hamano
1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 22:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, git
>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:
>> Here you don't need to call unpack_sha1_rest() at all which would call
>> xmalloc and another memcpy needlessly. Instead,...
Like this...
------------
When a remote repository is deltified, we need to get the
objects that a deltified object we want to obtain is based upon.
The initial parts of each retrieved SHA1 file is inflated and
inspected to see if it is deltified, and its base object is
asked from the remote side when it is. Since this partial
inflation and inspection has a small performance hit, it can
optionally be skipped by giving -d flag to git-*-pull commands.
This flag should be used only when the remote repository is
known to have no deltified objects.
Rsync transport does not have this problem since it fetches
everything the remote side has.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Thanks and credits goes to Nico for suggesting not to
*** use unpack_sha1_rest().
Documentation/git-http-pull.txt | 6 +++++-
Documentation/git-local-pull.txt | 6 +++++-
Documentation/git-rpull.txt | 6 +++++-
cache.h | 1 +
pull.h | 3 +++
http-pull.c | 4 +++-
local-pull.c | 4 +++-
pull.c | 7 +++++++
rpull.c | 4 +++-
sha1_file.c | 31 +++++++++++++++++++++++++++++++
10 files changed, 66 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-http-pull.txt b/Documentation/git-http-pull.txt
--- a/Documentation/git-http-pull.txt
+++ b/Documentation/git-http-pull.txt
@@ -9,7 +9,7 @@ git-http-pull - Downloads a remote GIT r
SYNOPSIS
--------
-'git-http-pull' [-c] [-t] [-a] [-v] commit-id url
+'git-http-pull' [-c] [-t] [-a] [-v] [-d] commit-id url
DESCRIPTION
-----------
@@ -21,6 +21,10 @@ Downloads a remote GIT repository via HT
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-local-pull.txt b/Documentation/git-local-pull.txt
--- a/Documentation/git-local-pull.txt
+++ b/Documentation/git-local-pull.txt
@@ -9,7 +9,7 @@ git-local-pull - Duplicates another GIT
SYNOPSIS
--------
-'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path
+'git-local-pull' [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path
DESCRIPTION
-----------
@@ -23,6 +23,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/Documentation/git-rpull.txt b/Documentation/git-rpull.txt
--- a/Documentation/git-rpull.txt
+++ b/Documentation/git-rpull.txt
@@ -10,7 +10,7 @@ git-rpull - Pulls from a remote reposito
SYNOPSIS
--------
-'git-rpull' [-c] [-t] [-a] [-v] commit-id url
+'git-rpull' [-c] [-t] [-a] [-d] [-v] commit-id url
DESCRIPTION
-----------
@@ -25,6 +25,10 @@ OPTIONS
Get trees associated with the commit objects.
-a::
Get all the objects.
+-d::
+ Do not check for delta base objects (use this option
+ only when you know the remote repository is not
+ deltified).
-v::
Report what is downloaded.
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -153,6 +153,7 @@ extern char *sha1_file_name(const unsign
extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
+extern int sha1_delta_base(const unsigned char *, unsigned char *);
extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
diff --git a/pull.h b/pull.h
--- a/pull.h
+++ b/pull.h
@@ -13,6 +13,9 @@ extern int get_history;
/** Set to fetch the trees in the commit history. **/
extern int get_all;
+/* Set to zero to skip the check for delta object base. */
+extern int get_delta;
+
/* Set to be verbose */
extern int get_verbosely;
diff --git a/http-pull.c b/http-pull.c
--- a/http-pull.c
+++ b/http-pull.c
@@ -103,6 +103,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -113,7 +115,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-http-pull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-http-pull [-c] [-t] [-a] [-d] [-v] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/local-pull.c b/local-pull.c
--- a/local-pull.c
+++ b/local-pull.c
@@ -74,7 +74,7 @@ int fetch(unsigned char *sha1)
}
static const char *local_pull_usage =
-"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] commit-id path";
+"git-local-pull [-c] [-t] [-a] [-l] [-s] [-n] [-v] [-d] commit-id path";
/*
* By default we only use file copy.
@@ -92,6 +92,8 @@ int main(int argc, char **argv)
get_tree = 1;
else if (argv[arg][1] == 'c')
get_history = 1;
+ else if (argv[arg][1] == 'd')
+ get_delta = 0;
else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
diff --git a/pull.c b/pull.c
--- a/pull.c
+++ b/pull.c
@@ -6,6 +6,7 @@
int get_tree = 0;
int get_history = 0;
+int get_delta = 1;
int get_all = 0;
int get_verbosely = 0;
static unsigned char current_commit_sha1[20];
@@ -37,6 +38,12 @@ static int make_sure_we_have_it(const ch
status = fetch(sha1);
if (status && what)
report_missing(what, sha1);
+ if (get_delta) {
+ char delta_sha1[20];
+ status = sha1_delta_base(sha1, delta_sha1);
+ if (0 < status)
+ status = make_sure_we_have_it(what, delta_sha1);
+ }
return status;
}
diff --git a/rpull.c b/rpull.c
--- a/rpull.c
+++ b/rpull.c
@@ -27,6 +27,8 @@ int main(int argc, char **argv)
get_tree = 1;
} else if (argv[arg][1] == 'c') {
get_history = 1;
+ } else if (argv[arg][1] == 'd') {
+ get_delta = 0;
} else if (argv[arg][1] == 'a') {
get_all = 1;
get_tree = 1;
@@ -37,7 +39,7 @@ int main(int argc, char **argv)
arg++;
}
if (argc < arg + 2) {
- usage("git-rpull [-c] [-t] [-a] [-v] commit-id url");
+ usage("git-rpull [-c] [-t] [-a] [-v] [-d] commit-id url");
return 1;
}
commit_id = argv[arg];
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -401,6 +401,37 @@ void * unpack_sha1_file(void *map, unsig
return unpack_sha1_rest(&stream, hdr, *size);
}
+int sha1_delta_base(const unsigned char *sha1, unsigned char *base_sha1)
+{
+ int ret;
+ unsigned long mapsize, size;
+ void *map;
+ z_stream stream;
+ char hdr[64], type[20];
+ void *delta_data_head;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return -1;
+ ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
+ if (ret < Z_OK || parse_sha1_header(hdr, type, &size) < 0) {
+ ret = -1;
+ goto out;
+ }
+ if (strcmp(type, "delta")) {
+ ret = 0;
+ goto out;
+ }
+
+ delta_data_head = hdr + strlen(hdr) + 1;
+ ret = 1;
+ memcpy(base_sha1, delta_data_head, 20);
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return ret;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/2] Find size of SHA1 object without inflating everything.
2005-06-02 21:36 ` Nicolas Pitre
2005-06-02 22:19 ` [PATCH 1/2] " Junio C Hamano
@ 2005-06-02 22:20 ` Junio C Hamano
1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 22:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, git
This adds sha1_file_size() helper function and uses it in the
rename/copy similarity estimator. The helper function handles
deltified object as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Thanks and credits goes to Nico for suggesting not to
*** use unpack_sha1_rest().
cache.h | 1 +
diff.c | 11 ++++++-----
sha1_file.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -154,6 +154,7 @@ extern void * map_sha1_file(const unsign
extern int unpack_sha1_header(z_stream *stream, void *map, unsigned long mapsize, void *buffer, unsigned long size);
extern int parse_sha1_header(char *hdr, char *type, unsigned long *sizep);
extern int sha1_delta_base(const unsigned char *, unsigned char *);
+extern int sha1_file_size(const unsigned char *, unsigned long *);
extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
extern int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -333,7 +333,6 @@ int diff_populate_filespec(struct diff_f
close(fd);
}
else {
- /* We cannot do size only for SHA1 blobs */
char type[20];
struct sha1_size_cache *e;
@@ -343,11 +342,13 @@ int diff_populate_filespec(struct diff_f
s->size = e->size;
return 0;
}
+ if (!sha1_file_size(s->sha1, &s->size))
+ locate_size_cache(s->sha1, s->size);
+ }
+ else {
+ s->data = read_sha1_file(s->sha1, type, &s->size);
+ s->should_free = 1;
}
- s->data = read_sha1_file(s->sha1, type, &s->size);
- s->should_free = 1;
- if (s->data && size_only)
- locate_size_cache(s->sha1, s->size);
}
return 0;
}
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -432,6 +432,66 @@ int sha1_delta_base(const unsigned char
return ret;
}
+int sha1_file_size(const unsigned char *sha1, unsigned long *sizep)
+{
+ int ret, status;
+ unsigned long mapsize, size;
+ void *map;
+ z_stream stream;
+ char hdr[64], type[20];
+ const unsigned char *data;
+ unsigned char cmd;
+ int i;
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (!map)
+ return -1;
+ ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
+ status = -1;
+ if (ret < Z_OK || parse_sha1_header(hdr, type, &size) < 0)
+ goto out;
+ if (strcmp(type, "delta")) {
+ *sizep = size;
+ status = 0;
+ goto out;
+ }
+
+ /* We are dealing with a delta object. Inflated, the first
+ * 20 bytes hold the base object SHA1, and delta data follows
+ * immediately after it.
+ *
+ * The initial part of the delta starts at delta_data_head +
+ * 20. Borrow code from patch-delta to read the result size.
+ */
+ data = hdr + strlen(hdr) + 1 + 20;
+
+ /* Skip over the source size; we are not interested in
+ * it and we cannot verify it because we do not want
+ * to read the base object.
+ */
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ data++;
+ cmd >>= 1;
+ }
+ /* Read the result size */
+ size = i = 0;
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ size |= *data++ << i;
+ i += 8;
+ cmd >>= 1;
+ }
+ *sizep = size;
+ status = 0;
+ out:
+ inflateEnd(&stream);
+ munmap(map, mapsize);
+ return status;
+}
+
void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size)
{
unsigned long mapsize;
------------
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Find size of SHA1 object without inflating everything.
2005-06-02 22:10 ` Linus Torvalds
@ 2005-06-02 22:47 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-02 22:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> On Thu, 2 Jun 2005, Junio C Hamano wrote:
>>
>> +int sha1_file_size(const unsigned char *sha1, unsigned long *sizep)
LT> ...
>> + ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
LT> ...
>> + delta_data_head = unpack_sha1_rest(&stream, hdr, 200);
LT> Why do you do this?
Because I was not thinking when I wrote "char hdr[1024]".
Nico pointed out the same problem and you have a fixed version
of both in your mailbox.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] Handle deltified object correctly in git-*-pull family.
2005-06-02 22:19 ` [PATCH 1/2] " Junio C Hamano
@ 2005-06-02 22:48 ` Linus Torvalds
0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2005-06-02 22:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, git
On Thu, 2 Jun 2005, Junio C Hamano wrote:
>
> Like this...
Yup. Applied, thanks,
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use correct U*MAX.
2005-06-02 16:47 ` [PATCH] Use correct U*MAX Junio C Hamano
@ 2005-06-03 23:02 ` Petr Baudis
2005-06-03 23:40 ` Junio C Hamano
0 siblings, 1 reply; 20+ messages in thread
From: Petr Baudis @ 2005-06-03 23:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
Dear diary, on Thu, Jun 02, 2005 at 06:47:33PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> The largest "unsigned long" value is ULONG_MAX, not UINT_MAX.
>
> Signed-off-by: Junio C Hamano <junkio@cox.net>
> diff --git a/diff.c b/diff.c
> --- a/diff.c
> +++ b/diff.c
> @@ -256,7 +256,7 @@ static struct sha1_size_cache *locate_si
> first = next+1;
> }
> /* not found */
> - if (size == UINT_MAX)
> + if (size == ULONG_MAX)
> return NULL;
> /* insert to make it at "first" */
> if (sha1_size_cache_alloc <= sha1_size_cache_nr) {
> @@ -338,7 +338,7 @@ int diff_populate_filespec(struct diff_f
> struct sha1_size_cache *e;
>
> if (size_only) {
> - e = locate_size_cache(s->sha1, UINT_MAX);
> + e = locate_size_cache(s->sha1, ULONG_MAX);
> if (e) {
> s->size = e->size;
> return 0;
This one still applies, but it might be better to get rid of it
altogether, like...
[PATCH] Kill UINT_MAX usage in locate_size_cache()
Use -1 instead of UINT_MAX to indicate that locate_size_cache() should
do only the lookup and not create new stuff in case the looked up hash
was not found.
Signed-off-by: Petr Baudis <pasky@ucw.cz>
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -256,7 +256,7 @@ static struct sha1_size_cache *locate_si
first = next+1;
}
/* not found */
- if (size == UINT_MAX)
+ if (size == -1)
return NULL;
/* insert to make it at "first" */
if (sha1_size_cache_alloc <= sha1_size_cache_nr) {
@@ -337,7 +337,7 @@ int diff_populate_filespec(struct diff_f
struct sha1_size_cache *e;
if (size_only) {
- e = locate_size_cache(s->sha1, UINT_MAX);
+ e = locate_size_cache(s->sha1, -1);
if (e) {
s->size = e->size;
return 0;
|
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use correct U*MAX.
2005-06-03 23:02 ` Petr Baudis
@ 2005-06-03 23:40 ` Junio C Hamano
2005-06-04 0:00 ` Petr Baudis
0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2005-06-03 23:40 UTC (permalink / raw)
To: Petr Baudis; +Cc: Linus Torvalds, git
I'd rather see it use the correct U*MAX.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use correct U*MAX.
2005-06-03 23:40 ` Junio C Hamano
@ 2005-06-04 0:00 ` Petr Baudis
2005-06-04 0:09 ` Junio C Hamano
0 siblings, 1 reply; 20+ messages in thread
From: Petr Baudis @ 2005-06-04 0:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
Dear diary, on Sat, Jun 04, 2005 at 01:40:21AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> I'd rather see it use the correct U*MAX.
Care to elaborate? It doesn't make sense to me to use any U*MAX stuff
there whatsoever. (And how do you define the "correct" U*MAX anyway?)
P.S.: Could you please always keep at least some context in the mail?
Thanks.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Use correct U*MAX.
2005-06-04 0:00 ` Petr Baudis
@ 2005-06-04 0:09 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2005-06-04 0:09 UTC (permalink / raw)
To: Petr Baudis; +Cc: Linus Torvalds, git
>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
>> I'd rather see it use the correct U*MAX.
PB> Care to elaborate? It doesn't make sense to me to use any U*MAX stuff
PB> there whatsoever. (And how do you define the "correct" U*MAX anyway?)
It just feels wrong to spell a parameter to the function
locate_size_cache() "-1" when I know the argument it expects is
of type unsigned long. And correct U*MAX for that case is
obviously ULONG_MAX, _assuming_ that you agree to the function's
(unwritten) calling convention of "passing the largest possible
value to me means 'do not create', not 'you are telling me that
sha1 is such a large file'".
If you feel strongly about that calling convention, you could
rewrite it to take the third argument "int do_not_create" and
pass that information separately, which is conceptually cleaner.
I just did not think that was worth it for such an internal
helper when I wrote it.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2005-06-04 0:09 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <7vy89ums2l.fsf@assigned-by-dhcp.cox.net>
2005-06-01 18:38 ` [PATCH] diff: mode bits fixes Junio C Hamano
2005-06-02 16:46 ` [PATCH] Handle deltified object correctly in git-*-pull family Junio C Hamano
2005-06-02 17:03 ` Linus Torvalds
2005-06-02 18:55 ` Junio C Hamano
2005-06-02 21:31 ` Nicolas Pitre
2005-06-02 21:36 ` Nicolas Pitre
2005-06-02 22:19 ` [PATCH 1/2] " Junio C Hamano
2005-06-02 22:48 ` Linus Torvalds
2005-06-02 22:20 ` [PATCH 2/2] Find size of SHA1 object without inflating everything Junio C Hamano
2005-06-02 18:57 ` [PATCH] " Junio C Hamano
2005-06-02 22:10 ` Linus Torvalds
2005-06-02 22:47 ` Junio C Hamano
2005-06-02 17:03 ` [PATCH] Handle deltified object correctly in git-*-pull family McMullan, Jason
2005-06-02 18:02 ` Junio C Hamano
2005-06-02 16:47 ` [PATCH] Use correct U*MAX Junio C Hamano
2005-06-03 23:02 ` Petr Baudis
2005-06-03 23:40 ` Junio C Hamano
2005-06-04 0:00 ` Petr Baudis
2005-06-04 0:09 ` Junio C Hamano
2005-06-02 16:49 ` [PATCH] Find size of SHA1 object without inflating everything Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).