* [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 @ 2008-07-16 6:31 Nicolas Pitre 2008-07-16 6:31 ` [PATCH 1/5] clean up pack index handling a bit Nicolas Pitre ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Well, here it is. I know I know, I'm just too nice. Oh well... This is the minimum set of backported patches on top of git v1.4.4.4 to allow it to cope with pack index version 2. Junio: if you could just apply them, tag the result as v1.4.4.5 and push it out then at that point it simply will be up to Debian to make it available as a "major usability fix". Nicolas ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/5] clean up pack index handling a bit 2008-07-16 6:31 [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Nicolas Pitre @ 2008-07-16 6:31 ` Nicolas Pitre 2008-07-16 6:31 ` [PATCH 2/5] clean up and optimize nth_packed_object_sha1() usage Nicolas Pitre 2008-07-16 10:46 ` [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Johannes Schindelin 2008-07-16 16:25 ` Linus Torvalds 2 siblings, 1 reply; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Especially with the new index format to come, it is more appropriate to encapsulate more into check_packed_git_idx() and assume less of the index format in struct packed_git. To that effect, the index_base is renamed to index_data with void * type so it is not used directly but other pointers initialized with it. This allows for a couple pointer cast removal, as well as providing a better generic name to grep for when adding support for new index versions or formats. And index_data is declared const too while at it. (based on commit 4287307833a7c67b09973fc1023311e473f830b2) Signed-off-by: Nicolas Pitre <nico@cam.org> --- builtin-pack-objects.c | 14 ++++--- cache.h | 5 ++- pack-check.c | 8 ++-- pack-redundant.c | 30 ++++++++------- pack.h | 28 ++++++++++++++ sha1_file.c | 94 ++++++++++++++++++++++++++---------------------- 6 files changed, 110 insertions(+), 69 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 69e5dd3..ae051f9 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -170,11 +170,12 @@ static void prepare_pack_revindex(struct pack_revindex *rix) struct packed_git *p = rix->p; int num_ent = num_packed_objects(p); int i; - void *index = p->index_base + 256; + const char *index = p->index_data; + index += 4 * 256; rix->revindex = xmalloc(sizeof(*rix->revindex) * (num_ent + 1)); for (i = 0; i < num_ent; i++) { - unsigned int hl = *((unsigned int *)((char *) index + 24*i)); + uint32_t hl = *((uint32_t *)(index + 24 * i)); rix->revindex[i].offset = ntohl(hl); rix->revindex[i].nr = i; } @@ -222,11 +223,11 @@ static unsigned long find_packed_object_size(struct packed_git *p, return entry[1].offset - ofs; } -static unsigned char *find_packed_object_name(struct packed_git *p, - unsigned long ofs) +static const unsigned char *find_packed_object_name(struct packed_git *p, + unsigned long ofs) { struct revindex_entry *entry = find_packed_object(p, ofs); - return (unsigned char *)(p->index_base + 256) + 24 * entry->nr + 4; + return ((unsigned char *)p->index_data) + 4 * 256 + 24 * entry->nr + 4; } static void *delta_against(void *buf, unsigned long size, struct object_entry *entry) @@ -959,7 +960,8 @@ static void check_object(struct object_entry *entry) * delta. */ if (!no_reuse_delta) { - unsigned char c, *base_name; + unsigned char c; + const unsigned char *base_name; unsigned long ofs; /* there is at least 20 bytes left in the pack */ switch (entry->in_pack_type) { diff --git a/cache.h b/cache.h index a0e9727..cc8e84e 100644 --- a/cache.h +++ b/cache.h @@ -334,8 +334,9 @@ extern struct packed_git { struct packed_git *next; unsigned long index_size; unsigned long pack_size; - unsigned int *index_base; + const void *index_data; void *pack_base; + int index_version; unsigned int pack_last_used; unsigned int pack_use_cnt; int pack_local; @@ -374,7 +375,7 @@ extern int server_supports(const char *feature); extern struct packed_git *parse_pack_index(unsigned char *sha1); extern struct packed_git *parse_pack_index_file(const unsigned char *sha1, - char *idx_path); + const char *idx_path); extern void prepare_packed_git(void); extern void reprepare_packed_git(void); diff --git a/pack-check.c b/pack-check.c index 8e123b7..7db44e9 100644 --- a/pack-check.c +++ b/pack-check.c @@ -6,7 +6,7 @@ static int verify_packfile(struct packed_git *p) { unsigned long index_size = p->index_size; - void *index_base = p->index_base; + const unsigned char *index_base = p->index_data; SHA_CTX ctx; unsigned char sha1[20]; struct pack_header *hdr; @@ -42,7 +42,7 @@ static int verify_packfile(struct packed_git *p) if (hashcmp(sha1, (unsigned char *)(p->pack_base) + p->pack_size - 20)) return error("Packfile %s SHA1 mismatch with itself", p->pack_name); - if (hashcmp(sha1, (unsigned char *)index_base + index_size - 40)) + if (hashcmp(sha1, index_base + index_size - 40)) return error("Packfile %s SHA1 mismatch with idx", p->pack_name); @@ -136,7 +136,7 @@ static void show_pack_info(struct packed_git *p) int verify_pack(struct packed_git *p, int verbose) { unsigned long index_size = p->index_size; - void *index_base = p->index_base; + const unsigned char *index_base = p->index_data; SHA_CTX ctx; unsigned char sha1[20]; int ret; @@ -146,7 +146,7 @@ int verify_pack(struct packed_git *p, int verbose) SHA1_Init(&ctx); SHA1_Update(&ctx, index_base, index_size - 20); SHA1_Final(sha1, &ctx); - if (hashcmp(sha1, (unsigned char *)index_base + index_size - 20)) + if (hashcmp(sha1, index_base + index_size - 20)) ret = error("Packfile index for %s SHA1 mismatch", p->pack_name); diff --git a/pack-redundant.c b/pack-redundant.c index edb5524..83812f3 100644 --- a/pack-redundant.c +++ b/pack-redundant.c @@ -17,7 +17,7 @@ static int load_all_packs, verbose, alt_odb; struct llist_item { struct llist_item *next; - unsigned char *sha1; + const unsigned char *sha1; }; static struct llist { struct llist_item *front; @@ -104,9 +104,9 @@ static struct llist * llist_copy(struct llist *list) return ret; } -static inline struct llist_item * llist_insert(struct llist *list, - struct llist_item *after, - unsigned char *sha1) +static inline struct llist_item *llist_insert(struct llist *list, + struct llist_item *after, + const unsigned char *sha1) { struct llist_item *new = llist_item_get(); new->sha1 = sha1; @@ -128,12 +128,14 @@ static inline struct llist_item * llist_insert(struct llist *list, return new; } -static inline struct llist_item *llist_insert_back(struct llist *list, unsigned char *sha1) +static inline struct llist_item *llist_insert_back(struct llist *list, + const unsigned char *sha1) { return llist_insert(list, list->back, sha1); } -static inline struct llist_item *llist_insert_sorted_unique(struct llist *list, unsigned char *sha1, struct llist_item *hint) +static inline struct llist_item *llist_insert_sorted_unique(struct llist *list, + const unsigned char *sha1, struct llist_item *hint) { struct llist_item *prev = NULL, *l; @@ -246,12 +248,12 @@ static struct pack_list * pack_list_difference(const struct pack_list *A, static void cmp_two_packs(struct pack_list *p1, struct pack_list *p2) { int p1_off, p2_off; - unsigned char *p1_base, *p2_base; + const unsigned char *p1_base, *p2_base; struct llist_item *p1_hint = NULL, *p2_hint = NULL; p1_off = p2_off = 256 * 4 + 4; - p1_base = (unsigned char *) p1->pack->index_base; - p2_base = (unsigned char *) p2->pack->index_base; + p1_base = p1->pack->index_data; + p2_base = p2->pack->index_data; while (p1_off <= p1->pack->index_size - 3 * 20 && p2_off <= p2->pack->index_size - 3 * 20) @@ -351,11 +353,11 @@ static size_t sizeof_union(struct packed_git *p1, struct packed_git *p2) { size_t ret = 0; int p1_off, p2_off; - unsigned char *p1_base, *p2_base; + const unsigned char *p1_base, *p2_base; p1_off = p2_off = 256 * 4 + 4; - p1_base = (unsigned char *)p1->index_base; - p2_base = (unsigned char *)p2->index_base; + p1_base = p1->index_data; + p2_base = p2->index_data; while (p1_off <= p1->index_size - 3 * 20 && p2_off <= p2->index_size - 3 * 20) @@ -534,7 +536,7 @@ static struct pack_list * add_pack(struct packed_git *p) { struct pack_list l; size_t off; - unsigned char *base; + const unsigned char *base; if (!p->pack_local && !(alt_odb || verbose)) return NULL; @@ -543,7 +545,7 @@ static struct pack_list * add_pack(struct packed_git *p) llist_init(&l.all_objects); off = 256 * 4 + 4; - base = (unsigned char *)p->index_base; + base = p->index_data; while (off <= p->index_size - 3 * 20) { llist_insert_back(l.all_objects, base + off); off += 24; diff --git a/pack.h b/pack.h index 4814800..e0051fd 100644 --- a/pack.h +++ b/pack.h @@ -15,5 +15,33 @@ struct pack_header { unsigned int hdr_entries; }; +/* + * The first four bytes of index formats later than version 1 should + * start with this signature, as all older git binaries would find this + * value illegal and abort reading the file. + * + * This is the case because the number of objects in a packfile + * cannot exceed 1,431,660,000 as every object would need at least + * 3 bytes of data and the overall packfile cannot exceed 4 GiB with + * version 1 of the index file due to the offsets limited to 32 bits. + * Clearly the signature exceeds this maximum. + * + * Very old git binaries will also compare the first 4 bytes to the + * next 4 bytes in the index and abort with a "non-monotonic index" + * error if the second 4 byte word is smaller than the first 4 + * byte word. This would be true in the proposed future index + * format as idx_signature would be greater than idx_version. + */ +#define PACK_IDX_SIGNATURE 0xff744f63 /* "\377tOc" */ + +/* + * Packed object index header + */ +struct pack_idx_header { + uint32_t idx_signature; + uint32_t idx_version; +}; + + extern int verify_pack(struct packed_git *, int); #endif diff --git a/sha1_file.c b/sha1_file.c index 09456d2..df31462 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -402,15 +402,15 @@ static int pack_used_ctr; static unsigned long pack_mapped; struct packed_git *packed_git; -static int check_packed_git_idx(const char *path, unsigned long *idx_size_, - void **idx_map_) +static int check_packed_git_idx(const char *path, struct packed_git *p) { void *idx_map; - unsigned int *index; + struct pack_idx_header *hdr; unsigned long idx_size; - int nr, i; + unsigned int nr, i, *index; int fd = open(path, O_RDONLY); struct stat st; + if (fd < 0) return -1; if (fstat(fd, &st)) { @@ -423,14 +423,21 @@ static int check_packed_git_idx(const char *path, unsigned long *idx_size_, if (idx_map == MAP_FAILED) return -1; - index = idx_map; - *idx_map_ = idx_map; - *idx_size_ = idx_size; + /* a future index format would start with this, as older git + * binaries would fail the non-monotonic index check below. + * give a nicer warning to the user if we can. + */ + hdr = idx_map; + if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) { + munmap(idx_map, idx_size); + return error("index file %s is a newer version" + " and is not supported by this binary" + " (try upgrading GIT to a newer version)", + path); + } - /* check index map */ - if (idx_size < 4*256 + 20 + 20) - return error("index file too small"); nr = 0; + index = idx_map; for (i = 0; i < 256; i++) { unsigned int n = ntohl(index[i]); if (n < nr) @@ -448,6 +455,9 @@ static int check_packed_git_idx(const char *path, unsigned long *idx_size_, if (idx_size != 4*256 + nr * 24 + 20 + 20) return error("wrong index file size"); + p->index_version = 1; + p->index_data = idx_map; + p->index_size = idx_size; return 0; } @@ -521,7 +531,7 @@ int use_packed_git(struct packed_git *p) /* Check if the pack file matches with the index file. * this is cheap. */ - if (hashcmp((unsigned char *)(p->index_base) + + if (hashcmp((unsigned char *)(p->index_data) + p->index_size - 40, (unsigned char *)p->pack_base + p->pack_size - 20)) { @@ -536,35 +546,34 @@ int use_packed_git(struct packed_git *p) struct packed_git *add_packed_git(char *path, int path_len, int local) { struct stat st; - struct packed_git *p; - unsigned long idx_size; - void *idx_map; - unsigned char sha1[20]; + struct packed_git *p = xmalloc(sizeof(*p) + path_len + 2); - if (check_packed_git_idx(path, &idx_size, &idx_map)) + /* + * Make sure a corresponding .pack file exists and that + * the index looks sane. + */ + path_len -= strlen(".idx"); + if (path_len < 1) return NULL; - - /* do we have a corresponding .pack file? */ - strcpy(path + path_len - 4, ".pack"); - if (stat(path, &st) || !S_ISREG(st.st_mode)) { - munmap(idx_map, idx_size); + memcpy(p->pack_name, path, path_len); + strcpy(p->pack_name + path_len, ".pack"); + if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode) || + check_packed_git_idx(path, p)) { + free(p); return NULL; } + /* ok, it looks sane as far as we can check without * actually mapping the pack file. */ - p = xmalloc(sizeof(*p) + path_len + 2); - strcpy(p->pack_name, path); - p->index_size = idx_size; p->pack_size = st.st_size; - p->index_base = idx_map; p->next = NULL; p->pack_base = NULL; p->pack_last_used = 0; p->pack_use_cnt = 0; p->pack_local = local; - if ((path_len > 44) && !get_sha1_hex(path + path_len - 44, sha1)) - hashcpy(p->sha1, sha1); + if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1)) + hashclr(p->sha1); return p; } @@ -574,23 +583,19 @@ struct packed_git *parse_pack_index(unsigned char *sha1) return parse_pack_index_file(sha1, path); } -struct packed_git *parse_pack_index_file(const unsigned char *sha1, char *idx_path) +struct packed_git *parse_pack_index_file(const unsigned char *sha1, + const char *idx_path) { - struct packed_git *p; - unsigned long idx_size; - void *idx_map; - char *path; + const char *path = sha1_pack_name(sha1); + struct packed_git *p = xmalloc(sizeof(*p) + strlen(path) + 2); - if (check_packed_git_idx(idx_path, &idx_size, &idx_map)) + if (check_packed_git_idx(idx_path, p)) { + free(p); return NULL; + } - path = sha1_pack_name(sha1); - - p = xmalloc(sizeof(*p) + strlen(path) + 2); strcpy(p->pack_name, path); - p->index_size = idx_size; p->pack_size = 0; - p->index_base = idx_map; p->next = NULL; p->pack_base = NULL; p->pack_last_used = 0; @@ -1175,24 +1180,27 @@ int num_packed_objects(const struct packed_git *p) int nth_packed_object_sha1(const struct packed_git *p, int n, unsigned char* sha1) { - void *index = p->index_base + 256; + const unsigned char *index = p->index_data; + index += 4 * 256; if (n < 0 || num_packed_objects(p) <= n) return -1; - hashcpy(sha1, (unsigned char *) index + (24 * n) + 4); + hashcpy(sha1, index + 24 * n + 4); return 0; } unsigned long find_pack_entry_one(const unsigned char *sha1, struct packed_git *p) { - unsigned int *level1_ofs = p->index_base; + const unsigned int *level1_ofs = p->index_data; int hi = ntohl(level1_ofs[*sha1]); int lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); - void *index = p->index_base + 256; + const unsigned char *index = p->index_data; + + index += 4 * 256; do { int mi = (lo + hi) / 2; - int cmp = hashcmp((unsigned char *)index + (24 * mi) + 4, sha1); + int cmp = hashcmp(index + 24 * mi + 4, sha1); if (!cmp) return ntohl(*((unsigned int *) ((char *) index + (24 * mi)))); if (cmp > 0) -- 1.5.6.3.499.geae9 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/5] clean up and optimize nth_packed_object_sha1() usage 2008-07-16 6:31 ` [PATCH 1/5] clean up pack index handling a bit Nicolas Pitre @ 2008-07-16 6:31 ` Nicolas Pitre 2008-07-16 6:31 ` [PATCH 3/5] get rid of num_packed_objects() Nicolas Pitre 0 siblings, 1 reply; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Let's avoid the open coded pack index reference in pack-object and use nth_packed_object_sha1() instead. This will help encapsulating index format differences in one place. And while at it there is no reason to copy SHA1's over and over while a direct pointer to it in the index will do just fine. (based on commit d72308e01c5977177cda0aed06cfeee9192e1247) Signed-off-by: Nicolas Pitre <nico@cam.org> --- builtin-pack-objects.c | 2 +- cache.h | 2 +- fsck-objects.c | 9 +++------ pack-check.c | 11 +++++++---- sha1_file.c | 11 +++++------ sha1_name.c | 16 ++++++++-------- 6 files changed, 25 insertions(+), 26 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index ae051f9..4c345d5 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -227,7 +227,7 @@ static const unsigned char *find_packed_object_name(struct packed_git *p, unsigned long ofs) { struct revindex_entry *entry = find_packed_object(p, ofs); - return ((unsigned char *)p->index_data) + 4 * 256 + 24 * entry->nr + 4; + return nth_packed_object_sha1(p, entry->nr); } static void *delta_against(void *buf, unsigned long size, struct object_entry *entry) diff --git a/cache.h b/cache.h index cc8e84e..191c738 100644 --- a/cache.h +++ b/cache.h @@ -388,7 +388,7 @@ extern int use_packed_git(struct packed_git *); extern void unuse_packed_git(struct packed_git *); extern struct packed_git *add_packed_git(char *, int, int); extern int num_packed_objects(const struct packed_git *p); -extern int nth_packed_object_sha1(const struct packed_git *, int, unsigned char*); +extern const unsigned char *nth_packed_object_sha1(const struct packed_git *, unsigned int); extern unsigned long find_pack_entry_one(const unsigned char *, struct packed_git *); extern void *unpack_entry_gently(struct packed_git *, unsigned long, char *, unsigned long *); extern unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); diff --git a/fsck-objects.c b/fsck-objects.c index 46b628c..f6015a8 100644 --- a/fsck-objects.c +++ b/fsck-objects.c @@ -289,7 +289,7 @@ static int fsck_tag(struct tag *tag) return 0; } -static int fsck_sha1(unsigned char *sha1) +static int fsck_sha1(const unsigned char *sha1) { struct object *obj = parse_object(sha1); if (!obj) @@ -551,11 +551,8 @@ int main(int argc, char **argv) for (p = packed_git; p; p = p->next) { int num = num_packed_objects(p); - for (i = 0; i < num; i++) { - unsigned char sha1[20]; - nth_packed_object_sha1(p, i, sha1); - fsck_sha1(sha1); - } + for (i = 0; i < num; i++) + fsck_sha1(nth_packed_object_sha1(p, i)); } } diff --git a/pack-check.c b/pack-check.c index 7db44e9..11f6ed2 100644 --- a/pack-check.c +++ b/pack-check.c @@ -51,12 +51,13 @@ static int verify_packfile(struct packed_git *p) * we do not do scan-streaming check on the pack file. */ for (i = err = 0; i < nr_objects; i++) { - unsigned char sha1[20]; + const unsigned char *sha1; void *data; char type[20]; unsigned long size, offset; - if (nth_packed_object_sha1(p, i, sha1)) + sha1 = nth_packed_object_sha1(p, i); + if (!sha1) die("internal error pack-check nth-packed-object"); offset = find_pack_entry_one(sha1, p); if (!offset) @@ -93,14 +94,16 @@ static void show_pack_info(struct packed_git *p) memset(chain_histogram, 0, sizeof(chain_histogram)); for (i = 0; i < nr_objects; i++) { - unsigned char sha1[20], base_sha1[20]; + const unsigned char *sha1; + unsigned char base_sha1[20]; char type[20]; unsigned long size; unsigned long store_size; unsigned long offset; unsigned int delta_chain_length; - if (nth_packed_object_sha1(p, i, sha1)) + sha1 = nth_packed_object_sha1(p, i); + if (!sha1) die("internal error pack-check nth-packed-object"); offset = find_pack_entry_one(sha1, p); if (!offset) diff --git a/sha1_file.c b/sha1_file.c index df31462..b4c5209 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1177,15 +1177,14 @@ int num_packed_objects(const struct packed_git *p) return (p->index_size - 20 - 20 - 4*256) / 24; } -int nth_packed_object_sha1(const struct packed_git *p, int n, - unsigned char* sha1) +const unsigned char *nth_packed_object_sha1(const struct packed_git *p, + unsigned int n) { const unsigned char *index = p->index_data; index += 4 * 256; - if (n < 0 || num_packed_objects(p) <= n) - return -1; - hashcpy(sha1, index + 24 * n + 4); - return 0; + if (num_packed_objects(p) <= n) + return NULL; + return index + 24 * n + 4; } unsigned long find_pack_entry_one(const unsigned char *sha1, diff --git a/sha1_name.c b/sha1_name.c index 6d7cd78..d083096 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -71,7 +71,7 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char * static int find_short_packed_object(int len, const unsigned char *match, unsigned char *sha1) { struct packed_git *p; - unsigned char found_sha1[20]; + const unsigned char *found_sha1 = NULL; int found = 0; prepare_packed_git(); @@ -80,10 +80,10 @@ static int find_short_packed_object(int len, const unsigned char *match, unsigne unsigned first = 0, last = num; while (first < last) { unsigned mid = (first + last) / 2; - unsigned char now[20]; + const unsigned char *now; int cmp; - nth_packed_object_sha1(p, mid, now); + now = nth_packed_object_sha1(p, mid); cmp = hashcmp(match, now); if (!cmp) { first = mid; @@ -96,14 +96,14 @@ static int find_short_packed_object(int len, const unsigned char *match, unsigne last = mid; } if (first < num) { - unsigned char now[20], next[20]; - nth_packed_object_sha1(p, first, now); + const unsigned char *now, *next; + now = nth_packed_object_sha1(p, first); if (match_sha(len, match, now)) { - if (nth_packed_object_sha1(p, first+1, next) || - !match_sha(len, match, next)) { + next = nth_packed_object_sha1(p, first+1); + if (!next|| !match_sha(len, match, next)) { /* unique within this pack */ if (!found) { - hashcpy(found_sha1, now); + found_sha1 = now; found++; } else if (hashcmp(found_sha1, now)) { -- 1.5.6.3.499.geae9 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/5] get rid of num_packed_objects() 2008-07-16 6:31 ` [PATCH 2/5] clean up and optimize nth_packed_object_sha1() usage Nicolas Pitre @ 2008-07-16 6:31 ` Nicolas Pitre 2008-07-16 6:31 ` [PATCH 4/5] pack-objects: learn about pack index version 2 Nicolas Pitre 0 siblings, 1 reply; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git The coming index format change doesn't allow for the number of objects to be determined from the size of the index file directly. Instead, Let's initialize a field in the packed_git structure with the object count when the index is validated since the count is always known at that point. (based on commit 57059091fad25427bce9b3d47e073ce0518d164b) Signed-off-by: Nicolas Pitre <nico@cam.org> --- builtin-count-objects.c | 2 +- builtin-pack-objects.c | 4 ++-- cache.h | 2 +- fsck-objects.c | 2 +- pack-check.c | 4 ++-- sha1_file.c | 9 ++------- sha1_name.c | 2 +- 7 files changed, 10 insertions(+), 15 deletions(-) diff --git a/builtin-count-objects.c b/builtin-count-objects.c index 73c5982..7795a63 100644 --- a/builtin-count-objects.c +++ b/builtin-count-objects.c @@ -110,7 +110,7 @@ int cmd_count_objects(int ac, const char **av, const char *prefix) for (p = packed_git; p; p = p->next) { if (!p->pack_local) continue; - packed += num_packed_objects(p); + packed += p->num_objects; } printf("count: %lu\n", loose); printf("size: %lu\n", loose_size / 2); diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 4c345d5..b9c3da2 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -168,7 +168,7 @@ static int cmp_offset(const void *a_, const void *b_) static void prepare_pack_revindex(struct pack_revindex *rix) { struct packed_git *p = rix->p; - int num_ent = num_packed_objects(p); + int num_ent = p->num_objects; int i; const char *index = p->index_data; @@ -202,7 +202,7 @@ static struct revindex_entry * find_packed_object(struct packed_git *p, prepare_pack_revindex(rix); revindex = rix->revindex; lo = 0; - hi = num_packed_objects(p) + 1; + hi = p->num_objects + 1; do { int mi = (lo + hi) / 2; if (revindex[mi].offset == ofs) { diff --git a/cache.h b/cache.h index 191c738..1bcc7c1 100644 --- a/cache.h +++ b/cache.h @@ -336,6 +336,7 @@ extern struct packed_git { unsigned long pack_size; const void *index_data; void *pack_base; + unsigned int num_objects; int index_version; unsigned int pack_last_used; unsigned int pack_use_cnt; @@ -387,7 +388,6 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1, extern int use_packed_git(struct packed_git *); extern void unuse_packed_git(struct packed_git *); extern struct packed_git *add_packed_git(char *, int, int); -extern int num_packed_objects(const struct packed_git *p); extern const unsigned char *nth_packed_object_sha1(const struct packed_git *, unsigned int); extern unsigned long find_pack_entry_one(const unsigned char *, struct packed_git *); extern void *unpack_entry_gently(struct packed_git *, unsigned long, char *, unsigned long *); diff --git a/fsck-objects.c b/fsck-objects.c index f6015a8..bdbca54 100644 --- a/fsck-objects.c +++ b/fsck-objects.c @@ -550,7 +550,7 @@ int main(int argc, char **argv) verify_pack(p, 0); for (p = packed_git; p; p = p->next) { - int num = num_packed_objects(p); + int num = p->num_objects; for (i = 0; i < num; i++) fsck_sha1(nth_packed_object_sha1(p, i)); } diff --git a/pack-check.c b/pack-check.c index 11f6ed2..578f59e 100644 --- a/pack-check.c +++ b/pack-check.c @@ -22,10 +22,10 @@ static int verify_packfile(struct packed_git *p) return error("Packfile version %d unsupported", ntohl(hdr->hdr_version)); nr_objects = ntohl(hdr->hdr_entries); - if (num_packed_objects(p) != nr_objects) + if (p->num_objects != nr_objects) return error("Packfile claims to have %d objects, " "while idx size expects %d", nr_objects, - num_packed_objects(p)); + p->num_objects); /* Check integrity of pack data with its SHA-1 checksum */ SHA1_Init(&ctx); diff --git a/sha1_file.c b/sha1_file.c index b4c5209..9c40e7e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -458,6 +458,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) p->index_version = 1; p->index_data = idx_map; p->index_size = idx_size; + p->num_objects = nr; return 0; } @@ -1171,18 +1172,12 @@ void *unpack_entry_gently(struct packed_git *p, unsigned long offset, } } -int num_packed_objects(const struct packed_git *p) -{ - /* See check_packed_git_idx() */ - return (p->index_size - 20 - 20 - 4*256) / 24; -} - const unsigned char *nth_packed_object_sha1(const struct packed_git *p, unsigned int n) { const unsigned char *index = p->index_data; index += 4 * 256; - if (num_packed_objects(p) <= n) + if (n >= p->num_objects) return NULL; return index + 24 * n + 4; } diff --git a/sha1_name.c b/sha1_name.c index d083096..be9be52 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -76,7 +76,7 @@ static int find_short_packed_object(int len, const unsigned char *match, unsigne prepare_packed_git(); for (p = packed_git; p && found < 2; p = p->next) { - unsigned num = num_packed_objects(p); + unsigned num = p->num_objects; unsigned first = 0, last = num; while (first < last) { unsigned mid = (first + last) / 2; -- 1.5.6.3.499.geae9 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/5] pack-objects: learn about pack index version 2 2008-07-16 6:31 ` [PATCH 3/5] get rid of num_packed_objects() Nicolas Pitre @ 2008-07-16 6:31 ` Nicolas Pitre 2008-07-16 6:31 ` [PATCH 5/5] sha1_file.c: learn about " Nicolas Pitre 0 siblings, 1 reply; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git This is the reading part only. No creation of index v2 is provided. (extracted from commit c553ca25bd60dc9fd50b8bc7bd329601b81cee66) Signed-off-by: Nicolas Pitre <nico@cam.org> --- builtin-pack-objects.c | 30 +++++++++++++++++++++++++----- 1 files changed, 25 insertions(+), 5 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index b9c3da2..5198563 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -172,13 +172,33 @@ static void prepare_pack_revindex(struct pack_revindex *rix) int i; const char *index = p->index_data; - index += 4 * 256; rix->revindex = xmalloc(sizeof(*rix->revindex) * (num_ent + 1)); - for (i = 0; i < num_ent; i++) { - uint32_t hl = *((uint32_t *)(index + 24 * i)); - rix->revindex[i].offset = ntohl(hl); - rix->revindex[i].nr = i; + index += 4 * 256; + + if (p->index_version > 1) { + const uint32_t *off_32 = + (uint32_t *)(index + 8 + p->num_objects * (20 + 4)); + const uint32_t *off_64 = off_32 + p->num_objects; + for (i = 0; i < num_ent; i++) { + uint32_t off = ntohl(*off_32++); + if (!(off & 0x80000000)) { + rix->revindex[i].offset = off; + } else { + rix->revindex[i].offset = + ((uint64_t)ntohl(*off_64++)) << 32; + rix->revindex[i].offset |= + ntohl(*off_64++); + } + rix->revindex[i].nr = i; + } + } else { + for (i = 0; i < num_ent; i++) { + uint32_t hl = *((uint32_t *)(index + 24 * i)); + rix->revindex[i].offset = ntohl(hl); + rix->revindex[i].nr = i; + } } + /* This knows the pack format -- the 20-byte trailer * follows immediately after the last object data. */ -- 1.5.6.3.499.geae9 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/5] sha1_file.c: learn about index version 2 2008-07-16 6:31 ` [PATCH 4/5] pack-objects: learn about pack index version 2 Nicolas Pitre @ 2008-07-16 6:31 ` Nicolas Pitre 0 siblings, 0 replies; 12+ messages in thread From: Nicolas Pitre @ 2008-07-16 6:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git This allows for pack index v2 to be used. On 32-bit machines the maximum pack size is 2GB. To lift this limitation just use a newer git version. (based on commit 74e34e1fca2ed9998581cc94073bc2dd28bbb8f3) Signed-off-by: Nicolas Pitre <nico@cam.org> --- sha1_file.c | 117 +++++++++++++++++++++++++++++++++++++++++++++-------------- 1 files changed, 89 insertions(+), 28 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 9c40e7e..927ac06 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -407,7 +407,7 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) void *idx_map; struct pack_idx_header *hdr; unsigned long idx_size; - unsigned int nr, i, *index; + unsigned int version, nr, i, *index; int fd = open(path, O_RDONLY); struct stat st; @@ -423,21 +423,23 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) if (idx_map == MAP_FAILED) return -1; - /* a future index format would start with this, as older git - * binaries would fail the non-monotonic index check below. - * give a nicer warning to the user if we can. - */ hdr = idx_map; if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) { - munmap(idx_map, idx_size); - return error("index file %s is a newer version" - " and is not supported by this binary" - " (try upgrading GIT to a newer version)", - path); - } + version = ntohl(hdr->idx_version); + if (version < 2 || version > 2) { + munmap(idx_map, idx_size); + return error("index file %s is version %d" + " and is not supported by this binary" + " (try upgrading GIT to a newer version)", + path, version); + } + } else + version = 1; nr = 0; index = idx_map; + if (version > 1) + index += 2; /* skip index header */ for (i = 0; i < 256; i++) { unsigned int n = ntohl(index[i]); if (n < nr) @@ -445,17 +447,47 @@ static int check_packed_git_idx(const char *path, struct packed_git *p) nr = n; } - /* - * Total size: - * - 256 index entries 4 bytes each - * - 24-byte entries * nr (20-byte sha1 + 4-byte offset) - * - 20-byte SHA1 of the packfile - * - 20-byte SHA1 file checksum - */ - if (idx_size != 4*256 + nr * 24 + 20 + 20) - return error("wrong index file size"); + if (version == 1) { + /* + * Total size: + * - 256 index entries 4 bytes each + * - 24-byte entries * nr (20-byte sha1 + 4-byte offset) + * - 20-byte SHA1 of the packfile + * - 20-byte SHA1 file checksum + */ + if (idx_size != 4*256 + nr * 24 + 20 + 20) { + munmap(idx_map, idx_size); + return error("wrong index file size in %s", path); + } + } else if (version == 2) { + /* + * Minimum size: + * - 8 bytes of header + * - 256 index entries 4 bytes each + * - 20-byte sha1 entry * nr + * - 4-byte crc entry * nr + * - 4-byte offset entry * nr + * - 20-byte SHA1 of the packfile + * - 20-byte SHA1 file checksum + * And after the 4-byte offset table might be a + * variable sized table containing 8-byte entries + * for offsets larger than 2^31. + */ + unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20; + if (idx_size < min_size || idx_size > min_size + (nr - 1)*8) { + munmap(idx_map, idx_size); + return error("wrong index file size in %s", path); + } + if (idx_size != min_size) { + /* make sure we can deal with large pack offsets */ + if (sizeof(unsigned long) <= 4) { + munmap(idx_map, idx_size); + return error("pack %s too large -- please upgrade your git version", path); + } + } + } - p->index_version = 1; + p->index_version = version; p->index_data = idx_map; p->index_size = idx_size; p->num_objects = nr; @@ -1176,27 +1208,56 @@ const unsigned char *nth_packed_object_sha1(const struct packed_git *p, unsigned int n) { const unsigned char *index = p->index_data; - index += 4 * 256; if (n >= p->num_objects) return NULL; - return index + 24 * n + 4; + index += 4 * 256; + if (p->index_version == 1) { + return index + 24 * n + 4; + } else { + index += 8; + return index + 20 * n; + } +} + +static off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) +{ + const unsigned char *index = p->index_data; + index += 4 * 256; + if (p->index_version == 1) { + return ntohl(*((uint32_t *)(index + 24 * n))); + } else { + uint32_t off; + index += 8 + p->num_objects * (20 + 4); + off = ntohl(*((uint32_t *)(index + 4 * n))); + if (!(off & 0x80000000)) + return off; + index += p->num_objects * 4 + (off & 0x7fffffff) * 8; + return (((uint64_t)ntohl(*((uint32_t *)(index + 0)))) << 32) | + ntohl(*((uint32_t *)(index + 4))); + } } unsigned long find_pack_entry_one(const unsigned char *sha1, struct packed_git *p) { const unsigned int *level1_ofs = p->index_data; - int hi = ntohl(level1_ofs[*sha1]); - int lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); const unsigned char *index = p->index_data; + unsigned hi, lo; + if (p->index_version > 1) { + level1_ofs += 2; + index += 8; + } index += 4 * 256; + hi = ntohl(level1_ofs[*sha1]); + lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); do { - int mi = (lo + hi) / 2; - int cmp = hashcmp(index + 24 * mi + 4, sha1); + unsigned mi = (lo + hi) / 2; + unsigned x = (p->index_version > 1) ? (mi * 20) : (mi * 24 + 4); + int cmp = hashcmp(index + x, sha1); if (!cmp) - return ntohl(*((unsigned int *) ((char *) index + (24 * mi)))); + return nth_packed_object_offset(p, mi); if (cmp > 0) hi = mi; else -- 1.5.6.3.499.geae9 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 6:31 [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Nicolas Pitre 2008-07-16 6:31 ` [PATCH 1/5] clean up pack index handling a bit Nicolas Pitre @ 2008-07-16 10:46 ` Johannes Schindelin 2008-07-16 16:25 ` Linus Torvalds 2 siblings, 0 replies; 12+ messages in thread From: Johannes Schindelin @ 2008-07-16 10:46 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, git Hi, On Wed, 16 Jul 2008, Nicolas Pitre wrote: > Well, here it is. I know I know, I'm just too nice. Thanks, Dscho ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 6:31 [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Nicolas Pitre 2008-07-16 6:31 ` [PATCH 1/5] clean up pack index handling a bit Nicolas Pitre 2008-07-16 10:46 ` [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Johannes Schindelin @ 2008-07-16 16:25 ` Linus Torvalds 2008-07-16 17:04 ` Junio C Hamano 2008-07-16 17:08 ` Junio C Hamano 2 siblings, 2 replies; 12+ messages in thread From: Linus Torvalds @ 2008-07-16 16:25 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, git On Wed, 16 Jul 2008, Nicolas Pitre wrote: > > Junio: if you could just apply them, tag the result as v1.4.4.5 and > push it out then at that point it simply will be up to Debian to make > it available as a "major usability fix". Actually, it fixes a crash. Didn't 1.4.4 SIGSEGV without this on pack-v2? So you don't even have to sell it as a usability issue, but literally as a bugfix. Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 16:25 ` Linus Torvalds @ 2008-07-16 17:04 ` Junio C Hamano 2008-07-16 17:08 ` Junio C Hamano 1 sibling, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2008-07-16 17:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, git, Gerrit Pape Linus Torvalds <torvalds@linux-foundation.org> writes: > On Wed, 16 Jul 2008, Nicolas Pitre wrote: >> >> Junio: if you could just apply them, tag the result as v1.4.4.5 and >> push it out then at that point it simply will be up to Debian to make >> it available as a "major usability fix". > > Actually, it fixes a crash. Didn't 1.4.4 SIGSEGV without this on pack-v2? > > So you don't even have to sell it as a usability issue, but literally as a > bugfix. It's tagged and pushed out to both kernel.org and repo.or.cz. Thanks, Nico. I won't be doing my usual full release engineering on this one, but I did: - repack a copy of git.git to use pack idx version #2 for trial. Hint. A handy way to tell which version of packfile you have is to run this: $ od -c .git/objects/pack/pack-$your_pack_id.idx | head -n 1 If the output begins with "377 t 0 c", you have pack idx version #2. - check "git log" from v1.4.4.4 (fails) and v1.4.4.5 (reads ok) - repack this copy with "git repack" from 1.4.4.5. This produced a repository usable by v1.4.4.4. I also updated /pub/scm/git/git.git/config at kernel.org to use [pack] indexVersion = 1 for now. The repository is also repacked with pack idx version #1. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 16:25 ` Linus Torvalds 2008-07-16 17:04 ` Junio C Hamano @ 2008-07-16 17:08 ` Junio C Hamano 2008-07-16 17:34 ` Linus Torvalds 1 sibling, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2008-07-16 17:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, git Linus Torvalds <torvalds@linux-foundation.org> writes: > Actually, it fixes a crash. Didn't 1.4.4 SIGSEGV without this on pack-v2? I do not think it should SEGV. The pack-idx signature was chosen rather carefully to allow older ones to die gracefully. error: non-monotonic index error: Could not read 4a588075c54cd5902e5f4d43b9d6b0c31d0f9769 But as I said in the other message, it's tagged and pushed out already. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 17:08 ` Junio C Hamano @ 2008-07-16 17:34 ` Linus Torvalds 2008-07-17 8:01 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Linus Torvalds @ 2008-07-16 17:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nicolas Pitre, git On Wed, 16 Jul 2008, Junio C Hamano wrote: > > I do not think it should SEGV. The pack-idx signature was chosen rather > carefully to allow older ones to die gracefully. Well, Pasky reported differently. > error: non-monotonic index > error: Could not read 4a588075c54cd5902e5f4d43b9d6b0c31d0f9769 Pasky's report was error: non-monotonic index /usr/bin/git-fetch: line 297: 30402 Segmentation fault git-http-fetch -v -a "$head" "$remote/" but maybe that was something specific to his case. Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 2008-07-16 17:34 ` Linus Torvalds @ 2008-07-17 8:01 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2008-07-17 8:01 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, git Linus Torvalds <torvalds@linux-foundation.org> writes: > On Wed, 16 Jul 2008, Junio C Hamano wrote: >> >> I do not think it should SEGV. The pack-idx signature was chosen rather >> carefully to allow older ones to die gracefully. > > Well, Pasky reported differently. > >> error: non-monotonic index >> error: Could not read 4a588075c54cd5902e5f4d43b9d6b0c31d0f9769 > > Pasky's report was > > error: non-monotonic index > /usr/bin/git-fetch: line 297: 30402 Segmentation fault git-http-fetch -v -a "$head" "$remote/" > > but maybe that was something specific to his case. It is caused by the http walker not being careful. In v1.4.4.5 http-fetch.c, this code appears unmodified since v1.4.4.4, and an equivalent code is still in http-walker.c in more recent versions: static int setup_index(struct alt_base *repo, unsigned char *sha1) { struct packed_git *new_pack; if (has_pack_file(sha1)) return 0; /* don't list this as something we can get */ if (fetch_index(repo, sha1)) return -1; new_pack = parse_pack_index(sha1); new_pack->next = repo->packs; repo->packs = new_pack; return 0; } Nico taught parse_pack_index() what v2 pack idx file looks like, but when the code hits unknown idx file (or a corrupt one), the function signals error by returning NULL; assigning to new_pack->next without checking would segfault. We would need this fix to futureproof ourselves for pack idx v3 and later, and also for protecting from a corrupt idx file coming over the wire. --- http-walker.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/http-walker.c b/http-walker.c index 51c18f2..9dc6b27 100644 --- a/http-walker.c +++ b/http-walker.c @@ -442,6 +442,8 @@ static int setup_index(struct walker *walker, struct alt_base *repo, unsigned ch return -1; new_pack = parse_pack_index(sha1); + if (!new_pack) + return -1; /* parse_pack_index() already issued an error message */ new_pack->next = repo->packs; repo->packs = new_pack; return 0; ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-07-17 8:02 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-16 6:31 [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Nicolas Pitre 2008-07-16 6:31 ` [PATCH 1/5] clean up pack index handling a bit Nicolas Pitre 2008-07-16 6:31 ` [PATCH 2/5] clean up and optimize nth_packed_object_sha1() usage Nicolas Pitre 2008-07-16 6:31 ` [PATCH 3/5] get rid of num_packed_objects() Nicolas Pitre 2008-07-16 6:31 ` [PATCH 4/5] pack-objects: learn about pack index version 2 Nicolas Pitre 2008-07-16 6:31 ` [PATCH 5/5] sha1_file.c: learn about " Nicolas Pitre 2008-07-16 10:46 ` [PATCH 0/5] add pack index v2 reading capability to git v1.4.4.4 Johannes Schindelin 2008-07-16 16:25 ` Linus Torvalds 2008-07-16 17:04 ` Junio C Hamano 2008-07-16 17:08 ` Junio C Hamano 2008-07-16 17:34 ` Linus Torvalds 2008-07-17 8:01 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).