From: "SZEDER Gábor" <szeder@ira.uka.de>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 05/38] pack v4: add commit object parsing
Date: Thu, 5 Sep 2013 12:30:11 +0200 [thread overview]
Message-ID: <20130905103011.GA20919@goldbirke> (raw)
In-Reply-To: <1378362001-1738-6-git-send-email-nico@fluxnic.net>
Hi,
On Thu, Sep 05, 2013 at 02:19:28AM -0400, Nicolas Pitre wrote:
> Let's create another dictionary table to hold the author and committer
> entries. We use the same table format used for tree entries where the
> 16 bit data prefix is conveniently used to store the timezone value.
>
> In order to copy straight from a commit object buffer, dict_add_entry()
> is modified to get the string length as the provided string pointer is
> not always be null terminated.
>
> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
> ---
> packv4-create.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 89 insertions(+), 9 deletions(-)
>
> diff --git a/packv4-create.c b/packv4-create.c
> index eccd9fc..5c08871 100644
> --- a/packv4-create.c
> +++ b/packv4-create.c
> @@ -1,5 +1,5 @@
> /*
> - * packv4-create.c: management of dictionary tables used in pack v4
> + * packv4-create.c: creation of dictionary tables and objects used in pack v4
> *
> * (C) Nicolas Pitre <nico@fluxnic.net>
> *
> @@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t)
> }
> }
>
> -int dict_add_entry(struct dict_table *t, int val, const char *str)
> +int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len)
> {
> - int i, val_len = 2, str_len = strlen(str) + 1;
> + int i, val_len = 2;
>
> if (t->ptr + val_len + str_len > t->size) {
We need a +1 here on the left side, i.e.
if (t->ptr + val_len + str_len + 1 > t->size) {
The str_len variable accounted for the terminating null character
before, but this patch removes str_len = strlen(str) + 1; above, and
callers specify the length of str without the terminating null in
str_len. Thus it can lead to memory corruption, when the new entry
happens to end at 't->ptr + val_len + str_len' and the line added in
the next hunk writes the terminating null beyond the end of the
buffer. I couldn't create a v4 pack from a current linux repo because
of this; either glibc detected something or 'git packv4-create'
crashed.
Sidenote: couldn't we call the 'ptr' field something else, like
end_offset or end_idx? It took me some headscratching to figure out
why is it OK to compare a pointer to an integer above, or use a
pointer without dereferencing as an index into an array below (because
ptr is, well, not a pointer after all).
> t->size = (t->size + val_len + str_len + 1024) * 3 / 2;
> @@ -92,6 +92,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
> t->data[t->ptr] = val >> 8;
> t->data[t->ptr + 1] = val;
> memcpy(t->data + t->ptr + val_len, str, str_len);
> + t->data[t->ptr + val_len + str_len] = 0;
>
> i = (t->nb_entries) ?
> locate_entry(t, t->data + t->ptr, val_len + str_len) : -1;
> @@ -107,7 +108,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
> t->entry[t->nb_entries].offset = t->ptr;
> t->entry[t->nb_entries].size = val_len + str_len;
> t->entry[t->nb_entries].hits = 1;
> - t->ptr += val_len + str_len;
> + t->ptr += val_len + str_len + 1;
Good.
Best,
Gábor
> t->nb_entries++;
>
> if (t->hash_size * 3 <= t->nb_entries * 4)
> @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
> rehash_entries(t);
> }
>
> +static struct dict_table *commit_name_table;
> static struct dict_table *tree_path_table;
>
> +/*
> + * Parse the author/committer line from a canonical commit object.
> + * The 'from' argument points right after the "author " or "committer "
> + * string. The time zone is parsed and stored in *tz_val. The returned
> + * pointer is right after the end of the email address which is also just
> + * before the time value, or NULL if a parsing error is encountered.
> + */
> +static char *get_nameend_and_tz(char *from, int *tz_val)
> +{
> + char *end, *tz;
> +
> + tz = strchr(from, '\n');
> + /* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
> + if (!tz || tz - from < 13)
> + return NULL;
> + tz -= 4;
> + end = tz - 4;
> + while (end - from > 5 && *end != ' ')
> + end--;
> + if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
> + return NULL;
> + *tz_val = (tz[0] - '0') * 1000 +
> + (tz[1] - '0') * 100 +
> + (tz[2] - '0') * 10 +
> + (tz[3] - '0');
> + switch (tz[-1]) {
> + default: return NULL;
> + case '+': break;
> + case '-': *tz_val = -*tz_val;
> + }
> + return end;
> +}
> +
> +static int add_commit_dict_entries(void *buf, unsigned long size)
> +{
> + char *name, *end = NULL;
> + int tz_val;
> +
> + if (!commit_name_table)
> + commit_name_table = create_dict_table();
> +
> + /* parse and add author info */
> + name = strstr(buf, "\nauthor ");
> + if (name) {
> + name += 8;
> + end = get_nameend_and_tz(name, &tz_val);
> + }
> + if (!name || !end)
> + return -1;
> + dict_add_entry(commit_name_table, tz_val, name, end - name);
> +
> + /* parse and add committer info */
> + name = strstr(end, "\ncommitter ");
> + if (name) {
> + name += 11;
> + end = get_nameend_and_tz(name, &tz_val);
> + }
> + if (!name || !end)
> + return -1;
> + dict_add_entry(commit_name_table, tz_val, name, end - name);
> +
> + return 0;
> +}
> +
> static int add_tree_dict_entries(void *buf, unsigned long size)
> {
> struct tree_desc desc;
> @@ -146,13 +212,16 @@ static int add_tree_dict_entries(void *buf, unsigned long size)
> tree_path_table = create_dict_table();
>
> init_tree_desc(&desc, buf, size);
> - while (tree_entry(&desc, &name_entry))
> + while (tree_entry(&desc, &name_entry)) {
> + int pathlen = tree_entry_len(&name_entry);
> dict_add_entry(tree_path_table, name_entry.mode,
> - name_entry.path);
> + name_entry.path, pathlen);
> + }
> +
> return 0;
> }
>
> -void dict_dump(struct dict_table *t)
> +void dump_dict_table(struct dict_table *t)
> {
> int i;
>
> @@ -169,6 +238,12 @@ void dict_dump(struct dict_table *t)
> }
> }
>
> +static void dict_dump(void)
> +{
> + dump_dict_table(commit_name_table);
> + dump_dict_table(tree_path_table);
> +}
> +
> struct idx_entry
> {
> off_t offset;
> @@ -205,6 +280,7 @@ static int create_pack_dictionaries(struct packed_git *p)
> enum object_type type;
> unsigned long size;
> struct object_info oi = {};
> + int (*add_dict_entries)(void *, unsigned long);
>
> oi.typep = &type;
> oi.sizep = &size;
> @@ -213,7 +289,11 @@ static int create_pack_dictionaries(struct packed_git *p)
> sha1_to_hex(objects[i].sha1), p->pack_name);
>
> switch (type) {
> + case OBJ_COMMIT:
> + add_dict_entries = add_commit_dict_entries;
> + break;
> case OBJ_TREE:
> + add_dict_entries = add_tree_dict_entries;
> break;
> default:
> continue;
> @@ -225,7 +305,7 @@ static int create_pack_dictionaries(struct packed_git *p)
> if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
> die("packed %s from %s is corrupt",
> sha1_to_hex(objects[i].sha1), p->pack_name);
> - if (add_tree_dict_entries(data, size) < 0)
> + if (add_dict_entries(data, size) < 0)
> die("can't process %s object %s",
> typename(type), sha1_to_hex(objects[i].sha1));
> free(data);
> @@ -285,6 +365,6 @@ int main(int argc, char *argv[])
> exit(1);
> }
> process_one_pack(argv[1]);
> - dict_dump(tree_path_table);
> + dict_dump();
> return 0;
> }
> --
> 1.8.4.38.g317e65b
>
>
next prev parent reply other threads:[~2013-09-05 10:30 UTC|newest]
Thread overview: 124+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-05 6:19 [PATCH 00/38] pack version 4 basic functionalities Nicolas Pitre
2013-09-05 6:19 ` [PATCH 01/38] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-09-05 6:19 ` [PATCH 02/38] export packed_object_info() Nicolas Pitre
2013-09-05 6:19 ` [PATCH 03/38] pack v4: scan tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 04/38] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-09-05 6:19 ` [PATCH 05/38] pack v4: add commit object parsing Nicolas Pitre
2013-09-05 10:30 ` SZEDER Gábor [this message]
2013-09-05 17:30 ` Nicolas Pitre
2013-09-05 6:19 ` [PATCH 06/38] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-09-05 6:19 ` [PATCH 07/38] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-09-05 6:19 ` [PATCH 08/38] pack v4: basic SHA1 reference encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 09/38] introduce get_sha1_lowhex() Nicolas Pitre
2013-09-05 6:19 ` [PATCH 10/38] pack v4: commit object encoding Nicolas Pitre
2013-09-06 6:57 ` Junio C Hamano
2013-09-06 21:28 ` Nicolas Pitre
2013-09-06 22:08 ` Junio C Hamano
2013-09-07 4:41 ` Nicolas Pitre
2013-09-05 6:19 ` [PATCH 11/38] pack v4: tree " Nicolas Pitre
2013-09-05 6:19 ` [PATCH 12/38] pack v4: dictionary table output Nicolas Pitre
2013-09-05 6:19 ` [PATCH 13/38] pack v4: creation code Nicolas Pitre
2013-09-05 6:19 ` [PATCH 14/38] pack v4: object headers Nicolas Pitre
2013-09-05 6:19 ` [PATCH 15/38] pack v4: object data copy Nicolas Pitre
2013-09-05 6:19 ` [PATCH 16/38] pack v4: object writing Nicolas Pitre
2013-09-05 6:19 ` [PATCH 17/38] pack v4: tree object delta encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 18/38] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 19/38] packv4-create: optimize delta encoding Nicolas Pitre
2013-09-05 6:19 ` [PATCH 20/38] pack v4: honor pack.compression config option Nicolas Pitre
2013-09-05 6:19 ` [PATCH 21/38] pack v4: relax commit parsing a bit Nicolas Pitre
2013-09-05 6:19 ` [PATCH 22/38] pack index v3 Nicolas Pitre
2013-09-05 6:19 ` [PATCH 23/38] packv4-create: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-09-05 6:19 ` [PATCH 24/38] packv4-create: add progress display Nicolas Pitre
2013-09-05 6:19 ` [PATCH 25/38] pack v4: initial pack index v3 support on the read side Nicolas Pitre
2013-09-05 6:19 ` [PATCH 26/38] pack v4: object header decode Nicolas Pitre
2013-09-05 6:19 ` [PATCH 27/38] pack v4: code to obtain a SHA1 from a sha1ref Nicolas Pitre
2013-09-05 6:19 ` [PATCH 28/38] pack v4: code to load and prepare a pack dictionary table for use Nicolas Pitre
2013-09-05 6:19 ` [PATCH 29/38] pack v4: code to retrieve a name Nicolas Pitre
2013-09-05 6:19 ` [PATCH 30/38] pack v4: code to recreate a canonical commit object Nicolas Pitre
2013-09-05 6:19 ` [PATCH 31/38] sha1_file.c: make use of decode_varint() Nicolas Pitre
2013-09-05 7:35 ` SZEDER Gábor
2013-09-05 6:19 ` [PATCH 32/38] pack v4: parse delta base reference Nicolas Pitre
2013-09-05 6:19 ` [PATCH 33/38] pack v4: we can read commit objects now Nicolas Pitre
2013-09-05 6:19 ` [PATCH 34/38] pack v4: code to retrieve a path component Nicolas Pitre
2013-09-05 6:19 ` [PATCH 35/38] pack v4: decode tree objects Nicolas Pitre
2013-09-05 6:19 ` [PATCH 36/38] pack v4: get " Nicolas Pitre
2013-09-05 6:20 ` [PATCH 37/38] pack v4: introduce "escape hatches" in the name and path indexes Nicolas Pitre
2013-09-05 19:02 ` Nicolas Pitre
2013-09-05 21:48 ` Nicolas Pitre
2013-09-05 23:57 ` Duy Nguyen
2013-09-05 6:20 ` [PATCH 38/38] packv4-create: add a command line argument to limit tree copy sequences Nicolas Pitre
2013-09-07 10:43 ` [PATCH 00/12] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 01/12] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 02/12] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 03/12] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 04/12] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 05/12] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08 2:14 ` Nicolas Pitre
2013-09-07 10:43 ` [PATCH 06/12] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 07/12] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 08/12] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08 2:52 ` Nicolas Pitre
2013-09-07 10:43 ` [PATCH 09/12] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 10/12] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 11/12] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-07 10:43 ` [PATCH 12/12] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08 3:28 ` Nicolas Pitre
2013-09-08 3:44 ` Duy Nguyen
2013-09-08 7:22 ` [PATCH v2 00/14] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 01/14] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 02/14] pack v4: add pv4_free_dict() Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 03/14] index-pack: add more comments on some big functions Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 04/14] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 05/14] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 06/14] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 07/14] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 08/14] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 09/14] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 10/14] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 11/14] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 12/14] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 13/14] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-08 7:22 ` [PATCH v2 14/14] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 00/11] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 01/11] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 02/11] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 03/11] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-08 20:56 ` Nicolas Pitre
2013-09-08 15:04 ` [PATCH 04/11] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 05/11] pack-write.c: add pv4_encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-08 20:51 ` Nicolas Pitre
2013-09-08 15:04 ` [PATCH 06/11] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 07/11] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 08/11] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 10:40 ` Duy Nguyen
2013-09-09 13:07 ` Nicolas Pitre
2013-09-09 15:21 ` Junio C Hamano
2013-09-08 15:04 ` [PATCH 09/11] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 10/11] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 11/11] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 00/16] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 01/16] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 02/16] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 03/16] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 04/16] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 05/16] pack_write: tighten valid object type check in encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 06/16] pack-write.c: add pv4_encode_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 07/16] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-09 13:57 ` [PATCH v2 08/16] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 09/16] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 10/16] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 11/16] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 12/16] pack-objects: prepare SHA-1 table in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 13/16] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 14/16] pack v4: support "end-of-pack" indicator in index-pack and pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:58 ` [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size Nguyễn Thái Ngọc Duy
2013-09-09 15:01 ` Nicolas Pitre
2013-09-09 18:34 ` Junio C Hamano
2013-09-09 18:46 ` Nicolas Pitre
2013-09-09 18:56 ` Junio C Hamano
2013-09-09 19:11 ` Nicolas Pitre
2013-09-09 19:30 ` Junio C Hamano
2013-09-09 19:56 ` Nicolas Pitre
2013-09-10 0:45 ` Duy Nguyen
2013-09-12 15:34 ` Nicolas Pitre
2013-09-09 13:58 ` [PATCH v2 16/16] index-pack: support completing thin packs v4 Nguyễn Thái Ngọc Duy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130905103011.GA20919@goldbirke \
--to=szeder@ira.uka.de \
--cc=git@vger.kernel.org \
--cc=nico@fluxnic.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).