git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder@ira.uka.de>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 05/38] pack v4: add commit object parsing
Date: Thu, 5 Sep 2013 12:30:11 +0200	[thread overview]
Message-ID: <20130905103011.GA20919@goldbirke> (raw)
In-Reply-To: <1378362001-1738-6-git-send-email-nico@fluxnic.net>

Hi,


On Thu, Sep 05, 2013 at 02:19:28AM -0400, Nicolas Pitre wrote:
> Let's create another dictionary table to hold the author and committer
> entries.  We use the same table format used for tree entries where the
> 16 bit data prefix is conveniently used to store the timezone value.
> 
> In order to copy straight from a commit object buffer, dict_add_entry()
> is modified to get the string length as the provided string pointer is
> not always be null terminated.
> 
> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
> ---
>  packv4-create.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 89 insertions(+), 9 deletions(-)
> 
> diff --git a/packv4-create.c b/packv4-create.c
> index eccd9fc..5c08871 100644
> --- a/packv4-create.c
> +++ b/packv4-create.c
> @@ -1,5 +1,5 @@
>  /*
> - * packv4-create.c: management of dictionary tables used in pack v4
> + * packv4-create.c: creation of dictionary tables and objects used in pack v4
>   *
>   * (C) Nicolas Pitre <nico@fluxnic.net>
>   *
> @@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t)
>  	}
>  }
>  
> -int dict_add_entry(struct dict_table *t, int val, const char *str)
> +int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len)
>  {
> -	int i, val_len = 2, str_len = strlen(str) + 1;
> +	int i, val_len = 2;
>  
>  	if (t->ptr + val_len + str_len > t->size) {

We need a +1 here on the left side, i.e.

        if (t->ptr + val_len + str_len + 1 > t->size) {

The str_len variable accounted for the terminating null character
before, but this patch removes str_len = strlen(str) + 1; above, and
callers specify the length of str without the terminating null in
str_len.  Thus it can lead to memory corruption, when the new entry
happens to end at 't->ptr + val_len + str_len' and the line added in
the next hunk writes the terminating null beyond the end of the
buffer.  I couldn't create a v4 pack from a current linux repo because
of this; either glibc detected something or 'git packv4-create'
crashed.

Sidenote: couldn't we call the 'ptr' field something else, like
end_offset or end_idx?  It took me some headscratching to figure out
why is it OK to compare a pointer to an integer above, or use a
pointer without dereferencing as an index into an array below (because
ptr is, well, not a pointer after all).

>  		t->size = (t->size + val_len + str_len + 1024) * 3 / 2;
> @@ -92,6 +92,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
>  	t->data[t->ptr] = val >> 8;
>  	t->data[t->ptr + 1] = val;
>  	memcpy(t->data + t->ptr + val_len, str, str_len);
> +	t->data[t->ptr + val_len + str_len] = 0;
>  
>  	i = (t->nb_entries) ?
>  		locate_entry(t, t->data + t->ptr, val_len + str_len) : -1;
> @@ -107,7 +108,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str)
>  	t->entry[t->nb_entries].offset = t->ptr;
>  	t->entry[t->nb_entries].size = val_len + str_len;
>  	t->entry[t->nb_entries].hits = 1;
> -	t->ptr += val_len + str_len;
> +	t->ptr += val_len + str_len + 1;

Good.


Best,
Gábor


>  	t->nb_entries++;
>  
>  	if (t->hash_size * 3 <= t->nb_entries * 4)
> @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t)
>  	rehash_entries(t);
>  }
>  
> +static struct dict_table *commit_name_table;
>  static struct dict_table *tree_path_table;
>  
> +/*
> + * Parse the author/committer line from a canonical commit object.
> + * The 'from' argument points right after the "author " or "committer "
> + * string.  The time zone is parsed and stored in *tz_val.  The returned
> + * pointer is right after the end of the email address which is also just
> + * before the time value, or NULL if a parsing error is encountered.
> + */
> +static char *get_nameend_and_tz(char *from, int *tz_val)
> +{
> +	char *end, *tz;
> +
> +	tz = strchr(from, '\n');
> +	/* let's assume the smallest possible string to be "x <x> 0 +0000\n" */
> +	if (!tz || tz - from < 13)
> +		return NULL;
> +	tz -= 4;
> +	end = tz - 4;
> +	while (end - from > 5 && *end != ' ')
> +		end--;
> +	if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ')
> +		return NULL;
> +	*tz_val = (tz[0] - '0') * 1000 +
> +		  (tz[1] - '0') * 100 +
> +		  (tz[2] - '0') * 10 +
> +		  (tz[3] - '0');
> +	switch (tz[-1]) {
> +	default:	return NULL;
> +	case '+':	break;
> +	case '-':	*tz_val = -*tz_val;
> +	}
> +	return end;
> +}
> +
> +static int add_commit_dict_entries(void *buf, unsigned long size)
> +{
> +	char *name, *end = NULL;
> +	int tz_val;
> +
> +	if (!commit_name_table)
> +		commit_name_table = create_dict_table();
> +
> +	/* parse and add author info */
> +	name = strstr(buf, "\nauthor ");
> +	if (name) {
> +		name += 8;
> +		end = get_nameend_and_tz(name, &tz_val);
> +	}
> +	if (!name || !end)
> +		return -1;
> +	dict_add_entry(commit_name_table, tz_val, name, end - name);
> +
> +	/* parse and add committer info */
> +	name = strstr(end, "\ncommitter ");
> +	if (name) {
> +	       name += 11;
> +	       end = get_nameend_and_tz(name, &tz_val);
> +	}
> +	if (!name || !end)
> +		return -1;
> +	dict_add_entry(commit_name_table, tz_val, name, end - name);
> +
> +	return 0;
> +}
> +
>  static int add_tree_dict_entries(void *buf, unsigned long size)
>  {
>  	struct tree_desc desc;
> @@ -146,13 +212,16 @@ static int add_tree_dict_entries(void *buf, unsigned long size)
>  		tree_path_table = create_dict_table();
>  
>  	init_tree_desc(&desc, buf, size);
> -	while (tree_entry(&desc, &name_entry))
> +	while (tree_entry(&desc, &name_entry)) {
> +		int pathlen = tree_entry_len(&name_entry);
>  		dict_add_entry(tree_path_table, name_entry.mode,
> -			       name_entry.path);
> +				name_entry.path, pathlen);
> +	}
> +
>  	return 0;
>  }
>  
> -void dict_dump(struct dict_table *t)
> +void dump_dict_table(struct dict_table *t)
>  {
>  	int i;
>  
> @@ -169,6 +238,12 @@ void dict_dump(struct dict_table *t)
>  	}
>  }
>  
> +static void dict_dump(void)
> +{
> +	dump_dict_table(commit_name_table);
> +	dump_dict_table(tree_path_table);
> +}
> +
>  struct idx_entry
>  {
>  	off_t                offset;
> @@ -205,6 +280,7 @@ static int create_pack_dictionaries(struct packed_git *p)
>  		enum object_type type;
>  		unsigned long size;
>  		struct object_info oi = {};
> +		int (*add_dict_entries)(void *, unsigned long);
>  
>  		oi.typep = &type;
>  		oi.sizep = &size;
> @@ -213,7 +289,11 @@ static int create_pack_dictionaries(struct packed_git *p)
>  			    sha1_to_hex(objects[i].sha1), p->pack_name);
>  
>  		switch (type) {
> +		case OBJ_COMMIT:
> +			add_dict_entries = add_commit_dict_entries;
> +			break;
>  		case OBJ_TREE:
> +			add_dict_entries = add_tree_dict_entries;
>  			break;
>  		default:
>  			continue;
> @@ -225,7 +305,7 @@ static int create_pack_dictionaries(struct packed_git *p)
>  		if (check_sha1_signature(objects[i].sha1, data, size, typename(type)))
>  			die("packed %s from %s is corrupt",
>  			    sha1_to_hex(objects[i].sha1), p->pack_name);
> -		if (add_tree_dict_entries(data, size) < 0)
> +		if (add_dict_entries(data, size) < 0)
>  			die("can't process %s object %s",
>  				typename(type), sha1_to_hex(objects[i].sha1));
>  		free(data);
> @@ -285,6 +365,6 @@ int main(int argc, char *argv[])
>  		exit(1);
>  	}
>  	process_one_pack(argv[1]);
> -	dict_dump(tree_path_table);
> +	dict_dump();
>  	return 0;
>  }
> -- 
> 1.8.4.38.g317e65b
> 
> 

  reply	other threads:[~2013-09-05 10:30 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-05  6:19 [PATCH 00/38] pack version 4 basic functionalities Nicolas Pitre
2013-09-05  6:19 ` [PATCH 01/38] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-09-05  6:19 ` [PATCH 02/38] export packed_object_info() Nicolas Pitre
2013-09-05  6:19 ` [PATCH 03/38] pack v4: scan tree objects Nicolas Pitre
2013-09-05  6:19 ` [PATCH 04/38] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-09-05  6:19 ` [PATCH 05/38] pack v4: add commit object parsing Nicolas Pitre
2013-09-05 10:30   ` SZEDER Gábor [this message]
2013-09-05 17:30     ` Nicolas Pitre
2013-09-05  6:19 ` [PATCH 06/38] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-09-05  6:19 ` [PATCH 07/38] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-09-05  6:19 ` [PATCH 08/38] pack v4: basic SHA1 reference encoding Nicolas Pitre
2013-09-05  6:19 ` [PATCH 09/38] introduce get_sha1_lowhex() Nicolas Pitre
2013-09-05  6:19 ` [PATCH 10/38] pack v4: commit object encoding Nicolas Pitre
2013-09-06  6:57   ` Junio C Hamano
2013-09-06 21:28     ` Nicolas Pitre
2013-09-06 22:08       ` Junio C Hamano
2013-09-07  4:41         ` Nicolas Pitre
2013-09-05  6:19 ` [PATCH 11/38] pack v4: tree " Nicolas Pitre
2013-09-05  6:19 ` [PATCH 12/38] pack v4: dictionary table output Nicolas Pitre
2013-09-05  6:19 ` [PATCH 13/38] pack v4: creation code Nicolas Pitre
2013-09-05  6:19 ` [PATCH 14/38] pack v4: object headers Nicolas Pitre
2013-09-05  6:19 ` [PATCH 15/38] pack v4: object data copy Nicolas Pitre
2013-09-05  6:19 ` [PATCH 16/38] pack v4: object writing Nicolas Pitre
2013-09-05  6:19 ` [PATCH 17/38] pack v4: tree object delta encoding Nicolas Pitre
2013-09-05  6:19 ` [PATCH 18/38] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-09-05  6:19 ` [PATCH 19/38] packv4-create: optimize delta encoding Nicolas Pitre
2013-09-05  6:19 ` [PATCH 20/38] pack v4: honor pack.compression config option Nicolas Pitre
2013-09-05  6:19 ` [PATCH 21/38] pack v4: relax commit parsing a bit Nicolas Pitre
2013-09-05  6:19 ` [PATCH 22/38] pack index v3 Nicolas Pitre
2013-09-05  6:19 ` [PATCH 23/38] packv4-create: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-09-05  6:19 ` [PATCH 24/38] packv4-create: add progress display Nicolas Pitre
2013-09-05  6:19 ` [PATCH 25/38] pack v4: initial pack index v3 support on the read side Nicolas Pitre
2013-09-05  6:19 ` [PATCH 26/38] pack v4: object header decode Nicolas Pitre
2013-09-05  6:19 ` [PATCH 27/38] pack v4: code to obtain a SHA1 from a sha1ref Nicolas Pitre
2013-09-05  6:19 ` [PATCH 28/38] pack v4: code to load and prepare a pack dictionary table for use Nicolas Pitre
2013-09-05  6:19 ` [PATCH 29/38] pack v4: code to retrieve a name Nicolas Pitre
2013-09-05  6:19 ` [PATCH 30/38] pack v4: code to recreate a canonical commit object Nicolas Pitre
2013-09-05  6:19 ` [PATCH 31/38] sha1_file.c: make use of decode_varint() Nicolas Pitre
2013-09-05  7:35   ` SZEDER Gábor
2013-09-05  6:19 ` [PATCH 32/38] pack v4: parse delta base reference Nicolas Pitre
2013-09-05  6:19 ` [PATCH 33/38] pack v4: we can read commit objects now Nicolas Pitre
2013-09-05  6:19 ` [PATCH 34/38] pack v4: code to retrieve a path component Nicolas Pitre
2013-09-05  6:19 ` [PATCH 35/38] pack v4: decode tree objects Nicolas Pitre
2013-09-05  6:19 ` [PATCH 36/38] pack v4: get " Nicolas Pitre
2013-09-05  6:20 ` [PATCH 37/38] pack v4: introduce "escape hatches" in the name and path indexes Nicolas Pitre
2013-09-05 19:02   ` Nicolas Pitre
2013-09-05 21:48     ` Nicolas Pitre
2013-09-05 23:57     ` Duy Nguyen
2013-09-05  6:20 ` [PATCH 38/38] packv4-create: add a command line argument to limit tree copy sequences Nicolas Pitre
2013-09-07 10:43 ` [PATCH 00/12] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 01/12] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 02/12] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 03/12] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 04/12] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 05/12] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08  2:14     ` Nicolas Pitre
2013-09-07 10:43   ` [PATCH 06/12] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 07/12] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 08/12] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08  2:52     ` Nicolas Pitre
2013-09-07 10:43   ` [PATCH 09/12] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 10/12] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 11/12] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-07 10:43   ` [PATCH 12/12] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08  3:28     ` Nicolas Pitre
2013-09-08  3:44       ` Duy Nguyen
2013-09-08  7:22   ` [PATCH v2 00/14] pack v4 support in index-pack Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 01/14] pack v4: split pv4_create_dict() out of load_dict() Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 02/14] pack v4: add pv4_free_dict() Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 03/14] index-pack: add more comments on some big functions Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 04/14] index-pack: split out varint decoding code Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 05/14] index-pack: do not allocate buffer for unpacking deltas in the first pass Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 06/14] index-pack: split inflate/digest code out of unpack_entry_data Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 07/14] index-pack: parse v4 header and dictionaries Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 08/14] index-pack: make sure all objects are registered in v4's SHA-1 table Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 09/14] index-pack: parse v4 commit format Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 10/14] index-pack: parse v4 tree format Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 11/14] index-pack: move delta base queuing code to unpack_raw_entry Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 12/14] index-pack: record all delta bases in v4 (tree and ref-delta) Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 13/14] index-pack: skip looking for ofs-deltas in v4 as they are not allowed Nguyễn Thái Ngọc Duy
2013-09-08  7:22     ` [PATCH v2 14/14] index-pack: resolve v4 one-base trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04 ` [PATCH 00/11] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 01/11] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 02/11] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 03/11] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-08 20:56     ` Nicolas Pitre
2013-09-08 15:04   ` [PATCH 04/11] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 05/11] pack-write.c: add pv4_encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-08 20:51     ` Nicolas Pitre
2013-09-08 15:04   ` [PATCH 06/11] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 07/11] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 08/11] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 10:40     ` Duy Nguyen
2013-09-09 13:07       ` Nicolas Pitre
2013-09-09 15:21         ` Junio C Hamano
2013-09-08 15:04   ` [PATCH 09/11] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 10/11] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-08 15:04   ` [PATCH 11/11] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:57   ` [PATCH v2 00/16] pack v4 support in pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 01/16] pack v4: allocate dicts from the beginning Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 02/16] pack v4: stop using static/global variables in packv4-create.c Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 03/16] pack v4: move packv4-create.c to libgit.a Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 04/16] pack v4: add version argument to write_pack_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 05/16] pack_write: tighten valid object type check in encode_in_pack_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 06/16] pack-write.c: add pv4_encode_object_header Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 07/16] pack-objects: add --version to specify written pack version Nguyễn Thái Ngọc Duy
2013-09-09 13:57     ` [PATCH v2 08/16] list-objects.c: add show_tree_entry callback to traverse_commit_list Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 09/16] pack-objects: do not cache delta for v4 trees Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 10/16] pack-objects: exclude commits out of delta objects in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 11/16] pack-objects: create pack v4 tables Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 12/16] pack-objects: prepare SHA-1 table in v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 13/16] pack-objects: support writing pack v4 Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 14/16] pack v4: support "end-of-pack" indicator in index-pack and pack-objects Nguyễn Thái Ngọc Duy
2013-09-09 13:58     ` [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size Nguyễn Thái Ngọc Duy
2013-09-09 15:01       ` Nicolas Pitre
2013-09-09 18:34         ` Junio C Hamano
2013-09-09 18:46           ` Nicolas Pitre
2013-09-09 18:56             ` Junio C Hamano
2013-09-09 19:11               ` Nicolas Pitre
2013-09-09 19:30                 ` Junio C Hamano
2013-09-09 19:56                   ` Nicolas Pitre
2013-09-10  0:45         ` Duy Nguyen
2013-09-12 15:34           ` Nicolas Pitre
2013-09-09 13:58     ` [PATCH v2 16/16] index-pack: support completing thin packs v4 Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130905103011.GA20919@goldbirke \
    --to=szeder@ira.uka.de \
    --cc=git@vger.kernel.org \
    --cc=nico@fluxnic.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).