git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
@ 2014-07-10 23:59 Ramsay Jones
  2014-07-11  0:30 ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-10 23:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list


The 'commit_count' static variable is used in alloc_commit_node()
to set the 'index' field of the commit structure to a unique value.
This variable assumes the same value as the 'count' field of the
'commit_state' allocator state structure, which may be used in its
place.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
---
 alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index d7c3605..c6687f9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -64,9 +64,8 @@ static struct alloc_state commit_state;
 
 void *alloc_commit_node(void)
 {
-	static int commit_count;
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
-	c->index = commit_count++;
+	c->index = commit_state.count - 1;
 	return c;
 }
 
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
  2014-07-10 23:59 [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable Ramsay Jones
@ 2014-07-11  0:30 ` Jeff King
  2014-07-11  0:59   ` Ramsay Jones
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  0:30 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote:

> The 'commit_count' static variable is used in alloc_commit_node()
> to set the 'index' field of the commit structure to a unique value.
> This variable assumes the same value as the 'count' field of the
> 'commit_state' allocator state structure, which may be used in its
> place.

I don't think we want to do this, because there is a bug in the current
code that I have not reported yet. :)

The code you're touching here was trying to make sure that each commit
gets a unique index, under the assumption that commits only get
allocated via alloc_commit_node. But I think that assumption is wrong.
We can also get commit objects by allocating an OBJ_NONE (e.g., via
lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
find out what it is.

We should be allocating a commit index to it at that point (but we don't
currently), at which point the commit_count and commit_state.count
indices will no longer be in sync.

This only happens in a few places. Most calls will probably be via
lookup_commit (which gets called when we parse_object such an unknown
object), but there are others (e.g., peel_object may fill in the type).
So we could fix it with something like:

diff --git a/commit.c b/commit.c
index fb7897c..f4f4278 100644
--- a/commit.c
+++ b/commit.c
@@ -65,8 +65,11 @@ struct commit *lookup_commit(const unsigned char *sha1)
 		struct commit *c = alloc_commit_node();
 		return create_object(sha1, OBJ_COMMIT, c);
 	}
-	if (!obj->type)
+	if (!obj->type) {
+		struct commit *c = (struct commit *)obj;
 		obj->type = OBJ_COMMIT;
+		c->index = alloc_commit_index();
+	}
 	return check_commit(obj, sha1, 0);
 }
 

where alloc_commit_index() would be a thin wrapper around "return
commit_count++". And then find and update any other callsites that need
it.

The downside is that it's hard to find those callsites. A safer
alternative would be to speculatively allocate a commit index for all
unknown objects. Then if any other code switches the type to commit,
they already have an index. But it also means we'll have holes in our
commit index space, which makes commit-slabs less efficient.

We do not call lookup_unknown_object all that often, though, so it might
be an OK tradeoff (and in at least one case, in diff_tree_stdin, I think
the code can be converted not to use lookup_unknown_object in the first
place). So worrying about the performance may not be worth it.

-Peff

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
  2014-07-11  0:30 ` Jeff King
@ 2014-07-11  0:59   ` Ramsay Jones
  2014-07-11  8:32     ` Jeff King
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-11  0:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 01:30, Jeff King wrote:
> On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote:
> 
>> The 'commit_count' static variable is used in alloc_commit_node()
>> to set the 'index' field of the commit structure to a unique value.
>> This variable assumes the same value as the 'count' field of the
>> 'commit_state' allocator state structure, which may be used in its
>> place.
> 
> I don't think we want to do this, because there is a bug in the current
> code that I have not reported yet. :)

:P OK, I will simply drop this one then.

> 
> The code you're touching here was trying to make sure that each commit
> gets a unique index, under the assumption that commits only get
> allocated via alloc_commit_node. But I think that assumption is wrong.
> We can also get commit objects by allocating an OBJ_NONE (e.g., via
> lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
> find out what it is.

Hmm, I don't know how the object is converted, but the object allocator
is actually allocating an 'union any_object', so it's allocating more
space than for a struct object anyway. If you add an 'index' field to
struct object, (and remove it from struct commit) it could be set in
alloc_object_node(). ie _all_ node types get an index field.

Hmm, that was just off the top of my head, so take with a pinch of salt.

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
  2014-07-11  0:59   ` Ramsay Jones
@ 2014-07-11  8:32     ` Jeff King
  2014-07-11  9:41       ` Ramsay Jones
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:32 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote:

> > The code you're touching here was trying to make sure that each commit
> > gets a unique index, under the assumption that commits only get
> > allocated via alloc_commit_node. But I think that assumption is wrong.
> > We can also get commit objects by allocating an OBJ_NONE (e.g., via
> > lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
> > find out what it is.
> 
> Hmm, I don't know how the object is converted, but the object allocator
> is actually allocating an 'union any_object', so it's allocating more
> space than for a struct object anyway.

Right, we would generally want to avoid lookup_unknown_object where we
can for that reason.

> If you add an 'index' field to struct object, (and remove it from
> struct commit) it could be set in alloc_object_node(). ie _all_ node
> types get an index field.

That was something I considered when we did the original commit-slab
work, as it would let you do similar tricks for any set of objects, not
just commits. The reasons against it are:

  1. It would bloat the size of blob and tree structs by at least 4
     bytes (probably 8 for alignment). In most repos, commits make up
     only 10-20% of the total objects (so for linux.git, we're talking
     about 25MB extra in the working set).

  2. It makes single types sparse in the index space. In cases where you
     do just want to keep data on commits (and that is the main use),
     you end up allocating a slab entry per object, rather than per
     commit. That wastes memory (much worse than 25MB if your slab items
     are large), and reduces cache locality.

You could probably get around (2) by splitting the index space by type
and allocating them in pools, but that complicates things considerably,
as you have to guess ahead of time at reasonable maximums for each type.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable
  2014-07-11  0:30 ` Jeff King
  2014-07-11  0:59   ` Ramsay Jones
@ 2014-07-11  8:41   ` Jeff King
  2014-07-11  8:42     ` [PATCH 1/7] alloc.c: remove the alloc_raw_commit_node() function Jeff King
                       ` (7 more replies)
  1 sibling, 8 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:41 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

Here's a series to address the bug I mentioned earlier by catching the
conversion of OBJ_NONE to OBJ_COMMIT in a central location and setting
the index there.

I've included your patch 1/2 unchanged in the beginning, as I build on
top of it (and your patch 2/2 is no longer applicable).  The rest is
refactoring leading up to patch 6 to fix the bug. Patch 7 is a bonus
cleanup.

I'd hoped to cap off the series by converting the "type" field of
"struct commit" to a "const unsigned type : 3", which would avoid any
new callers being added that would touch it without going through the
proper procedure.  However, it's a bitfield, which makes it hard to cast
the constness away in the actual setter function. My best attempt was to
use a union with matching const and non-const members, but that would
mean changing all of the sites which read the field (and there are many)
to use "object->type.read".

There may be a clever solution hiding in a dark corner of C, but I
suspect we are entering a realm of portability problems with older
compilers (I even saw one compiler's documentation claim that "const"
was forbidden on bitfields, even though C99 has an example which does
it).

  [1/7]: alloc.c: remove the alloc_raw_commit_node() function
  [2/7]: move setting of object->type to alloc_* functions
  [3/7]: parse_object_buffer: do not set object type
  [4/7]: add object_as_type helper for casting objects
  [5/7]: alloc: factor out commit index
  [6/7]: object_as_type: set commit index
  [7/7]: diff-tree: avoid lookup_unknown_object

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/7] alloc.c: remove the alloc_raw_commit_node() function
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
@ 2014-07-11  8:42     ` Jeff King
  2014-07-11  8:46     ` [PATCH 2/7] move setting of object->type to alloc_* functions Jeff King
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>

In order to encapsulate the setting of the unique commit index, commit
969eba63 ("commit: push commit_index update into alloc_commit_node",
10-06-2014) introduced a (logically private) intermediary allocator
function. However, this function (alloc_raw_commit_node()) was declared
as a public function, which undermines its entire purpose.

Introduce an inline function, alloc_node(), which implements the main
logic of the allocator used by DEFINE_ALLOCATOR, and redefine the macro
in terms of the new function. In addition, use the new function in the
implementation of the alloc_commit_node() allocator, rather than the
intermediary allocator, which can now be removed.

Noticed by sparse ("symbol 'alloc_raw_commit_node' was not declared.
Should it be static?").

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 47 +++++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/alloc.c b/alloc.c
index eb22a45..d7c3605 100644
--- a/alloc.c
+++ b/alloc.c
@@ -19,22 +19,10 @@
 #define BLOCKING 1024
 
 #define DEFINE_ALLOCATOR(name, type)				\
-static unsigned int name##_allocs;				\
+static struct alloc_state name##_state;				\
 void *alloc_##name##_node(void)					\
 {								\
-	static int nr;						\
-	static type *block;					\
-	void *ret;						\
-								\
-	if (!nr) {						\
-		nr = BLOCKING;					\
-		block = xmalloc(BLOCKING * sizeof(type));	\
-	}							\
-	nr--;							\
-	name##_allocs++;					\
-	ret = block++;						\
-	memset(ret, 0, sizeof(type));				\
-	return ret;						\
+	return alloc_node(&name##_state, sizeof(type));		\
 }
 
 union any_object {
@@ -45,16 +33,39 @@ union any_object {
 	struct tag tag;
 };
 
+struct alloc_state {
+	int count; /* total number of nodes allocated */
+	int nr;    /* number of nodes left in current allocation */
+	void *p;   /* first free node in current allocation */
+};
+
+static inline void *alloc_node(struct alloc_state *s, size_t node_size)
+{
+	void *ret;
+
+	if (!s->nr) {
+		s->nr = BLOCKING;
+		s->p = xmalloc(BLOCKING * node_size);
+	}
+	s->nr--;
+	s->count++;
+	ret = s->p;
+	s->p = (char *)s->p + node_size;
+	memset(ret, 0, node_size);
+	return ret;
+}
+
 DEFINE_ALLOCATOR(blob, struct blob)
 DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(raw_commit, struct commit)
 DEFINE_ALLOCATOR(tag, struct tag)
 DEFINE_ALLOCATOR(object, union any_object)
 
+static struct alloc_state commit_state;
+
 void *alloc_commit_node(void)
 {
 	static int commit_count;
-	struct commit *c = alloc_raw_commit_node();
+	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->index = commit_count++;
 	return c;
 }
@@ -66,13 +77,13 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_allocs, name##_allocs * sizeof(type) >> 10)
+    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
 void alloc_report(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
-	REPORT(raw_commit, struct commit);
+	REPORT(commit, struct commit);
 	REPORT(tag, struct tag);
 	REPORT(object, union any_object);
 }
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
  2014-07-11  8:42     ` [PATCH 1/7] alloc.c: remove the alloc_raw_commit_node() function Jeff King
@ 2014-07-11  8:46     ` Jeff King
  2014-07-12 14:44       ` Ramsay Jones
  2014-07-12 14:55       ` Ramsay Jones
  2014-07-11  8:46     ` [PATCH 3/7] parse_object_buffer: do not set object type Jeff King
                       ` (5 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:46 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c         | 18 ++++++++++--------
 blob.c          |  2 +-
 builtin/blame.c |  1 -
 commit.c        |  2 +-
 object.c        |  5 ++---
 object.h        |  2 +-
 tag.c           |  2 +-
 tree.c          |  2 +-
 8 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/alloc.c b/alloc.c
index d7c3605..fd5fcb7 100644
--- a/alloc.c
+++ b/alloc.c
@@ -18,11 +18,11 @@
 
 #define BLOCKING 1024
 
-#define DEFINE_ALLOCATOR(name, type)				\
+#define DEFINE_ALLOCATOR(name, flag, type)			\
 static struct alloc_state name##_state;				\
 void *alloc_##name##_node(void)					\
 {								\
-	return alloc_node(&name##_state, sizeof(type));		\
+	return alloc_node(&name##_state, flag, sizeof(type));	\
 }
 
 union any_object {
@@ -39,7 +39,8 @@ struct alloc_state {
 	void *p;   /* first free node in current allocation */
 };
 
-static inline void *alloc_node(struct alloc_state *s, size_t node_size)
+static inline void *alloc_node(struct alloc_state *s, enum object_type type,
+			       size_t node_size)
 {
 	void *ret;
 
@@ -52,20 +53,21 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+	((struct object *)ret)->type = type;
 	return ret;
 }
 
-DEFINE_ALLOCATOR(blob, struct blob)
-DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(tag, struct tag)
-DEFINE_ALLOCATOR(object, union any_object)
+DEFINE_ALLOCATOR(blob, OBJ_BLOB, struct blob)
+DEFINE_ALLOCATOR(tree, OBJ_TREE, struct tree)
+DEFINE_ALLOCATOR(tag, OBJ_TAG, struct tag)
+DEFINE_ALLOCATOR(object, OBJ_NONE, union any_object)
 
 static struct alloc_state commit_state;
 
 void *alloc_commit_node(void)
 {
 	static int commit_count;
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(&commit_state, OBJ_COMMIT, sizeof(struct commit));
 	c->index = commit_count++;
 	return c;
 }
diff --git a/blob.c b/blob.c
index ae320bd..5720a38 100644
--- a/blob.c
+++ b/blob.c
@@ -7,7 +7,7 @@ struct blob *lookup_blob(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_BLOB, alloc_blob_node());
+		return create_object(sha1, alloc_blob_node());
 	if (!obj->type)
 		obj->type = OBJ_BLOB;
 	if (obj->type != OBJ_BLOB) {
diff --git a/builtin/blame.c b/builtin/blame.c
index d3b256e..8f3e311 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2287,7 +2287,6 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	commit = alloc_commit_node();
 	commit->object.parsed = 1;
 	commit->date = now;
-	commit->object.type = OBJ_COMMIT;
 	parent_tail = &commit->parents;
 
 	if (!resolve_ref_unsafe("HEAD", head_sha1, 1, NULL))
diff --git a/commit.c b/commit.c
index fb7897c..21ed310 100644
--- a/commit.c
+++ b/commit.c
@@ -63,7 +63,7 @@ struct commit *lookup_commit(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj) {
 		struct commit *c = alloc_commit_node();
-		return create_object(sha1, OBJ_COMMIT, c);
+		return create_object(sha1, c);
 	}
 	if (!obj->type)
 		obj->type = OBJ_COMMIT;
diff --git a/object.c b/object.c
index 9c31e9a..a950b85 100644
--- a/object.c
+++ b/object.c
@@ -141,13 +141,12 @@ static void grow_object_hash(void)
 	obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, int type, void *o)
+void *create_object(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
 	obj->parsed = 0;
 	obj->used = 0;
-	obj->type = type;
 	obj->flags = 0;
 	hashcpy(obj->sha1, sha1);
 
@@ -163,7 +162,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, OBJ_NONE, alloc_object_node());
+		obj = create_object(sha1, alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 6e12f2c..8020ace 100644
--- a/object.h
+++ b/object.h
@@ -79,7 +79,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, int type, void *obj);
+extern void *create_object(const unsigned char *sha1, void *obj);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/tag.c b/tag.c
index 7b07921..79552c7 100644
--- a/tag.c
+++ b/tag.c
@@ -40,7 +40,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_TAG, alloc_tag_node());
+		return create_object(sha1, alloc_tag_node());
 	if (!obj->type)
 		obj->type = OBJ_TAG;
 	if (obj->type != OBJ_TAG) {
diff --git a/tree.c b/tree.c
index c8c49d7..ed66575 100644
--- a/tree.c
+++ b/tree.c
@@ -183,7 +183,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_TREE, alloc_tree_node());
+		return create_object(sha1, alloc_tree_node());
 	if (!obj->type)
 		obj->type = OBJ_TREE;
 	if (obj->type != OBJ_TREE) {
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/7] parse_object_buffer: do not set object type
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
  2014-07-11  8:42     ` [PATCH 1/7] alloc.c: remove the alloc_raw_commit_node() function Jeff King
  2014-07-11  8:46     ` [PATCH 2/7] move setting of object->type to alloc_* functions Jeff King
@ 2014-07-11  8:46     ` Jeff King
  2014-07-11  8:48     ` [PATCH 4/7] add object_as_type helper for casting objects Jeff King
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:46 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The only way that "obj" can be non-NULL is if it came from
one of the lookup_* functions. These functions always ensure
that the object has the expected type (and return NULL
otherwise), so there is no need for us to set the type.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object.c b/object.c
index a950b85..472aa8d 100644
--- a/object.c
+++ b/object.c
@@ -213,8 +213,6 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		warning("object %s has unknown type id %d", sha1_to_hex(sha1), type);
 		obj = NULL;
 	}
-	if (obj && obj->type == OBJ_NONE)
-		obj->type = type;
 	return obj;
 }
 
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/7] add object_as_type helper for casting objects
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
                       ` (2 preceding siblings ...)
  2014-07-11  8:46     ` [PATCH 3/7] parse_object_buffer: do not set object type Jeff King
@ 2014-07-11  8:48     ` Jeff King
  2014-07-11 10:45       ` Ramsay Jones
  2014-07-11  8:48     ` [PATCH 5/7] alloc: factor out commit index Jeff King
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:48 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.

Signed-off-by: Jeff King <peff@peff.net>
---
 blob.c   |  9 +--------
 commit.c | 19 ++-----------------
 object.c | 17 +++++++++++++++++
 object.h |  2 ++
 refs.c   |  3 +--
 tag.c    |  9 +--------
 tree.c   |  9 +--------
 7 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/blob.c b/blob.c
index 5720a38..1fcb8e4 100644
--- a/blob.c
+++ b/blob.c
@@ -8,14 +8,7 @@ struct blob *lookup_blob(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_blob_node());
-	if (!obj->type)
-		obj->type = OBJ_BLOB;
-	if (obj->type != OBJ_BLOB) {
-		error("Object %s is a %s, not a blob",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct blob *) obj;
+	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
diff --git a/commit.c b/commit.c
index 21ed310..dd638de 100644
--- a/commit.c
+++ b/commit.c
@@ -18,19 +18,6 @@ int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
-static struct commit *check_commit(struct object *obj,
-				   const unsigned char *sha1,
-				   int quiet)
-{
-	if (obj->type != OBJ_COMMIT) {
-		if (!quiet)
-			error("Object %s is a %s, not a commit",
-			      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct commit *) obj;
-}
-
 struct commit *lookup_commit_reference_gently(const unsigned char *sha1,
 					      int quiet)
 {
@@ -38,7 +25,7 @@ struct commit *lookup_commit_reference_gently(const unsigned char *sha1,
 
 	if (!obj)
 		return NULL;
-	return check_commit(obj, sha1, quiet);
+	return object_as_type(obj, OBJ_COMMIT, quiet);
 }
 
 struct commit *lookup_commit_reference(const unsigned char *sha1)
@@ -65,9 +52,7 @@ struct commit *lookup_commit(const unsigned char *sha1)
 		struct commit *c = alloc_commit_node();
 		return create_object(sha1, c);
 	}
-	if (!obj->type)
-		obj->type = OBJ_COMMIT;
-	return check_commit(obj, sha1, 0);
+	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/object.c b/object.c
index 472aa8d..b2319f6 100644
--- a/object.c
+++ b/object.c
@@ -158,6 +158,23 @@ void *create_object(const unsigned char *sha1, void *o)
 	return obj;
 }
 
+void *object_as_type(struct object *obj, enum object_type type, int quiet)
+{
+	if (obj->type == type)
+		return obj;
+	else if (obj->type == OBJ_NONE) {
+		obj->type = type;
+		return obj;
+	}
+	else {
+		if (!quiet)
+			error("object %s is a %s, not a %s",
+			      sha1_to_hex(obj->sha1),
+			      typename(obj->type), typename(type));
+		return NULL;
+	}
+}
+
 struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
diff --git a/object.h b/object.h
index 8020ace..5e8d8ee 100644
--- a/object.h
+++ b/object.h
@@ -81,6 +81,8 @@ struct object *lookup_object(const unsigned char *sha1);
 
 extern void *create_object(const unsigned char *sha1, void *obj);
 
+void *object_as_type(struct object *obj, enum object_type type, int quiet);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/refs.c b/refs.c
index 20e2bf1..5a18e2d 100644
--- a/refs.c
+++ b/refs.c
@@ -1729,9 +1729,8 @@ static enum peel_status peel_object(const unsigned char *name, unsigned char *sh
 
 	if (o->type == OBJ_NONE) {
 		int type = sha1_object_info(name, NULL);
-		if (type < 0)
+		if (type < 0 || !object_as_type(o, type, 0))
 			return PEEL_INVALID;
-		o->type = type;
 	}
 
 	if (o->type != OBJ_TAG)
diff --git a/tag.c b/tag.c
index 79552c7..82d841b 100644
--- a/tag.c
+++ b/tag.c
@@ -41,14 +41,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_tag_node());
-	if (!obj->type)
-		obj->type = OBJ_TAG;
-	if (obj->type != OBJ_TAG) {
-		error("Object %s is a %s, not a tag",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct tag *) obj;
+	return object_as_type(obj, OBJ_TAG, 0);
 }
 
 static unsigned long parse_tag_date(const char *buf, const char *tail)
diff --git a/tree.c b/tree.c
index ed66575..bb02c1c 100644
--- a/tree.c
+++ b/tree.c
@@ -184,14 +184,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_tree_node());
-	if (!obj->type)
-		obj->type = OBJ_TREE;
-	if (obj->type != OBJ_TREE) {
-		error("Object %s is a %s, not a tree",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct tree *) obj;
+	return object_as_type(obj, OBJ_TREE, 0);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/7] alloc: factor out commit index
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
                       ` (3 preceding siblings ...)
  2014-07-11  8:48     ` [PATCH 4/7] add object_as_type helper for casting objects Jeff King
@ 2014-07-11  8:48     ` Jeff King
  2014-07-11  8:49     ` [PATCH 6/7] object_as_type: set " Jeff King
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:48 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 9 +++++++--
 cache.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index fd5fcb7..21f3d81 100644
--- a/alloc.c
+++ b/alloc.c
@@ -64,11 +64,16 @@ DEFINE_ALLOCATOR(object, OBJ_NONE, union any_object)
 
 static struct alloc_state commit_state;
 
+unsigned int alloc_commit_index(void)
+{
+	static unsigned int count;
+	return count++;
+}
+
 void *alloc_commit_node(void)
 {
-	static int commit_count;
 	struct commit *c = alloc_node(&commit_state, OBJ_COMMIT, sizeof(struct commit));
-	c->index = commit_count++;
+	c->index = alloc_commit_index();
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index df65231..42a5e86 100644
--- a/cache.h
+++ b/cache.h
@@ -1376,6 +1376,7 @@ extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
+extern unsigned int alloc_commit_index(void);
 
 /* trace.c */
 __attribute__((format (printf, 1, 2)))
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/7] object_as_type: set commit index
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
                       ` (4 preceding siblings ...)
  2014-07-11  8:48     ` [PATCH 5/7] alloc: factor out commit index Jeff King
@ 2014-07-11  8:49     ` Jeff King
  2014-07-11  8:50     ` [PATCH 7/7] diff-tree: avoid lookup_unknown_object Jeff King
  2014-07-11 10:31     ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Ramsay Jones
  7 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:49 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The point of the "index" field of struct commit is that
every allocated commit would have a uniquely allocated
value. It is supposed to be an invariant that whenever
object->type is set to OBJ_COMMIT, we have a unique index.

Commit 969eba6 (commit: push commit_index update into
alloc_commit_node, 2014-06-10) covered this case for
newly-allocated commits. However, we may also allocate an
OBJ_NONE object via lookup_unknown_object, and only later
convert it to a commit. We must make sure that we set the
commit index when we switch the type field.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/object.c b/object.c
index b2319f6..69fbbbf 100644
--- a/object.c
+++ b/object.c
@@ -163,6 +163,8 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	if (obj->type == type)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
+		if (type == OBJ_COMMIT)
+			((struct commit *)obj)->index = alloc_commit_index();
 		obj->type = type;
 		return obj;
 	}
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/7] diff-tree: avoid lookup_unknown_object
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
                       ` (5 preceding siblings ...)
  2014-07-11  8:49     ` [PATCH 6/7] object_as_type: set " Jeff King
@ 2014-07-11  8:50     ` Jeff King
  2014-07-11 10:31     ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Ramsay Jones
  7 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11  8:50 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

We generally want to avoid lookup_unknown_object, because it
results in allocating more memory for the object than may be
strictly necessary.

In this case, it is used to check whether we have an
already-parsed object before calling parse_object, to save
us from reading the object from disk. Using lookup_object
would be fine for that purpose, but we can take it a step
further. Since this code was written, parse_object already
learned the "check lookup_object" optimization; we can
simply call parse_object directly.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/diff-tree.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index ce0e019..1c4ad62 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -68,9 +68,7 @@ static int diff_tree_stdin(char *line)
 	line[len-1] = 0;
 	if (get_sha1_hex(line, sha1))
 		return -1;
-	obj = lookup_unknown_object(sha1);
-	if (!obj || !obj->parsed)
-		obj = parse_object(sha1);
+	obj = parse_object(sha1);
 	if (!obj)
 		return -1;
 	if (obj->type == OBJ_COMMIT)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
  2014-07-11  8:32     ` Jeff King
@ 2014-07-11  9:41       ` Ramsay Jones
  0 siblings, 0 replies; 35+ messages in thread
From: Ramsay Jones @ 2014-07-11  9:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 09:32, Jeff King wrote:
> On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote:
> 
>>> The code you're touching here was trying to make sure that each commit
>>> gets a unique index, under the assumption that commits only get
>>> allocated via alloc_commit_node. But I think that assumption is wrong.
>>> We can also get commit objects by allocating an OBJ_NONE (e.g., via
>>> lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
>>> find out what it is.
>>
>> Hmm, I don't know how the object is converted, but the object allocator
>> is actually allocating an 'union any_object', so it's allocating more
>> space than for a struct object anyway.
> 
> Right, we would generally want to avoid lookup_unknown_object where we
> can for that reason.
> 
>> If you add an 'index' field to struct object, (and remove it from
>> struct commit) it could be set in alloc_object_node(). ie _all_ node
>> types get an index field.
> 
> That was something I considered when we did the original commit-slab
> work, as it would let you do similar tricks for any set of objects, not
> just commits. The reasons against it are:
> 
>   1. It would bloat the size of blob and tree structs by at least 4
>      bytes (probably 8 for alignment). In most repos, commits make up
>      only 10-20% of the total objects (so for linux.git, we're talking
>      about 25MB extra in the working set).
> 
>   2. It makes single types sparse in the index space. In cases where you
>      do just want to keep data on commits (and that is the main use),
>      you end up allocating a slab entry per object, rather than per
>      commit. That wastes memory (much worse than 25MB if your slab items
>      are large), and reduces cache locality.
> 

Ahem, yeah I keep telling myself not to post at 2am ... ;-)

Although I haven't given this too much extra thought, I'm beginning
to think that your best course would be to revert commit 969eba63
and add an convert_object_to_commit() function to commit.c which
would set c->index. Then track down each place an OBJ_NONE gets
converted to an OBJ_COMMIT and insert a call to
convert_object_to_commit(). (which may do more than just set the
index field ...)

Hmm, I've just noticed a new series in my in-box. I guess I will
discover what you decided to do shortly ... ;-P

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable
  2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
                       ` (6 preceding siblings ...)
  2014-07-11  8:50     ` [PATCH 7/7] diff-tree: avoid lookup_unknown_object Jeff King
@ 2014-07-11 10:31     ` Ramsay Jones
  7 siblings, 0 replies; 35+ messages in thread
From: Ramsay Jones @ 2014-07-11 10:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 09:41, Jeff King wrote:
> Here's a series to address the bug I mentioned earlier by catching the
> conversion of OBJ_NONE to OBJ_COMMIT in a central location and setting
> the index there.
> 
> I've included your patch 1/2 unchanged in the beginning, as I build on
> top of it (and your patch 2/2 is no longer applicable).  The rest is
> refactoring leading up to patch 6 to fix the bug. Patch 7 is a bonus
> cleanup.

I have just read this series in my email client (I will apply and test
them later), but this looks very good to me. :)

Only one patch gave me slight pause; see later.

> 
> I'd hoped to cap off the series by converting the "type" field of
> "struct commit" to a "const unsigned type : 3", which would avoid any
> new callers being added that would touch it without going through the
> proper procedure.  However, it's a bitfield, which makes it hard to cast
> the constness away in the actual setter function. My best attempt was to
> use a union with matching const and non-const members, but that would
> mean changing all of the sites which read the field (and there are many)
> to use "object->type.read".
> 
> There may be a clever solution hiding in a dark corner of C, but I
> suspect we are entering a realm of portability problems with older
> compilers (I even saw one compiler's documentation claim that "const"
> was forbidden on bitfields, even though C99 has an example which does
> it).

Yes, I've come across such compilers too; I wouldn't go there! ;-P

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] add object_as_type helper for casting objects
  2014-07-11  8:48     ` [PATCH 4/7] add object_as_type helper for casting objects Jeff King
@ 2014-07-11 10:45       ` Ramsay Jones
  2014-07-11 16:59         ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-11 10:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 09:48, Jeff King wrote:

[snip]

> diff --git a/object.c b/object.c
> index 472aa8d..b2319f6 100644
> --- a/object.c
> +++ b/object.c
> @@ -158,6 +158,23 @@ void *create_object(const unsigned char *sha1, void *o)
>  	return obj;
>  }
>  
> +void *object_as_type(struct object *obj, enum object_type type, int quiet)
> +{
> +	if (obj->type == type)
> +		return obj;
> +	else if (obj->type == OBJ_NONE) {
> +		obj->type = type;
> +		return obj;
> +	}
> +	else {
> +		if (!quiet)
> +			error("object %s is a %s, not a %s",
> +			      sha1_to_hex(obj->sha1),
> +			      typename(obj->type), typename(type));
> +		return NULL;
> +	}
> +}
> +
>  struct object *lookup_unknown_object(const unsigned char *sha1)
>  {
>  	struct object *obj = lookup_object(sha1);
> diff --git a/object.h b/object.h
> index 8020ace..5e8d8ee 100644
> --- a/object.h
> +++ b/object.h
> @@ -81,6 +81,8 @@ struct object *lookup_object(const unsigned char *sha1);
>  
>  extern void *create_object(const unsigned char *sha1, void *obj);
>  
> +void *object_as_type(struct object *obj, enum object_type type, int quiet);
> +
>  /*
>   * Returns the object, having parsed it to find out what it is.
>   *
> diff --git a/refs.c b/refs.c
> index 20e2bf1..5a18e2d 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1729,9 +1729,8 @@ static enum peel_status peel_object(const unsigned char *name, unsigned char *sh
>  
>  	if (o->type == OBJ_NONE) {
>  		int type = sha1_object_info(name, NULL);
> -		if (type < 0)
> +		if (type < 0 || !object_as_type(o, type, 0))
--------------------------------------------------------^^^

It is not possible here for object_as_type() to issue an error()
report, but I had to go look at the code to check. (It would have
been a change in behaviour, otherwise). So, it doesn't really matter
what you pass to the quiet argument, but setting it to 1 _may_ help
the next reader. (No, I'm not convinced either; the only reason I
knew it had anything to do with error messages was because I had
just read the code ...) Hmm, dunno.

>  			return PEEL_INVALID;
> -		o->type = type;
>  	}
>  
>  	if (o->type != OBJ_TAG)
> diff --git a/tag.c b/tag.c
> index 79552c7..82d841b 100644
> --- a/tag.c
> +++ b/tag.c
> @@ -41,14 +41,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
>  		return create_object(sha1, alloc_tag_node());
> -	if (!obj->type)
> -		obj->type = OBJ_TAG;
> -	if (obj->type != OBJ_TAG) {
> -		error("Object %s is a %s, not a tag",
> -		      sha1_to_hex(sha1), typename(obj->type));
> -		return NULL;
> -	}
> -	return (struct tag *) obj;
> +	return object_as_type(obj, OBJ_TAG, 0);
>  }
>  
>  static unsigned long parse_tag_date(const char *buf, const char *tail)
> diff --git a/tree.c b/tree.c
> index ed66575..bb02c1c 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -184,14 +184,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
>  		return create_object(sha1, alloc_tree_node());
> -	if (!obj->type)
> -		obj->type = OBJ_TREE;
> -	if (obj->type != OBJ_TREE) {
> -		error("Object %s is a %s, not a tree",
> -		      sha1_to_hex(sha1), typename(obj->type));
> -		return NULL;
> -	}
> -	return (struct tree *) obj;
> +	return object_as_type(obj, OBJ_TREE, 0);
>  }
>  
>  int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
> 

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] add object_as_type helper for casting objects
  2014-07-11 10:45       ` Ramsay Jones
@ 2014-07-11 16:59         ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-11 16:59 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Fri, Jul 11, 2014 at 11:45:58AM +0100, Ramsay Jones wrote:

> > @@ -1729,9 +1729,8 @@ static enum peel_status peel_object(const unsigned char *name, unsigned char *sh
> >  
> >  	if (o->type == OBJ_NONE) {
> >  		int type = sha1_object_info(name, NULL);
> > -		if (type < 0)
> > +		if (type < 0 || !object_as_type(o, type, 0))
> --------------------------------------------------------^^^
> 
> It is not possible here for object_as_type() to issue an error()
> report, but I had to go look at the code to check. (It would have
> been a change in behaviour, otherwise). So, it doesn't really matter
> what you pass to the quiet argument, but setting it to 1 _may_ help
> the next reader. (No, I'm not convinced either; the only reason I
> knew it had anything to do with error messages was because I had
> just read the code ...) Hmm, dunno.

Right, as I mentioned in the commit message, the type-check part of
object_type is a noop here. In that sense you could write this as just:

  object_as_type(o, type. 1);

and ignore the return value. However, I'd rather err on the side of
checking for and reporting the error, even if we expect it to do nothing
right now. That decouples the two functions, and if object_as_type ever
learns to report other errors, then we automatically do the right
thing here.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-11  8:46     ` [PATCH 2/7] move setting of object->type to alloc_* functions Jeff King
@ 2014-07-12 14:44       ` Ramsay Jones
  2014-07-12 18:05         ` Jeff King
  2014-07-12 14:55       ` Ramsay Jones
  1 sibling, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-12 14:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 09:46, Jeff King wrote:
> The "struct object" type implements basic object
> polymorphism.  Individual instances are allocated as
> concrete types (or as a union type that can store any
> object), and a "struct object *" can be cast into its real
> type after examining its "type" enum.  This means it is
> dangerous to have a type field that does not match the
> allocation (e.g., setting the type field of a "struct blob"
> to "OBJ_COMMIT" would mean that a reader might read past the
> allocated memory).
> 
> In most of the current code this is not a problem; the first
> thing we do after allocating an object is usually to set its
> type field by passing it to create_object. However, the
> virtual commits we create in merge-recursive.c do not ever
> get their type set. This does not seem to have caused
> problems in practice, though (presumably because we always
> pass around a "struct commit" pointer and never even look at
> the type).
> 
> We can fix this oversight and also make it harder for future
> code to get it wrong by setting the type directly in the
> object allocation functions.
> 
> This will also make it easier to fix problems with commit
> index allocation, as we know that any object allocated by
> alloc_commit_node will meet the invariant that an object
> with an OBJ_COMMIT type field will have a unique index
> number.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  alloc.c         | 18 ++++++++++--------
>  blob.c          |  2 +-
>  builtin/blame.c |  1 -
>  commit.c        |  2 +-
>  object.c        |  5 ++---
>  object.h        |  2 +-
>  tag.c           |  2 +-
>  tree.c          |  2 +-
>  8 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/alloc.c b/alloc.c
> index d7c3605..fd5fcb7 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -18,11 +18,11 @@
>  
>  #define BLOCKING 1024
>  
> -#define DEFINE_ALLOCATOR(name, type)				\
> +#define DEFINE_ALLOCATOR(name, flag, type)			\
>  static struct alloc_state name##_state;				\
>  void *alloc_##name##_node(void)					\
>  {								\
> -	return alloc_node(&name##_state, sizeof(type));		\
> +	return alloc_node(&name##_state, flag, sizeof(type));	\
>  }

I don't particularly like 'flag' here. (not a massive dislike, mind you:)

Perhaps: flag->object_type, type->node_type?
Or, if that's too verbose, maybe just: flag->type, type->node?

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-11  8:46     ` [PATCH 2/7] move setting of object->type to alloc_* functions Jeff King
  2014-07-12 14:44       ` Ramsay Jones
@ 2014-07-12 14:55       ` Ramsay Jones
  2014-07-12 18:07         ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-12 14:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 11/07/14 09:46, Jeff King wrote:

[snip]

Sorry, hit send too early ...

> diff --git a/blob.c b/blob.c
> index ae320bd..5720a38 100644
> --- a/blob.c
> +++ b/blob.c
> @@ -7,7 +7,7 @@ struct blob *lookup_blob(const unsigned char *sha1)
>  {
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
> -		return create_object(sha1, OBJ_BLOB, alloc_blob_node());
> +		return create_object(sha1, alloc_blob_node());
>  	if (!obj->type)
>  		obj->type = OBJ_BLOB;
>  	if (obj->type != OBJ_BLOB) {
> diff --git a/builtin/blame.c b/builtin/blame.c
> index d3b256e..8f3e311 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2287,7 +2287,6 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
>  	commit = alloc_commit_node();
>  	commit->object.parsed = 1;
>  	commit->date = now;
> -	commit->object.type = OBJ_COMMIT;
>  	parent_tail = &commit->parents;
>  
>  	if (!resolve_ref_unsafe("HEAD", head_sha1, 1, NULL))
> diff --git a/commit.c b/commit.c
> index fb7897c..21ed310 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -63,7 +63,7 @@ struct commit *lookup_commit(const unsigned char *sha1)
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj) {
>  		struct commit *c = alloc_commit_node();
> -		return create_object(sha1, OBJ_COMMIT, c);
> +		return create_object(sha1, c);
>  	}

perhaps:
	if (!obj)
		return create_object(sha1, alloc_commit_node());

(increasing similarity with other calls here ...)

>  	if (!obj->type)
>  		obj->type = OBJ_COMMIT;
> diff --git a/object.c b/object.c
> index 9c31e9a..a950b85 100644
> --- a/object.c
> +++ b/object.c
> @@ -141,13 +141,12 @@ static void grow_object_hash(void)
>  	obj_hash_size = new_hash_size;
>  }
>  
> -void *create_object(const unsigned char *sha1, int type, void *o)
> +void *create_object(const unsigned char *sha1, void *o)
>  {
>  	struct object *obj = o;
>  
>  	obj->parsed = 0;
>  	obj->used = 0;
> -	obj->type = type;
>  	obj->flags = 0;
>  	hashcpy(obj->sha1, sha1);
>  
> @@ -163,7 +162,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
>  {
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
> -		obj = create_object(sha1, OBJ_NONE, alloc_object_node());
> +		obj = create_object(sha1, alloc_object_node());
>  	return obj;
>  }
>  
> diff --git a/object.h b/object.h
> index 6e12f2c..8020ace 100644
> --- a/object.h
> +++ b/object.h
> @@ -79,7 +79,7 @@ extern struct object *get_indexed_object(unsigned int);
>   */
>  struct object *lookup_object(const unsigned char *sha1);
>  
> -extern void *create_object(const unsigned char *sha1, int type, void *obj);
> +extern void *create_object(const unsigned char *sha1, void *obj);
>  
>  /*
>   * Returns the object, having parsed it to find out what it is.
> diff --git a/tag.c b/tag.c
> index 7b07921..79552c7 100644
> --- a/tag.c
> +++ b/tag.c
> @@ -40,7 +40,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
>  {
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
> -		return create_object(sha1, OBJ_TAG, alloc_tag_node());
> +		return create_object(sha1, alloc_tag_node());
>  	if (!obj->type)
>  		obj->type = OBJ_TAG;
>  	if (obj->type != OBJ_TAG) {
> diff --git a/tree.c b/tree.c
> index c8c49d7..ed66575 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -183,7 +183,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
>  {
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj)
> -		return create_object(sha1, OBJ_TREE, alloc_tree_node());
> +		return create_object(sha1, alloc_tree_node());
>  	if (!obj->type)
>  		obj->type = OBJ_TREE;
>  	if (obj->type != OBJ_TREE) {
> 

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-12 14:44       ` Ramsay Jones
@ 2014-07-12 18:05         ` Jeff King
  2014-07-13  6:41           ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-12 18:05 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Sat, Jul 12, 2014 at 03:44:06PM +0100, Ramsay Jones wrote:

> > -	return alloc_node(&name##_state, sizeof(type));		\
> > +	return alloc_node(&name##_state, flag, sizeof(type));	\
> >  }
> 
> I don't particularly like 'flag' here. (not a massive dislike, mind you:)
> 
> Perhaps: flag->object_type, type->node_type?
> Or, if that's too verbose, maybe just: flag->type, type->node?

Me either, but as you noticed, type was taken. Your suggestions seem
fine. We could also just do away with the macro as discussed earlier (we
already do in the commit_node case, anyway...).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-12 14:55       ` Ramsay Jones
@ 2014-07-12 18:07         ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-12 18:07 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Sat, Jul 12, 2014 at 03:55:35PM +0100, Ramsay Jones wrote:

> >  	if (!obj) {
> >  		struct commit *c = alloc_commit_node();
> > -		return create_object(sha1, OBJ_COMMIT, c);
> > +		return create_object(sha1, c);
> >  	}
> 
> perhaps:
> 	if (!obj)
> 		return create_object(sha1, alloc_commit_node());
> 
> (increasing similarity with other calls here ...)

Yeah, I noticed that but didn't change it to keep the diff small. The
one that should have is 969eba6, but it is not a big deal to do it here.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-12 18:05         ` Jeff King
@ 2014-07-13  6:41           ` Jeff King
  2014-07-13  6:41             ` [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function Jeff King
                               ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:41 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Sat, Jul 12, 2014 at 02:05:39PM -0400, Jeff King wrote:

> > I don't particularly like 'flag' here. (not a massive dislike, mind you:)
> > 
> > Perhaps: flag->object_type, type->node_type?
> > Or, if that's too verbose, maybe just: flag->type, type->node?
> 
> Me either, but as you noticed, type was taken. Your suggestions seem
> fine. We could also just do away with the macro as discussed earlier (we
> already do in the commit_node case, anyway...).

Thinking on this more, writing out the definitions is the only sane
thing to do here, now that alloc_commit_node does not use the macro.
Otherwise you are inviting people to modify the macro, but fail to
notice that the commit allocator also needs updating.

Here's a re-roll. The interesting bit is the addition of the second
patch (but the rest needed to be rebased on top).

  [1/8]: alloc.c: remove the alloc_raw_commit_node() function
  [2/8]: alloc: write out allocator definitions
  [3/8]: move setting of object->type to alloc_* functions
  [4/8]: parse_object_buffer: do not set object type
  [5/8]: add object_as_type helper for casting objects
  [6/8]: alloc: factor out commit index
  [7/8]: object_as_type: set commit index
  [8/8]: diff-tree: avoid lookup_unknown_object

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function
  2014-07-13  6:41           ` Jeff King
@ 2014-07-13  6:41             ` Jeff King
  2014-07-15 20:06               ` Junio C Hamano
  2014-07-13  6:41             ` [PATCH v2 2/8] alloc: write out allocator definitions Jeff King
                               ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:41 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>

In order to encapsulate the setting of the unique commit index, commit
969eba63 ("commit: push commit_index update into alloc_commit_node",
10-06-2014) introduced a (logically private) intermediary allocator
function. However, this function (alloc_raw_commit_node()) was declared
as a public function, which undermines its entire purpose.

Introduce an inline function, alloc_node(), which implements the main
logic of the allocator used by DEFINE_ALLOCATOR, and redefine the macro
in terms of the new function. In addition, use the new function in the
implementation of the alloc_commit_node() allocator, rather than the
intermediary allocator, which can now be removed.

Noticed by sparse ("symbol 'alloc_raw_commit_node' was not declared.
Should it be static?").

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 47 +++++++++++++++++++++++++++++------------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/alloc.c b/alloc.c
index eb22a45..d7c3605 100644
--- a/alloc.c
+++ b/alloc.c
@@ -19,22 +19,10 @@
 #define BLOCKING 1024
 
 #define DEFINE_ALLOCATOR(name, type)				\
-static unsigned int name##_allocs;				\
+static struct alloc_state name##_state;				\
 void *alloc_##name##_node(void)					\
 {								\
-	static int nr;						\
-	static type *block;					\
-	void *ret;						\
-								\
-	if (!nr) {						\
-		nr = BLOCKING;					\
-		block = xmalloc(BLOCKING * sizeof(type));	\
-	}							\
-	nr--;							\
-	name##_allocs++;					\
-	ret = block++;						\
-	memset(ret, 0, sizeof(type));				\
-	return ret;						\
+	return alloc_node(&name##_state, sizeof(type));		\
 }
 
 union any_object {
@@ -45,16 +33,39 @@ union any_object {
 	struct tag tag;
 };
 
+struct alloc_state {
+	int count; /* total number of nodes allocated */
+	int nr;    /* number of nodes left in current allocation */
+	void *p;   /* first free node in current allocation */
+};
+
+static inline void *alloc_node(struct alloc_state *s, size_t node_size)
+{
+	void *ret;
+
+	if (!s->nr) {
+		s->nr = BLOCKING;
+		s->p = xmalloc(BLOCKING * node_size);
+	}
+	s->nr--;
+	s->count++;
+	ret = s->p;
+	s->p = (char *)s->p + node_size;
+	memset(ret, 0, node_size);
+	return ret;
+}
+
 DEFINE_ALLOCATOR(blob, struct blob)
 DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(raw_commit, struct commit)
 DEFINE_ALLOCATOR(tag, struct tag)
 DEFINE_ALLOCATOR(object, union any_object)
 
+static struct alloc_state commit_state;
+
 void *alloc_commit_node(void)
 {
 	static int commit_count;
-	struct commit *c = alloc_raw_commit_node();
+	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->index = commit_count++;
 	return c;
 }
@@ -66,13 +77,13 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_allocs, name##_allocs * sizeof(type) >> 10)
+    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
 void alloc_report(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
-	REPORT(raw_commit, struct commit);
+	REPORT(commit, struct commit);
 	REPORT(tag, struct tag);
 	REPORT(object, union any_object);
 }
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/8] alloc: write out allocator definitions
  2014-07-13  6:41           ` Jeff King
  2014-07-13  6:41             ` [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function Jeff King
@ 2014-07-13  6:41             ` Jeff King
  2014-07-15 20:11               ` Junio C Hamano
  2014-07-13  6:41             ` [PATCH v2 3/8] move setting of object->type to alloc_* functions Jeff King
                               ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:41 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

Because the allocator functions for tree, blobs, etc are all
very similar, we originally used a macro to avoid repeating
ourselves. Since the prior commit, though, the heavy lifting
is done by an inline helper function.  The macro does still
save us a few lines, but at some readability cost.  It
obfuscates the function definitions (and makes them hard to
find via grep).

Much worse, though, is the fact that it isn't used
consistently for all allocators. Somebody coming later may
be tempted to modify DEFINE_ALLOCATOR, but they would miss
alloc_commit_node, which is treated specially.

Let's just drop the macro and write everything out
explicitly.

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/alloc.c b/alloc.c
index d7c3605..03e458b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -18,13 +18,6 @@
 
 #define BLOCKING 1024
 
-#define DEFINE_ALLOCATOR(name, type)				\
-static struct alloc_state name##_state;				\
-void *alloc_##name##_node(void)					\
-{								\
-	return alloc_node(&name##_state, sizeof(type));		\
-}
-
 union any_object {
 	struct object object;
 	struct blob blob;
@@ -55,10 +48,33 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	return ret;
 }
 
-DEFINE_ALLOCATOR(blob, struct blob)
-DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(tag, struct tag)
-DEFINE_ALLOCATOR(object, union any_object)
+static struct alloc_state blob_state;
+void *alloc_blob_node(void)
+{
+	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	return b;
+}
+
+static struct alloc_state tree_state;
+void *alloc_tree_node(void)
+{
+	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	return t;
+}
+
+static struct alloc_state tag_state;
+void *alloc_tag_node(void)
+{
+	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	return t;
+}
+
+static struct alloc_state object_state;
+void *alloc_object_node(void)
+{
+	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	return obj;
+}
 
 static struct alloc_state commit_state;
 
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 3/8] move setting of object->type to alloc_* functions
  2014-07-13  6:41           ` Jeff King
  2014-07-13  6:41             ` [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function Jeff King
  2014-07-13  6:41             ` [PATCH v2 2/8] alloc: write out allocator definitions Jeff King
@ 2014-07-13  6:41             ` Jeff King
  2014-07-15 20:12               ` Junio C Hamano
  2014-07-13  6:42             ` [PATCH v2 4/8] parse_object_buffer: do not set object type Jeff King
                               ` (5 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:41 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c         | 5 +++++
 blob.c          | 2 +-
 builtin/blame.c | 1 -
 commit.c        | 6 ++----
 object.c        | 5 ++---
 object.h        | 2 +-
 tag.c           | 2 +-
 tree.c          | 2 +-
 8 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/alloc.c b/alloc.c
index 03e458b..fd2e32d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -52,6 +52,7 @@ static struct alloc_state blob_state;
 void *alloc_blob_node(void)
 {
 	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	b->object.type = OBJ_BLOB;
 	return b;
 }
 
@@ -59,6 +60,7 @@ static struct alloc_state tree_state;
 void *alloc_tree_node(void)
 {
 	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	t->object.type = OBJ_TREE;
 	return t;
 }
 
@@ -66,6 +68,7 @@ static struct alloc_state tag_state;
 void *alloc_tag_node(void)
 {
 	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	t->object.type = OBJ_TAG;
 	return t;
 }
 
@@ -73,6 +76,7 @@ static struct alloc_state object_state;
 void *alloc_object_node(void)
 {
 	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	obj->type = OBJ_NONE;
 	return obj;
 }
 
@@ -82,6 +86,7 @@ void *alloc_commit_node(void)
 {
 	static int commit_count;
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	c->object.type = OBJ_COMMIT;
 	c->index = commit_count++;
 	return c;
 }
diff --git a/blob.c b/blob.c
index ae320bd..5720a38 100644
--- a/blob.c
+++ b/blob.c
@@ -7,7 +7,7 @@ struct blob *lookup_blob(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_BLOB, alloc_blob_node());
+		return create_object(sha1, alloc_blob_node());
 	if (!obj->type)
 		obj->type = OBJ_BLOB;
 	if (obj->type != OBJ_BLOB) {
diff --git a/builtin/blame.c b/builtin/blame.c
index d3b256e..8f3e311 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2287,7 +2287,6 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	commit = alloc_commit_node();
 	commit->object.parsed = 1;
 	commit->date = now;
-	commit->object.type = OBJ_COMMIT;
 	parent_tail = &commit->parents;
 
 	if (!resolve_ref_unsafe("HEAD", head_sha1, 1, NULL))
diff --git a/commit.c b/commit.c
index acb74b5..c1815c8 100644
--- a/commit.c
+++ b/commit.c
@@ -61,10 +61,8 @@ struct commit *lookup_commit_or_die(const unsigned char *sha1, const char *ref_n
 struct commit *lookup_commit(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
-	if (!obj) {
-		struct commit *c = alloc_commit_node();
-		return create_object(sha1, OBJ_COMMIT, c);
-	}
+	if (!obj)
+		return create_object(sha1, alloc_commit_node());
 	if (!obj->type)
 		obj->type = OBJ_COMMIT;
 	return check_commit(obj, sha1, 0);
diff --git a/object.c b/object.c
index 9c31e9a..a950b85 100644
--- a/object.c
+++ b/object.c
@@ -141,13 +141,12 @@ static void grow_object_hash(void)
 	obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, int type, void *o)
+void *create_object(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
 	obj->parsed = 0;
 	obj->used = 0;
-	obj->type = type;
 	obj->flags = 0;
 	hashcpy(obj->sha1, sha1);
 
@@ -163,7 +162,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, OBJ_NONE, alloc_object_node());
+		obj = create_object(sha1, alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 6e12f2c..8020ace 100644
--- a/object.h
+++ b/object.h
@@ -79,7 +79,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, int type, void *obj);
+extern void *create_object(const unsigned char *sha1, void *obj);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/tag.c b/tag.c
index 7b07921..79552c7 100644
--- a/tag.c
+++ b/tag.c
@@ -40,7 +40,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_TAG, alloc_tag_node());
+		return create_object(sha1, alloc_tag_node());
 	if (!obj->type)
 		obj->type = OBJ_TAG;
 	if (obj->type != OBJ_TAG) {
diff --git a/tree.c b/tree.c
index c8c49d7..ed66575 100644
--- a/tree.c
+++ b/tree.c
@@ -183,7 +183,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		return create_object(sha1, OBJ_TREE, alloc_tree_node());
+		return create_object(sha1, alloc_tree_node());
 	if (!obj->type)
 		obj->type = OBJ_TREE;
 	if (obj->type != OBJ_TREE) {
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 4/8] parse_object_buffer: do not set object type
  2014-07-13  6:41           ` Jeff King
                               ` (2 preceding siblings ...)
  2014-07-13  6:41             ` [PATCH v2 3/8] move setting of object->type to alloc_* functions Jeff King
@ 2014-07-13  6:42             ` Jeff King
  2014-07-13  6:42             ` [PATCH v2 5/8] add object_as_type helper for casting objects Jeff King
                               ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The only way that "obj" can be non-NULL is if it came from
one of the lookup_* functions. These functions always ensure
that the object has the expected type (and return NULL
otherwise), so there is no need for us to set the type.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object.c b/object.c
index a950b85..472aa8d 100644
--- a/object.c
+++ b/object.c
@@ -213,8 +213,6 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		warning("object %s has unknown type id %d", sha1_to_hex(sha1), type);
 		obj = NULL;
 	}
-	if (obj && obj->type == OBJ_NONE)
-		obj->type = type;
 	return obj;
 }
 
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 5/8] add object_as_type helper for casting objects
  2014-07-13  6:41           ` Jeff King
                               ` (3 preceding siblings ...)
  2014-07-13  6:42             ` [PATCH v2 4/8] parse_object_buffer: do not set object type Jeff King
@ 2014-07-13  6:42             ` Jeff King
  2014-07-13  6:42             ` [PATCH v2 6/8] alloc: factor out commit index Jeff King
                               ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.

Signed-off-by: Jeff King <peff@peff.net>
---
 blob.c   |  9 +--------
 commit.c | 19 ++-----------------
 object.c | 17 +++++++++++++++++
 object.h |  2 ++
 refs.c   |  3 +--
 tag.c    |  9 +--------
 tree.c   |  9 +--------
 7 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/blob.c b/blob.c
index 5720a38..1fcb8e4 100644
--- a/blob.c
+++ b/blob.c
@@ -8,14 +8,7 @@ struct blob *lookup_blob(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_blob_node());
-	if (!obj->type)
-		obj->type = OBJ_BLOB;
-	if (obj->type != OBJ_BLOB) {
-		error("Object %s is a %s, not a blob",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct blob *) obj;
+	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
diff --git a/commit.c b/commit.c
index c1815c8..08816d3 100644
--- a/commit.c
+++ b/commit.c
@@ -18,19 +18,6 @@ int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
 
-static struct commit *check_commit(struct object *obj,
-				   const unsigned char *sha1,
-				   int quiet)
-{
-	if (obj->type != OBJ_COMMIT) {
-		if (!quiet)
-			error("Object %s is a %s, not a commit",
-			      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct commit *) obj;
-}
-
 struct commit *lookup_commit_reference_gently(const unsigned char *sha1,
 					      int quiet)
 {
@@ -38,7 +25,7 @@ struct commit *lookup_commit_reference_gently(const unsigned char *sha1,
 
 	if (!obj)
 		return NULL;
-	return check_commit(obj, sha1, quiet);
+	return object_as_type(obj, OBJ_COMMIT, quiet);
 }
 
 struct commit *lookup_commit_reference(const unsigned char *sha1)
@@ -63,9 +50,7 @@ struct commit *lookup_commit(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_commit_node());
-	if (!obj->type)
-		obj->type = OBJ_COMMIT;
-	return check_commit(obj, sha1, 0);
+	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/object.c b/object.c
index 472aa8d..b2319f6 100644
--- a/object.c
+++ b/object.c
@@ -158,6 +158,23 @@ void *create_object(const unsigned char *sha1, void *o)
 	return obj;
 }
 
+void *object_as_type(struct object *obj, enum object_type type, int quiet)
+{
+	if (obj->type == type)
+		return obj;
+	else if (obj->type == OBJ_NONE) {
+		obj->type = type;
+		return obj;
+	}
+	else {
+		if (!quiet)
+			error("object %s is a %s, not a %s",
+			      sha1_to_hex(obj->sha1),
+			      typename(obj->type), typename(type));
+		return NULL;
+	}
+}
+
 struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
diff --git a/object.h b/object.h
index 8020ace..5e8d8ee 100644
--- a/object.h
+++ b/object.h
@@ -81,6 +81,8 @@ struct object *lookup_object(const unsigned char *sha1);
 
 extern void *create_object(const unsigned char *sha1, void *obj);
 
+void *object_as_type(struct object *obj, enum object_type type, int quiet);
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/refs.c b/refs.c
index 82e4842..7f8960d 100644
--- a/refs.c
+++ b/refs.c
@@ -1730,9 +1730,8 @@ static enum peel_status peel_object(const unsigned char *name, unsigned char *sh
 
 	if (o->type == OBJ_NONE) {
 		int type = sha1_object_info(name, NULL);
-		if (type < 0)
+		if (type < 0 || !object_as_type(o, type, 0))
 			return PEEL_INVALID;
-		o->type = type;
 	}
 
 	if (o->type != OBJ_TAG)
diff --git a/tag.c b/tag.c
index 79552c7..82d841b 100644
--- a/tag.c
+++ b/tag.c
@@ -41,14 +41,7 @@ struct tag *lookup_tag(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_tag_node());
-	if (!obj->type)
-		obj->type = OBJ_TAG;
-	if (obj->type != OBJ_TAG) {
-		error("Object %s is a %s, not a tag",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct tag *) obj;
+	return object_as_type(obj, OBJ_TAG, 0);
 }
 
 static unsigned long parse_tag_date(const char *buf, const char *tail)
diff --git a/tree.c b/tree.c
index ed66575..bb02c1c 100644
--- a/tree.c
+++ b/tree.c
@@ -184,14 +184,7 @@ struct tree *lookup_tree(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		return create_object(sha1, alloc_tree_node());
-	if (!obj->type)
-		obj->type = OBJ_TREE;
-	if (obj->type != OBJ_TREE) {
-		error("Object %s is a %s, not a tree",
-		      sha1_to_hex(sha1), typename(obj->type));
-		return NULL;
-	}
-	return (struct tree *) obj;
+	return object_as_type(obj, OBJ_TREE, 0);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 6/8] alloc: factor out commit index
  2014-07-13  6:41           ` Jeff King
                               ` (4 preceding siblings ...)
  2014-07-13  6:42             ` [PATCH v2 5/8] add object_as_type helper for casting objects Jeff King
@ 2014-07-13  6:42             ` Jeff King
  2014-07-13  6:42             ` [PATCH v2 7/8] object_as_type: set " Jeff King
                               ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 9 +++++++--
 cache.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index fd2e32d..12afadf 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,12 +82,17 @@ void *alloc_object_node(void)
 
 static struct alloc_state commit_state;
 
+unsigned int alloc_commit_index(void)
+{
+	static unsigned int count;
+	return count++;
+}
+
 void *alloc_commit_node(void)
 {
-	static int commit_count;
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = commit_count++;
+	c->index = alloc_commit_index();
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index 44aa439..ba68e11 100644
--- a/cache.h
+++ b/cache.h
@@ -1380,6 +1380,7 @@ extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
+extern unsigned int alloc_commit_index(void);
 
 /* trace.c */
 __attribute__((format (printf, 1, 2)))
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 7/8] object_as_type: set commit index
  2014-07-13  6:41           ` Jeff King
                               ` (5 preceding siblings ...)
  2014-07-13  6:42             ` [PATCH v2 6/8] alloc: factor out commit index Jeff King
@ 2014-07-13  6:42             ` Jeff King
  2014-07-13  6:42             ` [PATCH v2 8/8] diff-tree: avoid lookup_unknown_object Jeff King
  2014-07-13 19:27             ` [PATCH 2/7] move setting of object->type to alloc_* functions Ramsay Jones
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

The point of the "index" field of struct commit is that
every allocated commit would have one. It is supposed to be
an invariant that whenever object->type is set to
OBJ_COMMIT, we have a unique index.

Commit 969eba6 (commit: push commit_index update into
alloc_commit_node, 2014-06-10) covered this case for
newly-allocated commits. However, we may also allocate an
"unknown" object via lookup_unknown_object, and only later
convert it to a commit. We must make sure that we set the
commit index when we switch the type field.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/object.c b/object.c
index b2319f6..69fbbbf 100644
--- a/object.c
+++ b/object.c
@@ -163,6 +163,8 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	if (obj->type == type)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
+		if (type == OBJ_COMMIT)
+			((struct commit *)obj)->index = alloc_commit_index();
 		obj->type = type;
 		return obj;
 	}
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 8/8] diff-tree: avoid lookup_unknown_object
  2014-07-13  6:41           ` Jeff King
                               ` (6 preceding siblings ...)
  2014-07-13  6:42             ` [PATCH v2 7/8] object_as_type: set " Jeff King
@ 2014-07-13  6:42             ` Jeff King
  2014-07-13 19:27             ` [PATCH 2/7] move setting of object->type to alloc_* functions Ramsay Jones
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2014-07-13  6:42 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

We generally want to avoid lookup_unknown_object, because it
results in allocating more memory for the object than may be
strictly necessary.

In this case, it is used to check whether we have an
already-parsed object before calling parse_object, to save
us from reading the object from disk. Using lookup_object
would be fine for that purpose, but we can take it a step
further. Since this code was written, parse_object already
learned the "check lookup_object" optimization, so we can
simply call parse_object directly.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/diff-tree.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index ce0e019..1c4ad62 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -68,9 +68,7 @@ static int diff_tree_stdin(char *line)
 	line[len-1] = 0;
 	if (get_sha1_hex(line, sha1))
 		return -1;
-	obj = lookup_unknown_object(sha1);
-	if (!obj || !obj->parsed)
-		obj = parse_object(sha1);
+	obj = parse_object(sha1);
 	if (!obj)
 		return -1;
 	if (obj->type == OBJ_COMMIT)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-13  6:41           ` Jeff King
                               ` (7 preceding siblings ...)
  2014-07-13  6:42             ` [PATCH v2 8/8] diff-tree: avoid lookup_unknown_object Jeff King
@ 2014-07-13 19:27             ` Ramsay Jones
  2014-07-14  5:57               ` Jeff King
  8 siblings, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-07-13 19:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 13/07/14 07:41, Jeff King wrote:
> On Sat, Jul 12, 2014 at 02:05:39PM -0400, Jeff King wrote:
> 
>>> I don't particularly like 'flag' here. (not a massive dislike, mind you:)
>>>
>>> Perhaps: flag->object_type, type->node_type?
>>> Or, if that's too verbose, maybe just: flag->type, type->node?
>>
>> Me either, but as you noticed, type was taken. Your suggestions seem
>> fine. We could also just do away with the macro as discussed earlier (we
>> already do in the commit_node case, anyway...).
> 
> Thinking on this more, writing out the definitions is the only sane
> thing to do here, now that alloc_commit_node does not use the macro.
> Otherwise you are inviting people to modify the macro, but fail to
> notice that the commit allocator also needs updating.

Hmm, well I could argue that using the macro for all allocators, apart
from alloc_commit_node(), clearly shows which allocator is the odd-man
out (and conversely, that all others are the same)! :-P

No, I don't think this is a telling advantage; I don't think it makes
that much difference. (six of one, half-a-dozen of the other).

BTW, I tested the previous series on Linux 32-bit, Cygwin 32-bit, MinGW
32-bit and Cygwin 64-bit. (I can't test on Linux 64-bit, since I can't
get Linux installed on my new laptop :( ). Admittedly, the testing on
MinGW and Cygwin was only fairly light (it takes *hours* to run the full
testsuite, and I just don't have the time).

I was slightly concerned, when reading through this new series, that the
alloc_node() function may no longer be inlined in the new allocators.
However, I have just tested on Linux (only using gcc this time), and it
was just fine. I will test the new series on the above systems later
(probably tomorrow) but don't expect to find any problems.

> 
> Here's a re-roll. The interesting bit is the addition of the second
> patch (but the rest needed to be rebased on top).

Yep, this looks good. Thanks!

> 
>   [1/8]: alloc.c: remove the alloc_raw_commit_node() function
>   [2/8]: alloc: write out allocator definitions
>   [3/8]: move setting of object->type to alloc_* functions
>   [4/8]: parse_object_buffer: do not set object type
>   [5/8]: add object_as_type helper for casting objects
>   [6/8]: alloc: factor out commit index
>   [7/8]: object_as_type: set commit index
>   [8/8]: diff-tree: avoid lookup_unknown_object

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-13 19:27             ` [PATCH 2/7] move setting of object->type to alloc_* functions Ramsay Jones
@ 2014-07-14  5:57               ` Jeff King
  2014-07-14 11:03                 ` Ramsay Jones
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2014-07-14  5:57 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, GIT Mailing-list

On Sun, Jul 13, 2014 at 08:27:51PM +0100, Ramsay Jones wrote:

> > Thinking on this more, writing out the definitions is the only sane
> > thing to do here, now that alloc_commit_node does not use the macro.
> > Otherwise you are inviting people to modify the macro, but fail to
> > notice that the commit allocator also needs updating.
> 
> Hmm, well I could argue that using the macro for all allocators, apart
> from alloc_commit_node(), clearly shows which allocator is the odd-man
> out (and conversely, that all others are the same)! :-P
> 
> No, I don't think this is a telling advantage; I don't think it makes
> that much difference. (six of one, half-a-dozen of the other).

Yeah, I agree with your final statement in parentheses. I am OK with it
either way (but I have a slight preference for what I posted).

> I was slightly concerned, when reading through this new series, that the
> alloc_node() function may no longer be inlined in the new allocators.
> However, I have just tested on Linux (only using gcc this time), and it
> was just fine. I will test the new series on the above systems later
> (probably tomorrow) but don't expect to find any problems.

That should not be due to my patches (which are just expanding macros),
but rather to your 1/8, right?

I do not know that it matters that much anyway. Yes, we allocate a lot
of objects in some workloads. But I think it is not so tight a loop that
the extra function call is going to kill us (and we tend to _read_ the
allocated objects much more than we allocate them).

> > Here's a re-roll. The interesting bit is the addition of the second
> > patch (but the rest needed to be rebased on top).
> 
> Yep, this looks good. Thanks!

Thanks for reviewing, as usual.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] move setting of object->type to alloc_* functions
  2014-07-14  5:57               ` Jeff King
@ 2014-07-14 11:03                 ` Ramsay Jones
  0 siblings, 0 replies; 35+ messages in thread
From: Ramsay Jones @ 2014-07-14 11:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, GIT Mailing-list

On 14/07/14 06:57, Jeff King wrote:
> On Sun, Jul 13, 2014 at 08:27:51PM +0100, Ramsay Jones wrote:
> 
>>> Thinking on this more, writing out the definitions is the only sane
>>> thing to do here, now that alloc_commit_node does not use the macro.
>>> Otherwise you are inviting people to modify the macro, but fail to
>>> notice that the commit allocator also needs updating.
>>
>> Hmm, well I could argue that using the macro for all allocators, apart
>> from alloc_commit_node(), clearly shows which allocator is the odd-man
>> out (and conversely, that all others are the same)! :-P
>>
>> No, I don't think this is a telling advantage; I don't think it makes
>> that much difference. (six of one, half-a-dozen of the other).
> 
> Yeah, I agree with your final statement in parentheses. I am OK with it
> either way (but I have a slight preference for what I posted).

Yep, once again, this looks good to me. :-)

>> I was slightly concerned, when reading through this new series, that the
>> alloc_node() function may no longer be inlined in the new allocators.
>> However, I have just tested on Linux (only using gcc this time), and it
>> was just fine. I will test the new series on the above systems later
>> (probably tomorrow) but don't expect to find any problems.
> 
> That should not be due to my patches (which are just expanding macros),
> but rather to your 1/8, right?

Ah, no. Over the years, I've used many compilers which would happily
inline this:

    void *alloc_blob_node(void)
    {
    	return alloc_node(&blob_state, sizeof(struct blob));
    }

but not this:

    void *alloc_blob_node(void)
    {
    	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
    	return b;
    }

Admittedly, it has been many years since I've had to use such a compiler,
but old habits die hard and I always check! (The compiler is not obliged
to inline the code anyway).

> I do not know that it matters that much anyway. Yes, we allocate a lot
> of objects in some workloads. But I think it is not so tight a loop that
> the extra function call is going to kill us (and we tend to _read_ the
> allocated objects much more than we allocate them).

Yes, my testing indicates that it wouldn't be noticeable if it wasn't
inlined, at least on my git.git repo. (The largest repo I have is an
ffmpeg repo, which is only about twice the size of git.git; but I used
git.git for the timing/profiling tests). So, I'm always slightly nervous
about affecting users of really large repos.

I don't think it will be an issue either way (famous last words!). ;-)

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function
  2014-07-13  6:41             ` [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function Jeff King
@ 2014-07-15 20:06               ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-07-15 20:06 UTC (permalink / raw)
  To: Ramsay Jones, Jeff King; +Cc: GIT Mailing-list

Jeff King <peff@peff.net> writes:

>  #define DEFINE_ALLOCATOR(name, type)				\
> +static struct alloc_state name##_state;				\
>  void *alloc_##name##_node(void)					\
>  {								\
> +	return alloc_node(&name##_state, sizeof(type));		\
>  }

This is really nice.  Thanks.

> +static struct alloc_state commit_state;
> +
>  void *alloc_commit_node(void)
>  {
>  	static int commit_count;
> -	struct commit *c = alloc_raw_commit_node();
> +	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
>  	c->index = commit_count++;
>  	return c;
>  }

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/8] alloc: write out allocator definitions
  2014-07-13  6:41             ` [PATCH v2 2/8] alloc: write out allocator definitions Jeff King
@ 2014-07-15 20:11               ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-07-15 20:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramsay Jones, GIT Mailing-list

Jeff King <peff@peff.net> writes:

> Because the allocator functions for tree, blobs, etc are all
> very similar, we originally used a macro to avoid repeating
> ourselves. Since the prior commit, though, the heavy lifting
> is done by an inline helper function.  The macro does still
> save us a few lines, but at some readability cost.  It
> obfuscates the function definitions (and makes them hard to
> find via grep).
>
> Much worse, though, is the fact that it isn't used
> consistently for all allocators. Somebody coming later may
> be tempted to modify DEFINE_ALLOCATOR, but they would miss
> alloc_commit_node, which is treated specially.
>
> Let's just drop the macro and write everything out
> explicitly.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  alloc.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> ...
> +static struct alloc_state blob_state;
> +void *alloc_blob_node(void)
> +{
> +	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
> +	return b;
> +}

I think the change makes the code nicer overall, but it looks
strange to see a (void *) that was returned by alloc_node()
implicitly casted to (struct blob *) by assignment to b and then
again implicitly casted to (void *) by it being the return type of
the function.

Is there a reason why it is not like so?

	void *alloc_blob_node(void)
        {
                return alloc_node(&blob_state, sizeof(struct blob));
	}

I may have missed previous discussion on it, in which case I'd
apologize in advance.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/8] move setting of object->type to alloc_* functions
  2014-07-13  6:41             ` [PATCH v2 3/8] move setting of object->type to alloc_* functions Jeff King
@ 2014-07-15 20:12               ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-07-15 20:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramsay Jones, GIT Mailing-list

Jeff King <peff@peff.net> writes:

> diff --git a/alloc.c b/alloc.c
> index 03e458b..fd2e32d 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -52,6 +52,7 @@ static struct alloc_state blob_state;
>  void *alloc_blob_node(void)
>  {
>  	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
> +	b->object.type = OBJ_BLOB;
>  	return b;
>  }

Ahh, please disregard my question on 2/8 ;-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-07-15 20:13 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-10 23:59 [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable Ramsay Jones
2014-07-11  0:30 ` Jeff King
2014-07-11  0:59   ` Ramsay Jones
2014-07-11  8:32     ` Jeff King
2014-07-11  9:41       ` Ramsay Jones
2014-07-11  8:41   ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Jeff King
2014-07-11  8:42     ` [PATCH 1/7] alloc.c: remove the alloc_raw_commit_node() function Jeff King
2014-07-11  8:46     ` [PATCH 2/7] move setting of object->type to alloc_* functions Jeff King
2014-07-12 14:44       ` Ramsay Jones
2014-07-12 18:05         ` Jeff King
2014-07-13  6:41           ` Jeff King
2014-07-13  6:41             ` [PATCH v2 1/8] alloc.c: remove the alloc_raw_commit_node() function Jeff King
2014-07-15 20:06               ` Junio C Hamano
2014-07-13  6:41             ` [PATCH v2 2/8] alloc: write out allocator definitions Jeff King
2014-07-15 20:11               ` Junio C Hamano
2014-07-13  6:41             ` [PATCH v2 3/8] move setting of object->type to alloc_* functions Jeff King
2014-07-15 20:12               ` Junio C Hamano
2014-07-13  6:42             ` [PATCH v2 4/8] parse_object_buffer: do not set object type Jeff King
2014-07-13  6:42             ` [PATCH v2 5/8] add object_as_type helper for casting objects Jeff King
2014-07-13  6:42             ` [PATCH v2 6/8] alloc: factor out commit index Jeff King
2014-07-13  6:42             ` [PATCH v2 7/8] object_as_type: set " Jeff King
2014-07-13  6:42             ` [PATCH v2 8/8] diff-tree: avoid lookup_unknown_object Jeff King
2014-07-13 19:27             ` [PATCH 2/7] move setting of object->type to alloc_* functions Ramsay Jones
2014-07-14  5:57               ` Jeff King
2014-07-14 11:03                 ` Ramsay Jones
2014-07-12 14:55       ` Ramsay Jones
2014-07-12 18:07         ` Jeff King
2014-07-11  8:46     ` [PATCH 3/7] parse_object_buffer: do not set object type Jeff King
2014-07-11  8:48     ` [PATCH 4/7] add object_as_type helper for casting objects Jeff King
2014-07-11 10:45       ` Ramsay Jones
2014-07-11 16:59         ` Jeff King
2014-07-11  8:48     ` [PATCH 5/7] alloc: factor out commit index Jeff King
2014-07-11  8:49     ` [PATCH 6/7] object_as_type: set " Jeff King
2014-07-11  8:50     ` [PATCH 7/7] diff-tree: avoid lookup_unknown_object Jeff King
2014-07-11 10:31     ` [PATCH 0/7] ensure index is set for all OBJ_COMMIT objects variable Ramsay Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).