git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix type-punning issues
@ 2009-05-12  1:17 Dan McGee
  2009-05-12  7:57 ` Junio C Hamano
  2009-05-12  8:13 ` [PATCH] Fix type-punning issues Johannes Schindelin
  0 siblings, 2 replies; 9+ messages in thread
From: Dan McGee @ 2009-05-12  1:17 UTC (permalink / raw)
  To: git; +Cc: Dan McGee

In these two places we are casting part of our unsigned char sha1 array into
an unsigned int, which violates GCCs strict-aliasing rules (and probably
other compilers). In addition, we had two hashing functions defined in
object.c. Keep the one function that is "correct" and adopt the other ones
to fit.

decorate.c: In function ‘hash_obj’:
decorate.c:11: warning: dereferencing type-punned pointer will break
strict-aliasing rules

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---

Let me know if this patch is completely off-base. A quick glance at the
generated assembly seems to show this isn't much of a performance hit,
especially given that these functions are inlined anyway. As an FYI the above
warning came when compiling with GCC 4.4; I'm not sure if <4.4 showed this
behavior.

-Dan

 decorate.c |    7 ++++---
 object.c   |   20 +++++++-------------
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/decorate.c b/decorate.c
index 82d9e22..bf83750 100644
--- a/decorate.c
+++ b/decorate.c
@@ -8,7 +8,8 @@
 
 static unsigned int hash_obj(const struct object *obj, unsigned int n)
 {
-	unsigned int hash = *(unsigned int *)obj->sha1;
+	unsigned int hash;
+	memcpy(&hash, obj->sha1, sizeof(unsigned int));
 	return hash % n;
 }
 
@@ -16,7 +17,7 @@ static void *insert_decoration(struct decoration *n, const struct object *base,
 {
 	int size = n->size;
 	struct object_decoration *hash = n->hash;
-	int j = hash_obj(base, size);
+	unsigned int j = hash_obj(base, size);
 
 	while (hash[j].base) {
 		if (hash[j].base == base) {
@@ -68,7 +69,7 @@ void *add_decoration(struct decoration *n, const struct object *obj,
 /* Lookup a decoration pointer */
 void *lookup_decoration(struct decoration *n, const struct object *obj)
 {
-	int j;
+	unsigned int j;
 
 	/* nothing to lookup */
 	if (!n->size)
diff --git a/object.c b/object.c
index 7e6a92c..96ef32d 100644
--- a/object.c
+++ b/object.c
@@ -43,15 +43,16 @@ int type_from_string(const char *str)
 	die("invalid object type \"%s\"", str);
 }
 
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned int hash_char(const unsigned char *sha1, unsigned int n)
 {
-	unsigned int hash = *(unsigned int *)obj->sha1;
-	return hash % n;
+	unsigned int i;
+	memcpy(&i, sha1, sizeof(unsigned int));
+	return (int)(i % n);
 }
 
 static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
 {
-	int j = hash_obj(obj, size);
+	unsigned int j = hash_char(obj->sha1, size);
 
 	while (hash[j]) {
 		j++;
@@ -61,22 +62,15 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
 	hash[j] = obj;
 }
 
-static int hashtable_index(const unsigned char *sha1)
-{
-	unsigned int i;
-	memcpy(&i, sha1, sizeof(unsigned int));
-	return (int)(i % obj_hash_size);
-}
-
 struct object *lookup_object(const unsigned char *sha1)
 {
-	int i;
+	unsigned int i;
 	struct object *obj;
 
 	if (!obj_hash)
 		return NULL;
 
-	i = hashtable_index(sha1);
+	i = hash_char(sha1, obj_hash_size);
 	while ((obj = obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->sha1))
 			break;
-- 
1.6.2.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix type-punning issues
  2009-05-12  1:17 [PATCH] Fix type-punning issues Dan McGee
@ 2009-05-12  7:57 ` Junio C Hamano
  2009-05-19  4:32   ` Dan McGee
  2009-05-12  8:13 ` [PATCH] Fix type-punning issues Johannes Schindelin
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-05-12  7:57 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

Dan McGee <dpmcgee@gmail.com> writes:

> In these two places we are casting part of our unsigned char sha1 array into
> an unsigned int, which violates GCCs strict-aliasing rules (and probably
> other compilers).

Yay.

>  static unsigned int hash_obj(const struct object *obj, unsigned int n)
>  {
> -	unsigned int hash = *(unsigned int *)obj->sha1;
> +	unsigned int hash;
> +	memcpy(&hash, obj->sha1, sizeof(unsigned int));
>  	return hash % n;
>  }

I noticed this breakage when I borrowed a friend's FC11 preview (as I
still do not have a replacement machine X-<), but didn't manage to spend
enough time to fix it myself.  I was hoping a way to tell the compiler
that this particular pointer usage is Ok in a less hacky way that does not
upset older compilers, without resorting to low-level memcpy.  But your
version to use memcpy() is a literal translation of what the code does,
and I think it is an acceptable solution.

> @@ -16,7 +17,7 @@ static void *insert_decoration(struct decoration *n, const struct object *base,
>  {
>  	int size = n->size;
>  	struct object_decoration *hash = n->hash;
> -	int j = hash_obj(base, size);
> +	unsigned int j = hash_obj(base, size);

These type changes should be Ok, but I would have preferred them to be a
separate patch.  From the context, you cannot see if the function in
places outside the patch context uses "j" for purposes other than holding
the return value of hash_obj() that requires it to be able to hold a
netagive value to make sure that this conversion is correct; on the other
hand, we know n->size is a sane small value that unsigned vs signed int
does not matter, so in that sense making this change in the same patch
makes it not about fixing pointer aliasing warning.

But the code here is correct (I read outside the context).

> @@ -68,7 +69,7 @@ void *add_decoration(struct decoration *n, const struct object *obj,
>  /* Lookup a decoration pointer */
>  void *lookup_decoration(struct decoration *n, const struct object *obj)
>  {
> -	int j;
> +	unsigned int j;

Same here.

> diff --git a/object.c b/object.c
> index 7e6a92c..96ef32d 100644
> --- a/object.c
> +++ b/object.c
> @@ -43,15 +43,16 @@ int type_from_string(const char *str)
>  	die("invalid object type \"%s\"", str);
>  }
>  
> -static unsigned int hash_obj(struct object *obj, unsigned int n)
> +static unsigned int hash_char(const unsigned char *sha1, unsigned int n)
>  {
> -	unsigned int hash = *(unsigned int *)obj->sha1;
> -	return hash % n;
> +	unsigned int i;
> +	memcpy(&i, sha1, sizeof(unsigned int));
> +	return (int)(i % n);

Huh?  The original looks the same as the one in decorate.c but why is the
conversion different?  I am not talking about the difference between hash
and i, but am wondering about the cast to int (and then back to unsigned
int by the function signature).

It might make more sense to have one canonical

	unsigned int hash_obj(const struct object *obj, unsigned int n)

here, export it to object.h, and remove the one in decorate.c.

Or am I missing something?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix type-punning issues
  2009-05-12  1:17 [PATCH] Fix type-punning issues Dan McGee
  2009-05-12  7:57 ` Junio C Hamano
@ 2009-05-12  8:13 ` Johannes Schindelin
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2009-05-12  8:13 UTC (permalink / raw)
  To: Dan McGee; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 654 bytes --]

Hi,

On Mon, 11 May 2009, Dan McGee wrote:

> In these two places we are casting part of our unsigned char sha1 array 
> into an unsigned int, which violates GCCs strict-aliasing rules (and 
> probably other compilers). In addition, we had two hashing functions 
> defined in object.c. Keep the one function that is "correct" and adopt 
> the other ones to fit.
> 
> decorate.c: In function ‘hash_obj’:
> decorate.c:11: warning: dereferencing type-punned pointer will break
> strict-aliasing rules

FWIW we have the same issue in msysGit, having installed GCC 4.4.0 
recently, but I did not dare to send my "fix".  Yours is much nicer.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix type-punning issues
  2009-05-12  7:57 ` Junio C Hamano
@ 2009-05-19  4:32   ` Dan McGee
  2009-05-19  4:34     ` [PATCH 1/3] Unify signedness in hashing calls Dan McGee
  0 siblings, 1 reply; 9+ messages in thread
From: Dan McGee @ 2009-05-19  4:32 UTC (permalink / raw)
  Cc: git

On Tue, May 12, 2009 at 2:57 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Dan McGee <dpmcgee@gmail.com> writes:
>
>> In these two places we are casting part of our unsigned char sha1 array into
>> an unsigned int, which violates GCCs strict-aliasing rules (and probably
>> other compilers).
>
> Yay.
>

> It might make more sense to have one canonical
>
>        unsigned int hash_obj(const struct object *obj, unsigned int n)
>
> here, export it to object.h, and remove the one in decorate.c.
>
> Or am I missing something?

(argh: sorry Junio for sending the last reply to just you)

So due to me taking so long to resubmit, I see you committed a
stripped-down version of my patch. I had a patch just like this (minus
one newline diff), but it was in a series of 4 I will submit in just a
second.

The three remaining patches implement the above suggestion of having
one canonical "hash" function. However, the name changes to hash_char
and takes a unsigned char pointer rather than a struct object pointer
so we can use the same function for both insertion into the hashes as
well as lookup.

Looking forward to any feedback.

-Dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] Unify signedness in hashing calls
  2009-05-19  4:32   ` Dan McGee
@ 2009-05-19  4:34     ` Dan McGee
  2009-05-19  4:34       ` [PATCH 2/3] Convert hash functions to char instead of struct object Dan McGee
  0 siblings, 1 reply; 9+ messages in thread
From: Dan McGee @ 2009-05-19  4:34 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, gitster, Dan McGee

Our hash_obj and hashtable_index calls and functions were doing a lot of
funny things with signedness. Unify all of it to 'unsigned int'.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 decorate.c |    4 ++--
 object.c   |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/decorate.c b/decorate.c
index e6fd8a7..2f8a63e 100644
--- a/decorate.c
+++ b/decorate.c
@@ -18,7 +18,7 @@ static void *insert_decoration(struct decoration *n, const struct object *base,
 {
 	int size = n->size;
 	struct object_decoration *hash = n->hash;
-	int j = hash_obj(base, size);
+	unsigned int j = hash_obj(base, size);
 
 	while (hash[j].base) {
 		if (hash[j].base == base) {
@@ -70,7 +70,7 @@ void *add_decoration(struct decoration *n, const struct object *obj,
 /* Lookup a decoration pointer */
 void *lookup_decoration(struct decoration *n, const struct object *obj)
 {
-	int j;
+	unsigned int j;
 
 	/* nothing to lookup */
 	if (!n->size)
diff --git a/object.c b/object.c
index e1feef9..a6ef439 100644
--- a/object.c
+++ b/object.c
@@ -52,7 +52,7 @@ static unsigned int hash_obj(struct object *obj, unsigned int n)
 
 static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
 {
-	int j = hash_obj(obj, size);
+	unsigned int j = hash_obj(obj, size);
 
 	while (hash[j]) {
 		j++;
@@ -62,16 +62,16 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
 	hash[j] = obj;
 }
 
-static int hashtable_index(const unsigned char *sha1)
+static unsigned int hashtable_index(const unsigned char *sha1)
 {
 	unsigned int i;
 	memcpy(&i, sha1, sizeof(unsigned int));
-	return (int)(i % obj_hash_size);
+	return i % obj_hash_size;
 }
 
 struct object *lookup_object(const unsigned char *sha1)
 {
-	int i;
+	unsigned int i;
 	struct object *obj;
 
 	if (!obj_hash)
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] Convert hash functions to char instead of struct object
  2009-05-19  4:34     ` [PATCH 1/3] Unify signedness in hashing calls Dan McGee
@ 2009-05-19  4:34       ` Dan McGee
  2009-05-19  4:34         ` [PATCH 3/3] Unify sha1 char hash functions Dan McGee
  2009-05-19  6:23         ` [PATCH 2/3] Convert hash functions to char instead of struct object Johannes Sixt
  0 siblings, 2 replies; 9+ messages in thread
From: Dan McGee @ 2009-05-19  4:34 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, gitster, Dan McGee

This will allow us to unify the three hash functions into just one.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 decorate.c |    9 ++++-----
 object.c   |    6 +++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/decorate.c b/decorate.c
index 2f8a63e..3c08b96 100644
--- a/decorate.c
+++ b/decorate.c
@@ -6,11 +6,10 @@
 #include "object.h"
 #include "decorate.h"
 
-static unsigned int hash_obj(const struct object *obj, unsigned int n)
+static unsigned int hash_chars(const unsigned char *c, unsigned int n)
 {
 	unsigned int hash;
-
-	memcpy(&hash, obj->sha1, sizeof(unsigned int));
+	memcpy(&hash, c, sizeof(unsigned int));
 	return hash % n;
 }
 
@@ -18,7 +17,7 @@ static void *insert_decoration(struct decoration *n, const struct object *base,
 {
 	int size = n->size;
 	struct object_decoration *hash = n->hash;
-	unsigned int j = hash_obj(base, size);
+	unsigned int j = hash_chars(base->sha1, size);
 
 	while (hash[j].base) {
 		if (hash[j].base == base) {
@@ -75,7 +74,7 @@ void *lookup_decoration(struct decoration *n, const struct object *obj)
 	/* nothing to lookup */
 	if (!n->size)
 		return NULL;
-	j = hash_obj(obj, n->size);
+	j = hash_chars(obj->sha1, n->size);
 	for (;;) {
 		struct object_decoration *ref = n->hash + j;
 		if (ref->base == obj)
diff --git a/object.c b/object.c
index a6ef439..09c4d3c 100644
--- a/object.c
+++ b/object.c
@@ -43,16 +43,16 @@ int type_from_string(const char *str)
 	die("invalid object type \"%s\"", str);
 }
 
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned int hash_chars(const unsigned char *c, unsigned int n)
 {
 	unsigned int hash;
-	memcpy(&hash, obj->sha1, sizeof(unsigned int));
+	memcpy(&hash, c, sizeof(unsigned int));
 	return hash % n;
 }
 
 static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
 {
-	unsigned int j = hash_obj(obj, size);
+	unsigned int j = hash_chars(obj->sha1, size);
 
 	while (hash[j]) {
 		j++;
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] Unify sha1 char hash functions
  2009-05-19  4:34       ` [PATCH 2/3] Convert hash functions to char instead of struct object Dan McGee
@ 2009-05-19  4:34         ` Dan McGee
  2009-05-19  6:23         ` [PATCH 2/3] Convert hash functions to char instead of struct object Johannes Sixt
  1 sibling, 0 replies; 9+ messages in thread
From: Dan McGee @ 2009-05-19  4:34 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, gitster, Dan McGee

Expose a canonical one in object.c; convert the hashtable_index call and
the calls in decorate.c.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
---
 decorate.c |    7 -------
 object.c   |   11 ++---------
 object.h   |    1 +
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/decorate.c b/decorate.c
index 3c08b96..4332924 100644
--- a/decorate.c
+++ b/decorate.c
@@ -6,13 +6,6 @@
 #include "object.h"
 #include "decorate.h"
 
-static unsigned int hash_chars(const unsigned char *c, unsigned int n)
-{
-	unsigned int hash;
-	memcpy(&hash, c, sizeof(unsigned int));
-	return hash % n;
-}
-
 static void *insert_decoration(struct decoration *n, const struct object *base, void *decoration)
 {
 	int size = n->size;
diff --git a/object.c b/object.c
index 09c4d3c..34f65e5 100644
--- a/object.c
+++ b/object.c
@@ -43,7 +43,7 @@ int type_from_string(const char *str)
 	die("invalid object type \"%s\"", str);
 }
 
-static unsigned int hash_chars(const unsigned char *c, unsigned int n)
+unsigned int hash_chars(const unsigned char *c, unsigned int n)
 {
 	unsigned int hash;
 	memcpy(&hash, c, sizeof(unsigned int));
@@ -62,13 +62,6 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
 	hash[j] = obj;
 }
 
-static unsigned int hashtable_index(const unsigned char *sha1)
-{
-	unsigned int i;
-	memcpy(&i, sha1, sizeof(unsigned int));
-	return i % obj_hash_size;
-}
-
 struct object *lookup_object(const unsigned char *sha1)
 {
 	unsigned int i;
@@ -77,7 +70,7 @@ struct object *lookup_object(const unsigned char *sha1)
 	if (!obj_hash)
 		return NULL;
 
-	i = hashtable_index(sha1);
+	i = hash_chars(sha1, obj_hash_size);
 	while ((obj = obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->sha1))
 			break;
diff --git a/object.h b/object.h
index 89dd0c4..ed73a0a 100644
--- a/object.h
+++ b/object.h
@@ -37,6 +37,7 @@ struct object {
 
 extern const char *typename(unsigned int type);
 extern int type_from_string(const char *str);
+extern unsigned int hash_chars(const unsigned char *c, unsigned int n);
 
 extern unsigned int get_max_object_index(void);
 extern struct object *get_indexed_object(unsigned int);
-- 
1.6.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] Convert hash functions to char instead of struct object
  2009-05-19  4:34       ` [PATCH 2/3] Convert hash functions to char instead of struct object Dan McGee
  2009-05-19  4:34         ` [PATCH 3/3] Unify sha1 char hash functions Dan McGee
@ 2009-05-19  6:23         ` Johannes Sixt
  2009-05-19  7:04           ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2009-05-19  6:23 UTC (permalink / raw)
  To: Dan McGee; +Cc: git, Johannes.Schindelin, gitster

Dan McGee schrieb:
> -static unsigned int hash_obj(struct object *obj, unsigned int n)
> +static unsigned int hash_chars(const unsigned char *c, unsigned int n)

hash_chars suggests that this function hashes arbitrary character
sequences, but it doesn't do that. Wouldn't hash_object_id be a better
name? (And the parameter would then obviously be named sha1 or id.)

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] Convert hash functions to char instead of struct object
  2009-05-19  6:23         ` [PATCH 2/3] Convert hash functions to char instead of struct object Johannes Sixt
@ 2009-05-19  7:04           ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2009-05-19  7:04 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Dan McGee, git, Johannes.Schindelin, gitster

Johannes Sixt <j.sixt@viscovery.net> writes:

> Dan McGee schrieb:
>> -static unsigned int hash_obj(struct object *obj, unsigned int n)
>> +static unsigned int hash_chars(const unsigned char *c, unsigned int n)
>
> hash_chars suggests that this function hashes arbitrary character
> sequences, but it doesn't do that. Wouldn't hash_object_id be a better
> name? (And the parameter would then obviously be named sha1 or id.)

Yes, the parameter to this function is what we call "unsigned char *sha1"
everywhere else in the code.  hash-object-id or hash-object-name is a good
name for the function.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-05-19  7:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-12  1:17 [PATCH] Fix type-punning issues Dan McGee
2009-05-12  7:57 ` Junio C Hamano
2009-05-19  4:32   ` Dan McGee
2009-05-19  4:34     ` [PATCH 1/3] Unify signedness in hashing calls Dan McGee
2009-05-19  4:34       ` [PATCH 2/3] Convert hash functions to char instead of struct object Dan McGee
2009-05-19  4:34         ` [PATCH 3/3] Unify sha1 char hash functions Dan McGee
2009-05-19  6:23         ` [PATCH 2/3] Convert hash functions to char instead of struct object Johannes Sixt
2009-05-19  7:04           ` Junio C Hamano
2009-05-12  8:13 ` [PATCH] Fix type-punning issues Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).