git.vger.kernel.org archive mirror
* [PATCH] type_from_string_gently: make sure length matches
@ 2015-04-17 14:52 Jeff King
  2015-04-17 20:54 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2015-04-17 14:52 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Karthik Nayak, Junio C Hamano

When commit fe8e3b7 refactored type_from_string to allow
input that was not NUL-terminated, it switched to using
strncmp instead of strcmp. But this means we check only the
first "len" bytes of the strings, and ignore any remaining
bytes in the object_type_string. We should make sure that it
is also "len" bytes, or else we would accept "comm" as
"commit", and so forth.

Signed-off-by: Jeff King <peff@peff.net>
---
Since the strings we are matching are literals, we could also record
their sizes in the object_type_strings array and check the length first
before even calling strncmp. I doubt this is a performance hot-spot,
though.
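A minimal sketch of that length-first check, assuming a standalone copy of the type table rather than git's actual object.c. Each entry stores its literal's length (computed with `sizeof`), so a candidate of the wrong length is rejected before `strncmp` runs. The names `counted_type`, `type_table`, and `lookup_type` are hypothetical.

```c
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical counted-string table mirroring object_type_strings. */
struct counted_type {
	const char *str;
	size_t len;
};

static const struct counted_type type_table[] = {
	{ NULL, 0 },                        /* OBJ_NONE = 0 */
	{ "commit", sizeof("commit") - 1 }, /* OBJ_COMMIT = 1 */
	{ "tree", sizeof("tree") - 1 },     /* OBJ_TREE = 2 */
	{ "blob", sizeof("blob") - 1 },     /* OBJ_BLOB = 3 */
	{ "tag", sizeof("tag") - 1 },       /* OBJ_TAG = 4 */
};

#define TYPE_TABLE_SIZE (sizeof(type_table) / sizeof(type_table[0]))

/* Return the type index for the first len bytes of str, or -1. */
static int lookup_type(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < TYPE_TABLE_SIZE; i++)
		/* Candidates of the wrong length never reach strncmp. */
		if (type_table[i].len == (size_t)len &&
		    !strncmp(str, type_table[i].str, (size_t)len))
			return (int)i;
	return -1;
}
```

With this table, `lookup_type("comm", -1)` and `lookup_type("tags", -1)` both return -1, while `lookup_type("blobby", 4)` returns 3 because only the first four bytes are examined.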

You could also potentially just use strlen(object_type_strings[i]), but
I'm not sure if compilers will optimize out the strlen in this case,
since it is in a loop.
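For comparison, a sketch of the `strlen` variant, again over a hypothetical stand-in table (`type_names`) and with a hypothetical function name (`lookup_type_strlen`). It computes each candidate's length inside the loop; whether a compiler folds those calls into constants is left open, as in the note above, but the result is correct either way.

```c
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for git's object_type_strings table. */
static const char *const type_names[] = {
	NULL,     /* OBJ_NONE = 0 */
	"commit", /* OBJ_COMMIT = 1 */
	"tree",   /* OBJ_TREE = 2 */
	"blob",   /* OBJ_BLOB = 3 */
	"tag",    /* OBJ_TAG = 4 */
};

#define TYPE_NAMES_SIZE (sizeof(type_names) / sizeof(type_names[0]))

/* Same matching rule, but each candidate's length comes from strlen. */
static int lookup_type_strlen(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < TYPE_NAMES_SIZE; i++)
		if (strlen(type_names[i]) == (size_t)len &&
		    !strncmp(str, type_names[i], (size_t)len))
			return (int)i;
	return -1;
}
```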

 object.c               | 3 ++-
 t/t1007-hash-object.sh | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/object.c b/object.c
index 23d6c96..980ac5f 100644
--- a/object.c
+++ b/object.c
@@ -41,7 +41,8 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 		len = strlen(str);
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
-		if (!strncmp(str, object_type_strings[i], len))
+		if (!strncmp(str, object_type_strings[i], len) &&
+		    object_type_strings[i][len] == '\0')
 			return i;
 
 	if (gentle)
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f83df8e..ebb3a69 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -201,4 +201,12 @@ test_expect_success 'corrupt tag' '
 	test_must_fail git hash-object -t tag --stdin </dev/null
 '
 
+test_expect_success 'hash-object complains about bogus type name' '
+	test_must_fail git hash-object -t bogus --stdin </dev/null
+'
+
+test_expect_success 'hash-object complains about truncated type name' '
+	test_must_fail git hash-object -t bl --stdin </dev/null
+'
+
 test_done
-- 
2.4.0.rc2.384.g7297a4a
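The bug and the fix can be reproduced outside git with a small sketch. Both functions below are hypothetical copies of the loop in type_from_string_gently over a stand-in table: `lookup_prefix_only` keeps the pre-patch strncmp-only test, and `lookup_patched` adds the `[len] == '\0'` check from the diff.

```c
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for object_type_strings in object.c. */
static const char *const names[] = {
	NULL, "commit", "tree", "blob", "tag",
};

#define NAMES_COUNT (sizeof(names) / sizeof(names[0]))

/* Pre-patch logic: any prefix of a type name is accepted. */
static int lookup_prefix_only(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < NAMES_COUNT; i++)
		if (!strncmp(str, names[i], (size_t)len))
			return (int)i;
	return -1;
}

/*
 * Patched logic: the name must also end exactly at len. Reading
 * names[i][len] stays in bounds because strncmp returned 0, which
 * (given no NUL in the first len bytes of str) means names[i] has
 * at least len characters.
 */
static int lookup_patched(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < NAMES_COUNT; i++)
		if (!strncmp(str, names[i], (size_t)len) &&
		    names[i][len] == '\0')
			return (int)i;
	return -1;
}
```

Under the old logic `lookup_prefix_only("comm", -1)` returns 1 (commit) and `lookup_prefix_only("bl", -1)` returns 3 (blob), which is the case the new `-t bl` test guards against; `lookup_patched` returns -1 for both.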


* Re: [PATCH] type_from_string_gently: make sure length matches
  2015-04-17 14:52 [PATCH] type_from_string_gently: make sure length matches Jeff King
@ 2015-04-17 20:54 ` Junio C Hamano
  2015-04-17 21:07   ` Jeff King
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2015-04-17 20:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin, Karthik Nayak

Jeff King <peff@peff.net> writes:

> When commit fe8e3b7 refactored type_from_string to allow
> input that was not NUL-terminated, it switched to using
> strncmp instead of strcmp. But this means we check only the
> first "len" bytes of the strings, and ignore any remaining
> bytes in the object_type_string. We should make sure that it
> is also "len" bytes, or else we would accept "comm" as
> "commit", and so forth.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Since the strings we are matching are literals, we could also record
> their sizes in the object_type_strings array and check the length first
> before even calling strncmp. I doubt this is a performance hot-spot,
> though.
>
> You could also potentially just use strlen(object_type_strings[i]), but
> I'm not sure if compilers will optimize out the strlen in this case,
> since it is in a loop.

That thought crossed my mind while reading your patch.  It could
even make it go faster if we made object_type_strings into an array
of counted strings (i.e. "struct { const char *str; int len; }")
and then took advantage of the fact that we have lengths of both.


 object.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/object.c b/object.c
index aedac24..51584ea 100644
--- a/object.c
+++ b/object.c
@@ -18,19 +18,22 @@ struct object *get_indexed_object(unsigned int idx)
 	return obj_hash[idx];
 }
 
-static const char *object_type_strings[] = {
-	NULL,		/* OBJ_NONE = 0 */
-	"commit",	/* OBJ_COMMIT = 1 */
-	"tree",		/* OBJ_TREE = 2 */
-	"blob",		/* OBJ_BLOB = 3 */
-	"tag",		/* OBJ_TAG = 4 */
+static struct {
+	const char *str;
+	int len;
+} object_type_name[] = {
+	{ NULL, 0 },      /* OBJ_NONE = 0 */
+	{ "commit", 6 },  /* OBJ_COMMIT = 1 */
+	{ "tree", 4 },	  /* OBJ_TREE = 2 */
+	{ "blob", 4 },    /* OBJ_BLOB = 3 */
+	{ "tag", 3 },     /* OBJ_TAG = 4 */
 };
 
 const char *typename(unsigned int type)
 {
-	if (type >= ARRAY_SIZE(object_type_strings))
+	if (type >= ARRAY_SIZE(object_type_name))
 		return NULL;
-	return object_type_strings[type];
+	return object_type_name[type].str;
 }
 
 int type_from_string_gently(const char *str, ssize_t len, int gentle)
@@ -40,8 +43,9 @@ int type_from_string_gently(const char *str, ssize_t len, int gentle)
 	if (len < 0)
 		len = strlen(str);
 
-	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
-		if (!strncmp(str, object_type_strings[i], len))
+	for (i = 1; i < ARRAY_SIZE(object_type_name); i++)
+		if (object_type_name[i].len == len &&
+		    !strncmp(str, object_type_name[i].str, len))
 			return i;
 
 	if (gentle)
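The `len` parameter exists because fe8e3b7 made the function accept input that is not NUL-terminated, so a caller can, for example, classify a type word in place inside a larger buffer. A sketch of that use with a counted table shaped like the `object_type_name[]` above, assuming a `"<type> <size>"` string; `type_from_counted` and `type_of_header` are hypothetical illustrations, not git code.

```c
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Counted-string table in the shape of object_type_name[] above. */
static const struct {
	const char *str;
	int len;
} type_name[] = {
	{ NULL, 0 },     /* OBJ_NONE = 0 */
	{ "commit", 6 }, /* OBJ_COMMIT = 1 */
	{ "tree", 4 },   /* OBJ_TREE = 2 */
	{ "blob", 4 },   /* OBJ_BLOB = 3 */
	{ "tag", 3 },    /* OBJ_TAG = 4 */
};

#define TYPE_NAME_COUNT (sizeof(type_name) / sizeof(type_name[0]))

/* Lengths are compared first, then strncmp checks the bytes. */
static int type_from_counted(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < TYPE_NAME_COUNT; i++)
		if (type_name[i].len == len &&
		    !strncmp(str, type_name[i].str, (size_t)len))
			return (int)i;
	return -1;
}

/* Hypothetical caller: classify the type word without copying it. */
static int type_of_header(const char *hdr)
{
	const char *sp = strchr(hdr, ' ');

	if (!sp)
		return -1;
	return type_from_counted(hdr, sp - hdr);
}
```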


* Re: [PATCH] type_from_string_gently: make sure length matches
  2015-04-17 20:54 ` Junio C Hamano
@ 2015-04-17 21:07   ` Jeff King
  2015-04-17 21:11     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2015-04-17 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, Karthik Nayak

On Fri, Apr 17, 2015 at 01:54:27PM -0700, Junio C Hamano wrote:

> > Since the strings we are matching are literals, we could also record
> > their sizes in the object_type_strings array and check the length first
> > before even calling strncmp. I doubt this is a performance hot-spot,
> > though.
> >
> > You could also potentially just use strlen(object_type_strings[i]), but
> > I'm not sure if compilers will optimize out the strlen in this case,
> > since it is in a loop.
> 
> That thought crossed my mind while reading your patch.  It could
> even make it go faster if we made object_type_strings into an array
> of counted strings (i.e. "struct { const char *str; int len; }")
> and then took advantage of the fact that we have lengths of both.

Right, that was what I meant.

I'd be surprised if it appreciably speeds things up, but I guess it is
not too complicated to do.

> +static struct {
> +	const char *str;
> +	int len;
> +} object_type_name[] = {
> +	{ NULL, 0 },      /* OBJ_NONE = 0 */
> +	{ "commit", 6 },  /* OBJ_COMMIT = 1 */
> +	{ "tree", 4 },	  /* OBJ_TREE = 2 */
> +	{ "blob", 4 },    /* OBJ_BLOB = 3 */
> +	{ "tag", 3 },     /* OBJ_TAG = 4 */
>  };

I had envisioned a macro like:

  #define SIZED_STRING(x) { (x), (sizeof(x) - 1) }

though perhaps that is overkill for such a short list (that we don't
even expect to change).
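A sketch of how the `SIZED_STRING` macro could fill such a table, assuming a hypothetical `struct sized_string` with a `size_t` length and a hypothetical `sized_lookup` function. The macro is only correct for string literals (or arrays): applied to a `const char *`, `sizeof` would yield the pointer size instead.

```c
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Pair a string literal with its length; x must be a literal, not a pointer. */
#define SIZED_STRING(x) { (x), (sizeof(x) - 1) }

/* Hypothetical counted-string type for the table. */
struct sized_string {
	const char *str;
	size_t len;
};

static const struct sized_string sized_type_names[] = {
	{ NULL, 0 },            /* OBJ_NONE = 0 */
	SIZED_STRING("commit"), /* OBJ_COMMIT = 1 */
	SIZED_STRING("tree"),   /* OBJ_TREE = 2 */
	SIZED_STRING("blob"),   /* OBJ_BLOB = 3 */
	SIZED_STRING("tag"),    /* OBJ_TAG = 4 */
};

#define SIZED_COUNT (sizeof(sized_type_names) / sizeof(sized_type_names[0]))

/* Length-first lookup over the macro-built table; -1 if nothing matches. */
static int sized_lookup(const char *str, ssize_t len)
{
	size_t i;

	if (len < 0)
		len = (ssize_t)strlen(str);
	for (i = 1; i < SIZED_COUNT; i++)
		if (sized_type_names[i].len == (size_t)len &&
		    !strncmp(str, sized_type_names[i].str, (size_t)len))
			return (int)i;
	return -1;
}
```

The benefit is that no length is typed by hand, so an entry like `{ "tree", 4 }` cannot drift out of sync with its string.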

-Peff


* Re: [PATCH] type_from_string_gently: make sure length matches
  2015-04-17 21:07   ` Jeff King
@ 2015-04-17 21:11     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2015-04-17 21:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin, Karthik Nayak

Jeff King <peff@peff.net> writes:

> I'd be surprised if it appreciably speeds things up, but I guess it is
> not too complicated to do.
>
>> +static struct {
>> +	const char *str;
>> +	int len;
>> +} object_type_name[] = {
>> +	{ NULL, 0 },      /* OBJ_NONE = 0 */
>> +	{ "commit", 6 },  /* OBJ_COMMIT = 1 */
>> +	{ "tree", 4 },	  /* OBJ_TREE = 2 */
>> +	{ "blob", 4 },    /* OBJ_BLOB = 3 */
>> +	{ "tag", 3 },     /* OBJ_TAG = 4 */
>>  };
>
> I had envisioned a macro like:
>
>   #define SIZED_STRING(x) { (x), (sizeof(x) - 1) }
>
> though perhaps that is overkill for such a short list (that we don't
> even expect to change).

Sounds good (either way ;-)
