* [PATCH] fix signed range problems with hex conversions
@ 2007-05-30 17:09 Nicolas Pitre
2007-05-30 17:32 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Nicolas Pitre @ 2007-05-30 17:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The get_sha1_hex() function is defined as accepting a char array.
Because the char type is signed by default on many architectures,
get_sha1_hex() can be passed a pointer to negative chars. This can
happen with user input containing chars with the top bit set.
Then those chars are passed to hexval() which is defined as accepting an
unsigned int value. Whenever a signed char is promoted to an int, the
promotion is always signed and then the result is stored in the unsigned
int variable. In the negative char case that means really large
unsigned int values will result, and then the hexval_table is happily
indexed with that value.
On 32-bit architectures the large int value will create a wrap-around
and a byte located somewhere before the hexval_table array in memory
will be fetched. Depending on that byte value a bogus SHA1 value could
be returned.
On 64-bit architectures the large int value will most probably cause a
segmentation fault.
This patch adds a range test to hexval() in order to prevent this. Also
let's index the hexval_table array directly in get_sha1_hex() using
explicitly unsigned chars to avoid the range test producing faster
code.
While at it, make hexval_table const.
Signed-off-by: Nicolas Pitre <nico@cam.org>
---
diff --git a/cache.h b/cache.h
index f675223..30fcaa9 100644
--- a/cache.h
+++ b/cache.h
@@ -359,10 +359,10 @@ extern void *map_sha1_file(const unsigned char *sha1, unsigned long *);
extern int has_pack_file(const unsigned char *sha1);
extern int has_pack_index(const unsigned char *sha1);
-extern signed char hexval_table[256];
+extern const signed char hexval_table[256];
static inline unsigned int hexval(unsigned int c)
{
- return hexval_table[c];
+ return (c & ~0xff) ? -1 : hexval_table[c];
}
/* Convert to/from hex/sha1 representation */
diff --git a/sha1_file.c b/sha1_file.c
index a3637d7..e10fb4b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -33,7 +33,7 @@ const unsigned char null_sha1[20];
static unsigned int sha1_file_open_flag = O_NOATIME;
-signed char hexval_table[256] = {
+const signed char hexval_table[256] = {
-1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */
-1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */
-1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */
@@ -72,11 +72,12 @@ int get_sha1_hex(const char *hex, unsigned char *sha1)
{
int i;
for (i = 0; i < 20; i++) {
- unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
+ unsigned char c0 = *hex++;
+ unsigned char c1 = *hex++;
+ unsigned int val = (hexval_table[c0] << 4) | hexval_table[c1];
if (val & ~0xff)
return -1;
*sha1++ = val;
- hex += 2;
}
return 0;
}
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] fix signed range problems with hex conversions
2007-05-30 17:09 [PATCH] fix signed range problems with hex conversions Nicolas Pitre
@ 2007-05-30 17:32 ` Linus Torvalds
2007-05-30 17:42 ` Linus Torvalds
2007-05-30 18:20 ` Nicolas Pitre
0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2007-05-30 17:32 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List
On Wed, 30 May 2007, Nicolas Pitre wrote:
>
> This patch adds a range test to hexval() in order to prevent this. Also
> let's index the hexval_table array directly in get_sha1_hex() using
> explicitly unsigned chars to avoid the range test producing faster
> code.
Please just make hexval() take a "unsigned char" instead, solving the
problem in a manner that makes it much easier for the compiler to realize
that it never needs to sign-extend the value or test the end result..
Ie I think your patch would be better off something like the following
instead (it would be a one-liner, but I agree that marking hexval_table
"const" is also a good idea).
With this, gcc can just generate:
movzbl (%rdi), %eax
movsbl hexval_table(%rax),%edx
movzbl 1(%rdi), %eax
movsbl hexval_table(%rax),%eax
sall $4, %edx
orl %eax, %edx
for the code to generate a byte from two hex characters.
Linus
----
diff --git a/cache.h b/cache.h
index ec85d93..0da7070 100644
--- a/cache.h
+++ b/cache.h
@@ -359,8 +359,8 @@ extern void *map_sha1_file(const unsigned char *sha1, unsigned long *);
extern int has_pack_file(const unsigned char *sha1);
extern int has_pack_index(const unsigned char *sha1);
-extern signed char hexval_table[256];
-static inline unsigned int hexval(unsigned int c)
+extern const signed char hexval_table[256];
+static inline unsigned int hexval(unsigned char c)
{
return hexval_table[c];
}
diff --git a/sha1_file.c b/sha1_file.c
index 12d2ef2..f959fe5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -33,7 +33,7 @@ const unsigned char null_sha1[20];
static unsigned int sha1_file_open_flag = O_NOATIME;
-signed char hexval_table[256] = {
+const signed char hexval_table[256] = {
-1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */
-1, -1, -1, -1, -1, -1, -1, -1, /* 08-0f */
-1, -1, -1, -1, -1, -1, -1, -1, /* 10-17 */
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] fix signed range problems with hex conversions
2007-05-30 17:32 ` Linus Torvalds
@ 2007-05-30 17:42 ` Linus Torvalds
2007-05-30 18:20 ` Nicolas Pitre
1 sibling, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2007-05-30 17:42 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List
On Wed, 30 May 2007, Linus Torvalds wrote:
>
> With this, gcc can just generate:
>
> movzbl (%rdi), %eax
> ...
Btw, in the kernel we have a rule for *.c -> *.s files exactly because
it's nice to be able to easily say "ok, what does that generate".
Here's a patch to add such a rule to git too, in case anybody is
interested. It makes it much simpler to just do
make sha1_file.s
and look at the compiler-generated output that way, rather than having to
fire up gdb on the resulting binary.
(Add -fverbose-asm or something if you want to, it can make the result
even more readable)
Linus
---
diff --git a/Makefile b/Makefile
index 7527734..7ffb803 100644
--- a/Makefile
+++ b/Makefile
@@ -846,6 +846,8 @@ git$X git.spec \
%.o: %.c GIT-CFLAGS
$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
+%.s: %.c GIT-CFLAGS
+ $(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $<
%.o: %.S
$(QUIET_CC)$(CC) -o $*.o -c $(ALL_CFLAGS) $<
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] fix signed range problems with hex conversions
2007-05-30 17:32 ` Linus Torvalds
2007-05-30 17:42 ` Linus Torvalds
@ 2007-05-30 18:20 ` Nicolas Pitre
1 sibling, 0 replies; 4+ messages in thread
From: Nicolas Pitre @ 2007-05-30 18:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
On Wed, 30 May 2007, Linus Torvalds wrote:
> On Wed, 30 May 2007, Nicolas Pitre wrote:
> >
> > This patch adds a range test to hexval() in order to prevent this. Also
> > let's index the hexval_table array directly in get_sha1_hex() using
> > explicitly unsigned chars to avoid the range test producing faster
> > code.
>
> Please just make hexval() take a "unsigned char" instead, solving the
> problem in a manner that makes it much easier for the compiler to realize
> that it never needs to sign-extend the value or test the end result..
Well, I did it the way I did for two reasons, the first being that
random hexval() usage won't mask a bad value if it is passed an int
which happens to be out of range (think EOF, or better yet -208, -207,
etc).
Yet gcc appears to be smart enough to skip the test if it is passed a
value that cannot exceed the test range, like an unsigned char.
> Ie I think your patch would be better off something like the following
> instead (it would be a one-liner, but I agree that marking hexval_table
> "const" is also a good idea).
>
> With this, gcc can just generate:
>
> movzbl (%rdi), %eax
> movsbl hexval_table(%rax),%edx
> movzbl 1(%rdi), %eax
> movsbl hexval_table(%rax),%eax
> sall $4, %edx
> orl %eax, %edx
>
> for the code to generate a byte from two hex characters.
My patch already produces code that looks like that for get_sha1_hex()
on i386. But my second reason for the patch is that on ARM it
should becomes:
/* r4 = hexval_table, ip = hex */
ldrb r1, [ip], #1
ldrb r2, [ip], #1
ldrsb r0, [r4, r1]
ldrsb r3, [r4, r2]
orr r0, r3, r0, asl #4
I.e. the compiler has a greater chance to fold the post increment with
the load byte instruction as above without the need for an extra
add instruction.
Nicolas
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-05-30 18:20 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-30 17:09 [PATCH] fix signed range problems with hex conversions Nicolas Pitre
2007-05-30 17:32 ` Linus Torvalds
2007-05-30 17:42 ` Linus Torvalds
2007-05-30 18:20 ` Nicolas Pitre
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).