From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=54820 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Orzxu-0004b2-R0 for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:08:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1Orzxt-0001lQ-0p for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:07:58 -0400 Received: from mail-qy0-f173.google.com ([209.85.216.173]:48579) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Orzxs-0001lI-Rn for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:07:56 -0400 Received: by qyk34 with SMTP id 34so1397964qyk.4 for ; Sat, 04 Sep 2010 14:07:56 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: From: Blue Swirl Date: Sat, 4 Sep 2010 21:07:36 +0000 Message-ID: Subject: Re: [Qemu-devel] [PATCH 1/5] Suppress some gcc warnings with -Wtype-limits Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: andrzej zaborowski Cc: qemu-devel On Sat, Sep 4, 2010 at 8:30 PM, andrzej zaborowski wrot= e: > Hi, > > On 4 September 2010 21:45, Blue Swirl wrote: >> On Sat, Sep 4, 2010 at 5:57 PM, andrzej zaborowski w= rote: >>>> =C2=A0- =C2=A0 =C2=A0if (event < 0 || event >=3D BLKDBG_EVENT_MAX) { >>>> =C2=A0+ =C2=A0 =C2=A0if ((int)event < 0 || event >=3D BLKDBG_EVENT_MAX= ) { >>> Is the behaviour incorrect for some values, or is it not correct C? >>> As far as I know it is correct in c99 because of type promotions >>> between enum, int and unsigned int. >> >> The problem is that the type of an enum may be signed or unsigned, >> which affects the comparison. For example, the following program >> produces different results when it's compiled with -DUNSIGNED and >> without: >> $ cat enum.c >> #include >> >> int main(int argc, const char **argv) >> { >> =C2=A0 =C2=A0enum { >> #ifdef UNSIGNED >> =C2=A0 =C2=A0 =C2=A0 =C2=A0A =3D 0, >> #else >> =C2=A0 =C2=A0 =C2=A0 =C2=A0A =3D -1, >> #endif >> =C2=A0 =C2=A0} a; >> >> =C2=A0 =C2=A0a =3D atoi(argv[1]); >> =C2=A0 =C2=A0if (a < 0) { >> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("1: smaller"); >> =C2=A0 =C2=A0} else { >> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("1: not smaller"); >> =C2=A0 =C2=A0} >> >> =C2=A0 =C2=A0if ((int)a < 0) { >> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("2: smaller"); >> =C2=A0 =C2=A0} else { >> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("2: not smaller"); >> =C2=A0 =C2=A0} >> >> =C2=A0 =C2=A0return 0; >> } >> $ gcc -DUNSIGNED enum.c -o enum-unsigned >> $ gcc enum.c -o enum-signed >> $ ./enum-signed 0 >> 1: not smaller >> 2: not smaller >> $ ./enum-signed -1 >> 1: smaller >> 2: smaller >> $ ./enum-unsigned 0 >> 1: not smaller >> 2: not smaller >> $ ./enum-unsigned -1 >> 1: not smaller >> 2: smaller > > Ah, that's a good example, however.. > >> >> This is only how GCC uses enums, other compilers have other rules. So >> it's better to avoid any assumptions about signedness of enums. >> >> In this specific case, because the enum does not have any negative >> items, it is unsigned if compiled with GCC. Unsigned items cannot be >> negative, thus the check is useless. > > I agree it's useless, but saying that it is wrong is a bit of a > stretch in my opinion. =C2=A0It actually depends on what the intent was -= - > if the intent was to be able to use the value as an array index, then > I think the check does the job independent of the signedness of the > operands. No. If BLKDBG_EVENT_MAX is < 0x80000000, it does not matter if there is the check or not. Because of unsigned arithmetic, only one comparison is enough. With a non-GCC compiler (or if there were negative enum items), the check may have the desired effect, just like with int cast. If BLKDBG_EVENT_MAX is >=3D 0x80000000, the first check is wrong, because you want to allow array access beyond 0x80000000, which will be blocked by the first check. A non-GCC compiler may actually reject an enum bigger than 0x80000000. Then the checks should be rewritten. The version with int cast is correct in more cases than the version which relies on enum signedness. > >> >> If the enum included also negative values (or compiled with a compiler >> using different rules), the check would succeed. Though then the check >> against 0 would be wrong because the author probably wanted to check >> against the smallest possible value. >> >> In both cases, the cast to int makes sure that the comparison is meaning= ful. >> >>>>>>> There are some projects that avoid using strcat for example, becaus= e, >>>>>>> if misused, it can cause crashes. >>>>>> >>>>>> We also do that, there's pstrcat() and others. >>>>> >>>>> I don't believe we avoid strcat everywhere. =C2=A0strcat and pstrcat = are >>>>> different functions to be used on different occasions. =C2=A0It'd be = the >>>>> same level of ridiculous as not using malloc or not using the number = 5 >>>>> (because pstrcat(NULL, 5, "a") causes a crash) with an added >>>>> performance penalty. >>>> >>>> There's no difference in using using strcat vs. pstrcat, or sprintf >>>> vs. snprintf. If the caller of strcat doesn't know the buffer size, >>>> someone down the call chain must know it, so it can be passed. The >>>> major benefit is that buffer overflows will be avoided, at the >>>> possible cost of extra parameter passing. Again, the benefit exceeds >>>> cost. >>> >>> Usually you'll allocate the buffer of the size that is needed, so you >>> can do for exmple >>> >>> buffer =3D qemu_malloc(strlen(this) + strlen(that) + 1); >>> and then call strcpy and strcat >> >> But then you can easily add >> buflen =3D strlen(this) + strlen(that) + 1; >> and use that for malloc and pstrcat. Cost: one additional variable. > > Plus the cost of the strlen inside pstrcat. =C2=A0pstrcat has to either > check the length of source string against the buffer size first, or do > it at the same time it copies the string from source to destination > character by character, but either way the penalty is of linear cost. > If it was an inline function, then perhaps the compiler could optimise > the second strlen away in some situations, specially since strcat, > strlen etc are now builtins. void pstrcpy(char *buf, int buf_size, const char *str) { int c; char *q =3D buf; if (buf_size <=3D 0) return; for(;;) { c =3D *str++; if (c =3D=3D 0 || q >=3D buf + buf_size - 1) break; *q++ =3D c; } *q =3D '\0'; } There is one extra check, q >=3D buf + buf_size - 1. No strlen(). 0000000000000000 : 0: 48 83 ec 18 sub $0x18,%rsp 4: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax b: 00 00 d: 48 89 44 24 10 mov %rax,0x10(%rsp) 12: 31 c0 xor %eax,%eax 14: 85 f6 test %esi,%esi 16: 7e 39 jle 51 18: 0f b6 0a movzbl (%rdx),%ecx 1b: 84 c9 test %cl,%cl 1d: 74 2f je 4e 1f: 48 63 c6 movslq %esi,%rax 22: 48 8d 44 07 ff lea -0x1(%rdi,%rax,1),%rax 27: 48 39 c7 cmp %rax,%rdi 2a: 73 22 jae 4e 2c: 48 83 c2 01 add $0x1,%rdx 30: eb 0b jmp 3d 32: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) The copying loop follows until 4c: 38: 48 39 c7 cmp %rax,%rdi 3b: 73 11 jae 4e 3d: 88 0f mov %cl,(%rdi) 3f: 0f b6 0a movzbl (%rdx),%ecx 42: 48 83 c7 01 add $0x1,%rdi 46: 48 83 c2 01 add $0x1,%rdx 4a: 84 c9 test %cl,%cl 4c: 75 ea jne 38 4e: c6 07 00 movb $0x0,(%rdi) 51: 48 8b 44 24 10 mov 0x10(%rsp),%rax 56: 64 48 33 04 25 28 00 xor %fs:0x28,%rax 5d: 00 00 5f: 75 05 jne 66 61: 48 83 c4 18 add $0x18,%rsp 65: c3 retq 66: 66 90 xchg %ax,%ax 68: e8 00 00 00 00 callq 6d 6d: 0f 1f 00 nopl (%rax) The extra check in the loop adds the instructions at 38 and 3b. Those are not memory accesses, so the cost should be marginal. > So the reason I dislike using it blindly is that often you already > know that a buffer overflow is out of question, and secondly if > misused, it can hide a real bug where the string gets unintentionally > truncated and as a result something worse than an overflow happens > (e.g. a file on disk is overwritten) without noticing whereas a > segfault would be noticed. The 'segfault' can be much, much worse: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=3Dbuffer+overflow It's true that truncation can also cause bugs without proper checks, but still there shouldn't be a direct route to 'allow remote attackers to execute arbitrary code'.