From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=54820 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Orzxu-0004b2-R0
	for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:08:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <blauwirbel@gmail.com>) id 1Orzxt-0001lQ-0p
	for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:07:58 -0400
Received: from mail-qy0-f173.google.com ([209.85.216.173]:48579)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <blauwirbel@gmail.com>) id 1Orzxs-0001lI-Rn
	for qemu-devel@nongnu.org; Sat, 04 Sep 2010 17:07:56 -0400
Received: by qyk34 with SMTP id 34so1397964qyk.4
	for <qemu-devel@nongnu.org>; Sat, 04 Sep 2010 14:07:56 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <AANLkTin80F--y2GmWFM_dBp0+9XVGiNj8XYotAhiSUS9@mail.gmail.com>
References: <AANLkTimrAJ7fJzOhJRXLVpRjc8F+CoxWsETHss6tmZG3@mail.gmail.com>
	<AANLkTi=FSEZTUKQ38hgRXpxhjvYJW4wW3eSiy-aZy-AZ@mail.gmail.com>
	<AANLkTikGjBQ+T8XUhSj=HbKmJbx0H72JT9V5JEk2mRb3@mail.gmail.com>
	<AANLkTimosQmj8YZf2UrRqux8GwdmcMBp2Rz2aGVXZVO4@mail.gmail.com>
	<AANLkTinPwd6VrEdp-G4Vx5adsPPUe+-LG9svcU4RW_S+@mail.gmail.com>
	<AANLkTi=j7egaoiwWi6=ijbpZGV8mavCUQCO_-g8bE7aM@mail.gmail.com>
	<AANLkTikhfC2U17B9jHuR6v6_qOAW3CPbjncoA6WCraHX@mail.gmail.com>
	<AANLkTin80F--y2GmWFM_dBp0+9XVGiNj8XYotAhiSUS9@mail.gmail.com>
From: Blue Swirl <blauwirbel@gmail.com>
Date: Sat, 4 Sep 2010 21:07:36 +0000
Message-ID: <AANLkTing_8g24OUb81duni05pMtO2KvEdCiQ0Xpd-QCr@mail.gmail.com>
Subject: Re: [Qemu-devel] [PATCH 1/5] Suppress some gcc warnings with
	-Wtype-limits
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: andrzej zaborowski <balrogg@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>

On Sat, Sep 4, 2010 at 8:30 PM, andrzej zaborowski <balrogg@gmail.com> wrot=
e:
> Hi,
>
> On 4 September 2010 21:45, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Sat, Sep 4, 2010 at 5:57 PM, andrzej zaborowski <balrogg@gmail.com> w=
rote:
>>>> =C2=A0- =C2=A0 =C2=A0if (event < 0 || event >=3D BLKDBG_EVENT_MAX) {
>>>> =C2=A0+ =C2=A0 =C2=A0if ((int)event < 0 || event >=3D BLKDBG_EVENT_MAX=
) {
>>> Is the behaviour incorrect for some values, or is it not correct C?
>>> As far as I know it is correct in c99 because of type promotions
>>> between enum, int and unsigned int.
>>
>> The problem is that the type of an enum may be signed or unsigned,
>> which affects the comparison. For example, the following program
>> produces different results when it's compiled with -DUNSIGNED and
>> without:
>> $ cat enum.c
>> #include <stdio.h>
>>
>> int main(int argc, const char **argv)
>> {
>> =C2=A0 =C2=A0enum {
>> #ifdef UNSIGNED
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0A =3D 0,
>> #else
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0A =3D -1,
>> #endif
>> =C2=A0 =C2=A0} a;
>>
>> =C2=A0 =C2=A0a =3D atoi(argv[1]);
>> =C2=A0 =C2=A0if (a < 0) {
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("1: smaller");
>> =C2=A0 =C2=A0} else {
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("1: not smaller");
>> =C2=A0 =C2=A0}
>>
>> =C2=A0 =C2=A0if ((int)a < 0) {
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("2: smaller");
>> =C2=A0 =C2=A0} else {
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0puts("2: not smaller");
>> =C2=A0 =C2=A0}
>>
>> =C2=A0 =C2=A0return 0;
>> }
>> $ gcc -DUNSIGNED enum.c -o enum-unsigned
>> $ gcc enum.c -o enum-signed
>> $ ./enum-signed 0
>> 1: not smaller
>> 2: not smaller
>> $ ./enum-signed -1
>> 1: smaller
>> 2: smaller
>> $ ./enum-unsigned 0
>> 1: not smaller
>> 2: not smaller
>> $ ./enum-unsigned -1
>> 1: not smaller
>> 2: smaller
>
> Ah, that's a good example, however..
>
>>
>> This is only how GCC uses enums, other compilers have other rules. So
>> it's better to avoid any assumptions about signedness of enums.
>>
>> In this specific case, because the enum does not have any negative
>> items, it is unsigned if compiled with GCC. Unsigned items cannot be
>> negative, thus the check is useless.
>
> I agree it's useless, but saying that it is wrong is a bit of a
> stretch in my opinion. =C2=A0It actually depends on what the intent was -=
-
> if the intent was to be able to use the value as an array index, then
> I think the check does the job independent of the signedness of the
> operands.

No.

If BLKDBG_EVENT_MAX is < 0x80000000, it does not matter if there is
the check or not. Because of unsigned arithmetic, only one comparison
is enough. With a non-GCC compiler (or if there were negative enum
items), the check may have the desired effect, just like with int
cast.

If BLKDBG_EVENT_MAX is >=3D 0x80000000, the first check is wrong,
because you want to allow array access beyond 0x80000000, which will
be blocked by the first check. A non-GCC compiler may actually reject
an enum bigger than 0x80000000. Then the checks should be rewritten.

The version with int cast is correct in more cases than the version
which relies on enum signedness.

>
>>
>> If the enum included also negative values (or compiled with a compiler
>> using different rules), the check would succeed. Though then the check
>> against 0 would be wrong because the author probably wanted to check
>> against the smallest possible value.
>>
>> In both cases, the cast to int makes sure that the comparison is meaning=
ful.
>>
>>>>>>> There are some projects that avoid using strcat for example, becaus=
e,
>>>>>>> if misused, it can cause crashes.
>>>>>>
>>>>>> We also do that, there's pstrcat() and others.
>>>>>
>>>>> I don't believe we avoid strcat everywhere. =C2=A0strcat and pstrcat =
are
>>>>> different functions to be used on different occasions. =C2=A0It'd be =
the
>>>>> same level of ridiculous as not using malloc or not using the number =
5
>>>>> (because pstrcat(NULL, 5, "a") causes a crash) with an added
>>>>> performance penalty.
>>>>
>>>> There's no difference in using using strcat vs. pstrcat, or sprintf
>>>> vs. snprintf. If the caller of strcat doesn't know the buffer size,
>>>> someone down the call chain must know it, so it can be passed. The
>>>> major benefit is that buffer overflows will be avoided, at the
>>>> possible cost of extra parameter passing. Again, the benefit exceeds
>>>> cost.
>>>
>>> Usually you'll allocate the buffer of the size that is needed, so you
>>> can do for exmple
>>>
>>> buffer =3D qemu_malloc(strlen(this) + strlen(that) + 1);
>>> and then call strcpy and strcat
>>
>> But then you can easily add
>> buflen =3D strlen(this) + strlen(that) + 1;
>> and use that for malloc and pstrcat. Cost: one additional variable.
>
> Plus the cost of the strlen inside pstrcat. =C2=A0pstrcat has to either
> check the length of source string against the buffer size first, or do
> it at the same time it copies the string from source to destination
> character by character, but either way the penalty is of linear cost.
> If it was an inline function, then perhaps the compiler could optimise
> the second strlen away in some situations, specially since strcat,
> strlen etc are now builtins.

void pstrcpy(char *buf, int buf_size, const char *str)
{
    int c;
    char *q =3D buf;

    if (buf_size <=3D 0)
        return;

    for(;;) {
        c =3D *str++;
        if (c =3D=3D 0 || q >=3D buf + buf_size - 1)
            break;
        *q++ =3D c;
    }
    *q =3D '\0';
}

There is one extra check, q >=3D buf + buf_size - 1. No strlen().

0000000000000000 <pstrcpy>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
   b:   00 00
   d:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  12:   31 c0                   xor    %eax,%eax
  14:   85 f6                   test   %esi,%esi
  16:   7e 39                   jle    51 <pstrcpy+0x51>
  18:   0f b6 0a                movzbl (%rdx),%ecx
  1b:   84 c9                   test   %cl,%cl
  1d:   74 2f                   je     4e <pstrcpy+0x4e>
  1f:   48 63 c6                movslq %esi,%rax
  22:   48 8d 44 07 ff          lea    -0x1(%rdi,%rax,1),%rax
  27:   48 39 c7                cmp    %rax,%rdi
  2a:   73 22                   jae    4e <pstrcpy+0x4e>
  2c:   48 83 c2 01             add    $0x1,%rdx
  30:   eb 0b                   jmp    3d <pstrcpy+0x3d>
  32:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

The copying loop follows until 4c:
  38:   48 39 c7                cmp    %rax,%rdi
  3b:   73 11                   jae    4e <pstrcpy+0x4e>
  3d:   88 0f                   mov    %cl,(%rdi)
  3f:   0f b6 0a                movzbl (%rdx),%ecx
  42:   48 83 c7 01             add    $0x1,%rdi
  46:   48 83 c2 01             add    $0x1,%rdx
  4a:   84 c9                   test   %cl,%cl
  4c:   75 ea                   jne    38 <pstrcpy+0x38>

  4e:   c6 07 00                movb   $0x0,(%rdi)
  51:   48 8b 44 24 10          mov    0x10(%rsp),%rax
  56:   64 48 33 04 25 28 00    xor    %fs:0x28,%rax
  5d:   00 00
  5f:   75 05                   jne    66 <pstrcpy+0x66>
  61:   48 83 c4 18             add    $0x18,%rsp
  65:   c3                      retq
  66:   66 90                   xchg   %ax,%ax
  68:   e8 00 00 00 00          callq  6d <pstrcpy+0x6d>
  6d:   0f 1f 00                nopl   (%rax)

The extra check in the loop adds the instructions at 38 and 3b. Those
are not memory accesses, so the cost should be marginal.

> So the reason I dislike using it blindly is that often you already
> know that a buffer overflow is out of question, and secondly if
> misused, it can hide a real bug where the string gets unintentionally
> truncated and as a result something worse than an overflow happens
> (e.g. a file on disk is overwritten) without noticing whereas a
> segfault would be noticed.

The 'segfault' can be much, much worse:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=3Dbuffer+overflow

It's true that truncation can also cause bugs without proper checks,
but still there shouldn't be a direct route to 'allow remote attackers
to execute arbitrary code'.