Re: [Qemu-devel] [PATCH] target-sparc: fix 32-bit truncation in fpackfix

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] target-sparc: fix 32-bit truncation in fpackfix
Date: Thu, 5 Nov 2015 10:11:10 +0100	[thread overview]
Message-ID: <563B1D2E.4060400@redhat.com> (raw)
In-Reply-To: <87lhadfy7x.fsf@blackfin.pond.sub.org>



On 04/11/2015 18:53, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 04/11/2015 15:07, Markus Armbruster wrote:
>>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>>
>>>> On 04/11/2015 12:05, Richard Henderson wrote:
>>>>> On 11/04/2015 11:45 AM, Paolo Bonzini wrote:
>>>>>>>>>>>>            int32_t src = rs2 >> (word * 32);
>>>>>>>>>>>> -        int64_t scaled = src << scale;
>>>>>>>>>>>> +        int64_t scaled = (int64_t)src << scale;
>>>>>>>>>>>>            int64_t from_fixed = scaled >> 16;
>>>>> ...
>>>>>>>
>>>>>>> I do think we'd be better served by casting to uint64_t on that line.
>>>>>>> Note that fpackfix requires the same correction.  And it wouldn't hurt
>>>>>>> to cast to uint32_t in fpack16, lest we anger the self-same shifting
>>>>>>> gods.
>>>>>>
>>>>>> Hmmm.. say src = -0x80000000, scale = 1;
>>>>>>
>>>>>> scaled     = (uint64_t)-0x8000000 << 1 = 0xffffffff00000000
>>>>>> from_fixed = 0xffffffff00000000 >> 16  = 0x0000ffffffff0000
>>>>>>
>>>>>> Now from_fixed is positive and you get 32767 instead of -32768.  In
>>>>>> other words, we would have to cast to uint64_t on the scaled assignment,
>>>>>> and back to int64_t on the from_fixed assignment.  I must be
>>>>>> misunderstanding your suggestion.
>>>>>
>>>>>   int64_t scaled = (uint64_t)src << scale;
>>>>>
>>>>> I.e. one explicit conversion and one implicit conversion.
>>>>
>>>> That does the job, but it also does look like a typo...
>>>
>>> Make the implicit conversion explicit then.
>>
>> Sorry, but I'll say it again: there's _no way_ that a sane compiler will
>> _ever_ use this particular bit of undefined behavior.
>>
>> I'm generally against uglifying the code to placate ubsan, but
>> especially so in this case: it is not common code and it would only
>> affect people running fpackfix under ubsan.
> 
> Oh, I don't disagree at all with "let's not uglify code".
> 
> I wish compilers hadn't become so nasty, though.  I wish they had more
> respect for the imperfect real-world code they compile, and less
> benchmark worship.

It's not benchmark worship.  For example, being able to optimize "x * 6
/ 2" to "x * 3" is useful, but it's only possible if you can treat
integer overflow as undefined.  In fact GCC compiles it to

	leal	(%rdi,%rdi,2), %eax 	add	r0, r0, r0, lsl #1

(x86 and ARM respectively) for int, and to

	leal	(%rdi,%rdi,2), %eax	mov	r3, r0, asl #3
	andl	$2147483647, %eax	sub	r0, r3, r0, asl #1
					mov	r0, r0, lsr #1

for unsigned int.  This is less efficient; stuff like this happens in
*real programs* and even in hot loops.  For an even more extreme case,
"x * 10000000 / 1000000" with int and unsigned:

	leal	(%rdi,%rdi,4), %eax	mov	r3, r0, asl #3
	addl	%eax, %eax		add	r0, r3, r0, lsl #1

vs.

	imull	$10000000, %edi, %edx	movw	r3, #38528
	movl	$1125899907, %ecx	movw	r2, #56963
	movl	%edx, %eax		movt	r3, 152
	mull	%ecx			movt	r2, 17179
	movl	%edx, %eax		mul	r0, r3, r0
	shrl	$18, %eax		umull	r0, r1, r0, r2
					mov	r0, r1, lsr #18

(For completeness I'll note that this optimization may also _hide_ bugs
on ints, which can be both a good or a bad thing; compiler warnings and
static analysis can help fixing the code).

Similarly for optimizing

    for (i = 0; i <= n; i++)
      p[i] = 0;

to any of

    for (q = p, r = p + n; q <= r; q++)
      *q = 0;

or

    for (q = p, i = n + 1; i-- > 0; q++)
      *q = 0;

Both of which are cases of strength reduction that are expected by any
optimizing compiler (without even going into vectorization).  Yet they
are not possible without treating overflow as undefined.

The last may seem strange, but it's easy for a compiler to look at the
original loop and derive that it has n + 1 iterations.  This can be used
with hardware loop counter registers (e.g. CTR on PowerPC) or
decrement-and-loop instructions (e.g. bdnz on PowerPC, loop on x86).

DSP code, for example, is full of code like this where n is known at
compile time, and one would have to write assembly code if the compiler
didn't about these instruction.

As for the so-much-loathed type-based alias analysis, it lets you
optimize this:

    void f(float **a, float **b, int m, int n)
    {
      int i, j;
      for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
          b[i][j] = a[i][j];
    }

to this:

    void f(float **a, float **b, int m, int n)
    {
      int i, j;
      for (i = 0; i < m; i++) {
        float *ai = a[i], *bi = b[i];
        for (j = 0; j < n; j++)
          bi[j] = ai[j];
      }
    }

Compiler writers don't rely on undefined behavior out of spite.  There
_is_ careful analysis of what kind of code could be broken by it, and
typically there are also warning mechanisms (-Wstrict-aliasing,
-Wstrict-overflow, -Wunsafe-loop-optimizations) to help the programmer.

It's not a coincidence that left shifting of signed negative numbers a)
is not relied on by neither GCC nor clang; b) is the source of the wide
majority of ubsan failures for QEMU.

Paolo

next prev parent reply	other threads:[~2015-11-05  9:11 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 14:05 [Qemu-devel] [PATCH] target-sparc: fix 32-bit truncation in fpackfix Paolo Bonzini
2015-11-02 14:09 ` Peter Maydell
2015-11-02 14:48   ` Paolo Bonzini
2015-11-02 15:13     ` Peter Maydell
2015-11-02 15:50       ` Paolo Bonzini
2015-11-04 10:12       ` Richard Henderson
2015-11-04 10:45         ` Paolo Bonzini
2015-11-04 11:05           ` Richard Henderson
2015-11-04 12:46             ` Paolo Bonzini
2015-11-04 14:07               ` Markus Armbruster
2015-11-04 16:06                 ` Paolo Bonzini
2015-11-04 17:53                   ` Markus Armbruster
2015-11-05  9:11                     ` Paolo Bonzini [this message]
2015-11-04 23:36             ` Mark Cave-Ayland
2015-11-05  9:12               ` Paolo Bonzini
2015-11-05  9:20                 ` Richard Henderson
2015-11-05  9:25                   ` Paolo Bonzini
2015-11-05  9:28                     ` Richard Henderson
2015-11-05  9:43                       ` Paolo Bonzini
2015-11-06 15:33                     ` Mark Cave-Ayland
2015-11-06 15:43                       ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=563B1D2E.4060400@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=armbru@redhat.com \
    --cc=blauwirbel@gmail.com \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).