gcc miscompiles csum_tcpudp_magic() on ARMv5

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: mbizon@freebox.fr (Maxime Bizon)
To: linux-arm-kernel@lists.infradead.org
Subject: gcc miscompiles csum_tcpudp_magic() on ARMv5
Date: Thu, 12 Dec 2013 16:26:23 +0100	[thread overview]
Message-ID: <1386861983.25449.13.camel@sakura.staff.proxad.net> (raw)
In-Reply-To: <20131212150027.GP4360@n2100.arm.linux.org.uk>


On Thu, 2013-12-12 at 15:00 +0000, Russell King - ARM Linux wrote:

> This is different - the compiler has recognised in both cases that the
> addition od 42 to foo is useless as the result is not used, and
> therefore has optimised the addition away.  In the second case, it has
> realised that the narrowing cast used to then add 42 to is also not
> used, and it has also optimised that away.

I was just trying to show that in the inline case the generated code was
not the same (apparently there was a misunderstanding on what I was
arguing about).

Now if you want a more interesting/on topic example:

#include <stdint.h>

static __attribute__((noinline)) unsigned int set_bit_2(unsigned int foo)
{
        foo |= 0x4;
        return foo;
}

#define local_swab16(x) ((uint16_t)(                                    \
        (((uint16_t)(x) & (uint16_t)0x00ffU) << 8) |                  \
        (((uint16_t)(x) & (uint16_t)0xff00U) >> 8)))

int func(uint16_t a)
{
        a = set_bit_2(local_swab16(a));
        if ((a & 0xffff) == 0x1234)
                return 1;
        return 0;
}

00000000 <set_bit_2>:
   0:	e3800004 	orr	r0, r0, #4
   4:	e12fff1e 	bx	lr

00000008 <func>:
   8:	e92d4008 	push	{r3, lr}
   c:	e1a03420 	lsr	r3, r0, #8
  10:	e1830400 	orr	r0, r3, r0, lsl #8
  14:	e1a00800 	lsl	r0, r0, #16
  18:	e1a00820 	lsr	r0, r0, #16
  1c:	ebfffff7 	bl	0 <set_bit_2>
  20:	e3a03c12 	mov	r3, #4608	; 0x1200
  24:	e1a00800 	lsl	r0, r0, #16
  28:	e2833034 	add	r3, r3, #52	; 0x34
  2c:	e1530820 	cmp	r3, r0, lsr #16
  30:	03a00001 	moveq	r0, #1
  34:	13a00000 	movne	r0, #0
  38:	e8bd8008 	pop	{r3, pc}

you can see here the full swab16 code, with the narrowing before calling 
set_bit_2() per calling convention.


Now remove the noninline:

00000000 <func>:
   0:	e3a03c12 	mov	r3, #4608	; 0x1200
   4:	e1a02420 	lsr	r2, r0, #8
   8:	e1820400 	orr	r0, r2, r0, lsl #8
   c:	e3802004 	orr	r2, r0, #4
  10:	e1a02802 	lsl	r2, r2, #16
  14:	e2833034 	add	r3, r3, #52	; 0x34
  18:	e1530822 	cmp	r3, r2, lsr #16
  1c:	03a00001 	moveq	r0, #1
  20:	13a00000 	movne	r0, #0
  24:	e12fff1e 	bx	lr

and you see that the double shift is removed because gcc knows it is not
needed by the remaining code, which uses only the lower 16 bits.

This is IMO exactly what happen in the csum case, the inline assembly
rule (the one we are not aware of) must be that you cannot use the 16
higher bits of the register in the assembly code, and so gcc takes the
liberty not to zero extend the register.

-- 
Maxime

next prev parent reply	other threads:[~2013-12-12 15:26 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-12 12:14 gcc miscompiles csum_tcpudp_magic() on ARMv5 Maxime Bizon
2013-12-12 12:40 ` Russell King - ARM Linux
2013-12-12 13:36   ` Maxime Bizon
2013-12-12 13:48     ` Måns Rullgård
2013-12-12 14:10       ` Maxime Bizon
2013-12-12 14:19         ` Willy Tarreau
2013-12-12 14:28           ` Maxime Bizon
2013-12-12 14:42             ` Måns Rullgård
2013-12-12 14:52               ` Maxime Bizon
2013-12-12 14:58                 ` Måns Rullgård
2013-12-12 15:00                 ` Russell King - ARM Linux
2013-12-12 15:26                   ` Maxime Bizon [this message]
2013-12-12 15:07               ` Willy Tarreau
2013-12-12 15:18                 ` Måns Rullgård
2013-12-12 15:28                   ` Willy Tarreau
2013-12-12 15:43                     ` Russell King - ARM Linux
2013-12-12 15:50                       ` Måns Rullgård
2013-12-12 14:37           ` Måns Rullgård
2013-12-12 14:40             ` Maxime Bizon
2013-12-12 14:47               ` Måns Rullgård
2013-12-12 14:26         ` Måns Rullgård
2013-12-12 14:48     ` Russell King - ARM Linux
2013-12-12 15:00       ` Måns Rullgård
2013-12-12 15:04       ` Maxime Bizon
2013-12-12 15:41         ` Russell King - ARM Linux
2013-12-12 16:04           ` Måns Rullgård
2013-12-12 16:04           ` Willy Tarreau
2013-12-12 16:47             ` Russell King - ARM Linux
2013-12-12 17:11               ` Willy Tarreau
2013-12-12 17:20                 ` Russell King - ARM Linux
2013-12-12 17:35                   ` Willy Tarreau
2013-12-12 18:07                   ` Nicolas Pitre
2013-12-12 22:30               ` Maxime Bizon
2013-12-12 22:36                 ` Russell King - ARM Linux
2013-12-12 22:44                   ` Maxime Bizon
2013-12-12 22:48                     ` Russell King - ARM Linux
2013-12-12 17:34           ` Maxime Bizon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1386861983.25449.13.camel@sakura.staff.proxad.net \
    --to=mbizon@freebox.fr \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).