From: mbizon@freebox.fr (Maxime Bizon)
To: linux-arm-kernel@lists.infradead.org
Subject: gcc miscompiles csum_tcpudp_magic() on ARMv5
Date: Thu, 12 Dec 2013 16:26:23 +0100 [thread overview]
Message-ID: <1386861983.25449.13.camel@sakura.staff.proxad.net> (raw)
In-Reply-To: <20131212150027.GP4360@n2100.arm.linux.org.uk>
On Thu, 2013-12-12 at 15:00 +0000, Russell King - ARM Linux wrote:
> This is different - the compiler has recognised in both cases that the
> addition od 42 to foo is useless as the result is not used, and
> therefore has optimised the addition away. In the second case, it has
> realised that the narrowing cast used to then add 42 to is also not
> used, and it has also optimised that away.
I was just trying to show that in the inline case the generated code was
not the same (apparently there was a misunderstanding on what I was
arguing about).
Now if you want a more interesting/on topic example:
#include <stdint.h>
static __attribute__((noinline)) unsigned int set_bit_2(unsigned int foo)
{
foo |= 0x4;
return foo;
}
#define local_swab16(x) ((uint16_t)( \
(((uint16_t)(x) & (uint16_t)0x00ffU) << 8) | \
(((uint16_t)(x) & (uint16_t)0xff00U) >> 8)))
int func(uint16_t a)
{
a = set_bit_2(local_swab16(a));
if ((a & 0xffff) == 0x1234)
return 1;
return 0;
}
00000000 <set_bit_2>:
0: e3800004 orr r0, r0, #4
4: e12fff1e bx lr
00000008 <func>:
8: e92d4008 push {r3, lr}
c: e1a03420 lsr r3, r0, #8
10: e1830400 orr r0, r3, r0, lsl #8
14: e1a00800 lsl r0, r0, #16
18: e1a00820 lsr r0, r0, #16
1c: ebfffff7 bl 0 <set_bit_2>
20: e3a03c12 mov r3, #4608 ; 0x1200
24: e1a00800 lsl r0, r0, #16
28: e2833034 add r3, r3, #52 ; 0x34
2c: e1530820 cmp r3, r0, lsr #16
30: 03a00001 moveq r0, #1
34: 13a00000 movne r0, #0
38: e8bd8008 pop {r3, pc}
you can see here the full swab16 code, with the narrowing before calling
set_bit_2() per calling convention.
Now remove the noninline:
00000000 <func>:
0: e3a03c12 mov r3, #4608 ; 0x1200
4: e1a02420 lsr r2, r0, #8
8: e1820400 orr r0, r2, r0, lsl #8
c: e3802004 orr r2, r0, #4
10: e1a02802 lsl r2, r2, #16
14: e2833034 add r3, r3, #52 ; 0x34
18: e1530822 cmp r3, r2, lsr #16
1c: 03a00001 moveq r0, #1
20: 13a00000 movne r0, #0
24: e12fff1e bx lr
and you see that the double shift is removed because gcc knows it is not
needed by the remaining code, which uses only the lower 16 bits.
This is IMO exactly what happen in the csum case, the inline assembly
rule (the one we are not aware of) must be that you cannot use the 16
higher bits of the register in the assembly code, and so gcc takes the
liberty not to zero extend the register.
--
Maxime
next prev parent reply other threads:[~2013-12-12 15:26 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-12 12:14 gcc miscompiles csum_tcpudp_magic() on ARMv5 Maxime Bizon
2013-12-12 12:40 ` Russell King - ARM Linux
2013-12-12 13:36 ` Maxime Bizon
2013-12-12 13:48 ` Måns Rullgård
2013-12-12 14:10 ` Maxime Bizon
2013-12-12 14:19 ` Willy Tarreau
2013-12-12 14:28 ` Maxime Bizon
2013-12-12 14:42 ` Måns Rullgård
2013-12-12 14:52 ` Maxime Bizon
2013-12-12 14:58 ` Måns Rullgård
2013-12-12 15:00 ` Russell King - ARM Linux
2013-12-12 15:26 ` Maxime Bizon [this message]
2013-12-12 15:07 ` Willy Tarreau
2013-12-12 15:18 ` Måns Rullgård
2013-12-12 15:28 ` Willy Tarreau
2013-12-12 15:43 ` Russell King - ARM Linux
2013-12-12 15:50 ` Måns Rullgård
2013-12-12 14:37 ` Måns Rullgård
2013-12-12 14:40 ` Maxime Bizon
2013-12-12 14:47 ` Måns Rullgård
2013-12-12 14:26 ` Måns Rullgård
2013-12-12 14:48 ` Russell King - ARM Linux
2013-12-12 15:00 ` Måns Rullgård
2013-12-12 15:04 ` Maxime Bizon
2013-12-12 15:41 ` Russell King - ARM Linux
2013-12-12 16:04 ` Måns Rullgård
2013-12-12 16:04 ` Willy Tarreau
2013-12-12 16:47 ` Russell King - ARM Linux
2013-12-12 17:11 ` Willy Tarreau
2013-12-12 17:20 ` Russell King - ARM Linux
2013-12-12 17:35 ` Willy Tarreau
2013-12-12 18:07 ` Nicolas Pitre
2013-12-12 22:30 ` Maxime Bizon
2013-12-12 22:36 ` Russell King - ARM Linux
2013-12-12 22:44 ` Maxime Bizon
2013-12-12 22:48 ` Russell King - ARM Linux
2013-12-12 17:34 ` Maxime Bizon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1386861983.25449.13.camel@sakura.staff.proxad.net \
--to=mbizon@freebox.fr \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).